Running on EKS with Karpenter - Kosli Documentation

By default the reporter runs as a CronJob every 5 minutes. On clusters that use Karpenter for node autoscaling, this frequent scheduling can prevent nodes from being consolidated (scaled down). The cause is Karpenter’s consolidateAfter timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose consolidateAfter is longer than the reporter interval never becomes eligible for consolidation (see karpenter#1921). This is Karpenter working as designed, not a reporter bug.

Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter’s way. Widening the interval trades away that detection speed and should be a last resort.
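For orientation, the consolidateAfter window is set in the disruption block of a Karpenter NodePool. The fragment below is a minimal sketch assuming the karpenter.sh/v1 API; the NodePool name and the 10m value are illustrative and not taken from any particular cluster. With a value like this, a reporter pod landing every 5 minutes would keep the node from ever reaching a full quiet window.

```yaml
# Fragment of a Karpenter NodePool (assumes the karpenter.sh/v1 API).
# Other required fields, such as spec.template, are omitted for brevity.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default              # hypothetical NodePool name
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m    # illustrative value, longer than the 5-minute reporter interval
```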

1. Pin the reporter to a stable node group (recommended)

If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use nodeSelector, and tolerations if that node group is tainted:

nodeSelector:
  eks.amazonaws.com/nodegroup: system   # your managed node group

tolerations:
  - key: dedicated
    operator: Equal
    value: system
    effect: NoSchedule

To steer the reporter away from Karpenter-managed nodes instead, use affinity (a plain nodeSelector cannot express “not on these nodes”):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: DoesNotExist

2. Run the reporter out of the cluster

For zero footprint on cluster nodes, run kosli snapshot k8s on a schedule from outside the cluster, for example as a CI cron job with kubeconfig access. This keeps your reporting cadence without placing a pod on the cluster’s nodes. See the Kubernetes environment reporting tutorial.
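As one possible shape for such a job, the sketch below uses a scheduled GitHub Actions workflow. Everything specific in it is an assumption rather than part of this guide: the workflow file name, the org, environment, and cluster names, the AWS region, the IAM role ARN, and the secret name are all placeholders. It also assumes the IAM role has read access to the cluster and that the Kosli CLI picks up KOSLI_ORG and KOSLI_API_TOKEN from the environment. Treat it as a starting point and check it against the tutorial before relying on it.

```yaml
# Hypothetical workflow file: .github/workflows/kosli-k8s-snapshot.yml
name: kosli-k8s-snapshot

on:
  schedule:
    - cron: "*/5 * * * *"   # shortest interval GitHub supports; scheduled runs can start late

permissions:
  id-token: write           # lets the job assume the AWS role via OIDC
  contents: read

jobs:
  snapshot:
    runs-on: ubuntu-latest
    env:
      KOSLI_ORG: my-org                                 # placeholder org name
      KOSLI_API_TOKEN: ${{ secrets.KOSLI_API_TOKEN }}   # placeholder secret name
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/kosli-reporter   # placeholder role
          aws-region: eu-west-1                                           # placeholder region
      - name: Write a kubeconfig for the cluster
        run: aws eks update-kubeconfig --name my-cluster --region eu-west-1   # placeholder cluster
      - uses: kosli-dev/setup-cli-action@v2
      - name: Report the snapshot
        run: kosli snapshot k8s my-k8s-environment   # placeholder Kosli environment name
```

Because hosted CI schedulers do not guarantee exact start times, the effective cadence may be somewhat looser than the in-cluster CronJob.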

3. Widen the report interval (last resort)

Only if you cannot pin the reporter or move it out of cluster: set cronSchedule longer than your NodePool’s consolidateAfter so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.

cronSchedule: "*/15 * * * *"

karpenter.sh/do-not-disrupt: "true" is not a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node less likely, not more. Likewise cluster-autoscaler.kubernetes.io/safe-to-evict only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
