- Unexpected statefile changes: someone runs terraform apply outside your pipeline, so the statefile and the world still agree and a plan comes back empty. See Detecting unexpected statefile changes.
- Non-Terraform changes: someone edits the world directly via the cloud console, API, or CLI, for example a hotfix in the console, a partial apply failure, or an out-of-band automation. Reality no longer matches the statefile, so a terraform plan catches it. This page covers detecting this type.
How the detection works
The detector is a scheduled terraform plan against the last-applied git SHA, with the result recorded in a small marker file that Kosli watches for tampering:
- At apply time, the pipeline writes a fresh marker (drift.plan.json, stored next to the statefile) recording the applied SHA with drift: false, and attests it into your Kosli Environment.
- On a schedule, the detector reads the marker, checks out the recorded SHA, and runs a read-only plan. The cleanest machine-readable signal is the plan exit code: with -detailed-exitcode, 0 means an empty plan, 2 means a non-empty plan, and 1 means the plan itself failed. -lock=false means the read-only drift plan never contends with a real apply; -input=false means it can never hang waiting for a prompt.
- When drift is found, the detector overwrites the marker in S3 with {sha, drift: <timestamp>}, which is fresh, un-attested content. On its next snapshot, the Kosli reporter Lambda sees a marker that no longer matches its attestation, and the Environment reports itself as non-compliant.
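The exit-code check in the scheduled step can be sketched roughly as follows. This is a minimal illustration, not the kosli-dev/tf implementation: the helper names (classify_plan_exit, run_drift_plan) and the verdict strings are hypothetical, while the three flags are standard terraform plan options.

```python
import subprocess

# Read-only drift plan: -detailed-exitcode makes the exit code carry the result,
# -lock=false avoids contending with a real apply, -input=false never prompts.
PLAN_ARGS = ["terraform", "plan", "-detailed-exitcode", "-lock=false", "-input=false"]


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a verdict.

    The verdict strings are illustrative names, not part of any tool.
    """
    if code == 0:
        return "no-drift"  # plan succeeded and is empty
    if code == 2:
        return "drift"     # plan succeeded and contains changes
    return "error"         # 1 (or anything unexpected): the plan itself failed


def run_drift_plan(workdir: str) -> str:
    """Run the plan in `workdir` (assumed to be checked out at the applied SHA)."""
    result = subprocess.run(PLAN_ARGS, cwd=workdir, check=False)
    return classify_plan_exit(result.returncode)
```

Keeping "error" distinct from "no-drift" matters: a plan that fails to run says nothing about whether the environment is clean.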
Plan against the applied SHA, not against main
This is the single most common false-positive source. If changes are merged to main but not yet applied — because the apply is gated behind a manual approval, or batched into a release — then planning against main shows a non-empty plan that reflects pending intentional changes, not drift. The marker exists precisely to record the applied SHA, and the detector always checks out that commit before planning.
Latch, don’t spam
Once drift is flagged, you usually don’t want to re-plan and re-alert every cycle until someone acts. The marker doubles as a latch: the detector only plans while drift is false, and the next successful apply writes a fresh {sha, drift: false} marker to reset it.
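The latch can be pictured as a small state machine over the marker. The sketch below assumes the marker is a JSON-like object with the two fields described above and that the drift timestamp is an ISO 8601 string; the timestamp format and the function names are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def fresh_marker(sha: str) -> dict:
    """Marker written by a successful apply: resets the latch."""
    return {"sha": sha, "drift": False}


def should_plan(marker: dict) -> bool:
    """The detector only plans while the latch is open (drift is false)."""
    return marker["drift"] is False


def next_marker(marker: dict, plan_is_empty: bool,
                now: Optional[datetime] = None) -> dict:
    """Return the marker to store after one detector cycle."""
    if not should_plan(marker):
        return marker  # already latched: no re-plan, no re-alert
    if plan_is_empty:
        return marker  # no drift: baseline stays untouched
    now = now or datetime.now(timezone.utc)
    # Drift found: same SHA, drift set to the detection time (assumed ISO 8601).
    return {"sha": marker["sha"], "drift": now.isoformat()}
```

Only fresh_marker ever sets drift back to false, which mirrors the rule that an apply, not the detector, resets the latch.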
Prerequisites
- Terraform is applied through CI/CD, not from laptops, as the normal path — with remote, locked state (for example, an S3 backend with the native S3 lockfile or DynamoDB).
- Keyless CI authentication to your cloud (for example, GitHub OIDC) with a dedicated, read-capable role for the detector. The detector never needs apply permissions.
- A Kosli account and API token.
- A Kosli Environment for each Terraform environment you want to protect.
- The Kosli reporter Lambda deployed to snapshot the drift marker (and statefile) into that Environment on a schedule.
Setting it up with kosli-dev/tf
Everything above is implemented at github.com/kosli-dev/tf: a thin Terraform wrapper (tf) and a set of reusable GitHub Actions workflows, both open source under the MIT license. Two of the workflows carry this control:
- apply.yml: the plan steps plus tf apply, then a reset-drift-detection job that writes a fresh {sha, drift: false} marker to S3 (the known-good baseline for the next drift run) and attests it, along with the plan, apply log, and statefile, into your Kosli Environment. See Detecting unexpected statefile changes for the caller workflow and flow template; the same apply setup covers both drift types.
- detect-drift.yml: the detector. It reads the baseline marker and, only if drift == false, runs a plan against the baseline SHA. A non-empty plan overwrites the marker with {sha, drift: <timestamp>}; otherwise it records a no-drift summary.
Hardening
A detector that runs once and alerts once is easy. A detector you can depend on for an audit needs to handle the failure modes below.
Monitor the monitor
This is the most dangerous failure mode. If the scheduled job silently stops running, no new evidence arrives to contradict the last result — so the environment looks green forever, even as drift accumulates. Treating “the dashboard is green” as proof of cleanliness, without also verifying the underlying job is running on schedule, is a misuse of the control. Add a heartbeat or alert on “job has not run in N intervals” for both the detector workflow and the reporter Lambda.
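A "job has not run in N intervals" check reduces to a timestamp comparison. The sketch below is one hedged way to express it; where the last-run time comes from (workflow run history, Lambda logs, a heartbeat record) is left open, and is_stale and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run: datetime, interval: timedelta,
             max_missed: int, now: datetime) -> bool:
    """True when the job has been silent for more than `max_missed` intervals."""
    return now - last_run > interval * max_missed


# Example: a ten-minute detector that is allowed to miss at most three runs.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=25)
silent = now - timedelta(minutes=45)
print(is_stale(recent, timedelta(minutes=10), 3, now))  # False
print(is_stale(silent, timedelta(minutes=10), 3, now))  # True
```

The same check would be applied separately to the detector workflow and to the reporter Lambda, since either one stopping leaves the Environment looking green.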
Terraform-managed resources only
terraform plan can only see resources Terraform manages. A resource created entirely outside Terraform (say, an IAM user added by hand in the console with no corresponding Terraform resource) is invisible to this control. Closing that gap is the job of an Infrastructure-as-Code coverage policy (everything in production must be defined as code in the first place); drift detection assumes that policy holds and does not substitute for it.
Tune cadence per environment
Worst-case detection latency is the check interval plus the reporter Lambda’s snapshot interval. A ten-minute check with a five-minute reporter Lambda surfaces drift within fifteen minutes. Set the schedule from each environment’s rate-of-change and blast radius rather than using one global value.
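The latency arithmetic is simple enough to keep next to the schedule definitions. The per-environment intervals in this sketch are invented purely for illustration; only the ten-plus-five-minute case comes from the text above.

```python
from datetime import timedelta


def worst_case_latency(check_interval: timedelta,
                       reporter_interval: timedelta) -> timedelta:
    """Worst case: drift appears just after a check, then waits for a snapshot."""
    return check_interval + reporter_interval


# Hypothetical per-environment cadences: (detector interval, reporter interval).
SCHEDULES = {
    "production": (timedelta(minutes=10), timedelta(minutes=5)),
    "staging": (timedelta(hours=1), timedelta(minutes=15)),
}

for env, (check, reporter) in SCHEDULES.items():
    print(env, worst_case_latency(check, reporter))
```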
Concurrency and least privilege
Guard against overlapping runs for the same environment with a concurrency group. Scope the detector’s cloud role tightly: it needs to read state and plan, plus write the marker file — nothing more. It must never hold apply permissions.
Implementation checklist
- Terraform is applied through CI/CD, with remote, locked state.
- Each apply writes a fresh {sha, drift: false} marker and attests it into a Kosli Environment.
- A scheduled job plans against the applied SHA, not against main, using a read-only, lock-free plan.
- A non-empty plan overwrites the marker; the result latches until the next apply resets it.
- The Kosli reporter Lambda snapshots the marker from S3 into the Environment on a schedule.
- Both the detector workflow and the reporter Lambda are monitored for silent failure.
- The detector’s cloud role can read and plan only — never apply.
- Cadence and concurrency are tuned per environment.
Related
- Drift Detection (SDLC-CTRL-0018): the control both drift-detection tutorials implement.
- Detecting unexpected statefile changes: the other drift type, out-of-CI applies that a plan can never catch.
- kosli-dev/tf: the reference wrapper and reusable workflows.
- Environments: the Kosli primitive that carries the compliance signal.
- Terraform -detailed-exitcode and remote/locked S3 backends.