Tracing a production incident back to git commits #
In this 5-minute tutorial you'll learn how Kosli can track a production incident in cyber-dojo back to git commits.
Something has gone wrong and https://cyber-dojo.org is displaying a 500 error!
It was working an hour ago. What has happened in the last hour?
Getting ready #
You need to:
- Install Kosli CLI.
- Get a Kosli API token.
- Set the KOSLI_ORG environment variable to cyber-dojo (the Kosli cyber-dojo organization is public, so any authenticated user can read its data) and KOSLI_API_TOKEN to your token:
  export KOSLI_ORG=cyber-dojo
  export KOSLI_API_TOKEN=<your-api-token>
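Before running any Kosli command, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; `check_kosli_env` is a hypothetical helper name chosen for illustration and is not part of the Kosli CLI.

```shell
# Hypothetical helper (not part of the Kosli CLI): report which of the two
# variables from the setup step are missing from the current shell.
check_kosli_env() {
  missing=0
  for name in KOSLI_ORG KOSLI_API_TOKEN; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      missing=1
    fi
  done
  # Exit status 0 when both variables are set, 1 otherwise.
  return "$missing"
}
```

You could then guard the first command with it, for example `check_kosli_env && kosli log env aws-prod`.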
Start with the environment #
https://cyber-dojo.org is running in an AWS environment that reports to Kosli as aws-prod.
Get a log of this environment's changes:
kosli log env aws-prod
At the time this tutorial was written, the output of this command displayed the first page of 177 snapshots.
You will see the first page of considerably more than 177 snapshots because aws-prod has moved on since this incident (it has been resolved with new commits, which have created new deployments).
To limit the output you can set the interval for the command:
kosli log env aws-prod --interval 176..177
The output should be:
SNAPSHOT  EVENT                                                                          FLOW     DEPLOYMENTS
#177      Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:31dee35      creator  #87
          Fingerprint: 5d1c926530213dadd5c9fcbf59c8822da56e32a04b0f9c774d7cdde3cf6ba66d
          Description: 1 instance stopped running (from 1 to 0).
          Reported at: Tue, 06 Sep 2022 16:53:28 CEST
#176      Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:b7a5908      creator  #89
          Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a
          Description: 1 instance started running (from 0 to 1).
          Reported at: Tue, 06 Sep 2022 16:52:28 CEST
These two snapshots belong to the same blue-green deployment.
You see artifact creator:b7a5908 starting in snapshot #176, and artifact creator:31dee35 exiting in snapshot #177.
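When the log covers more than a couple of snapshots, it can be easier to reduce each one to a single line. The sketch below assumes the log text has the shape shown in this tutorial (a row starting with the snapshot number and an `Artifact:` value, followed by a `Description:` line). `summarize_snapshots` is a hypothetical helper, and the sample text is copied from the output above rather than fetched from Kosli.

```shell
# Hypothetical helper: condense snapshot log text read on stdin into
# one line per snapshot: number, artifact name:tag, and what happened.
summarize_snapshots() {
  awk '
    /^#[0-9]+/ {
      # A row that starts a new snapshot, for example "#177 Artifact: ...".
      snapshot = $1
      for (i = 2; i < NF; i++) if ($i == "Artifact:") artifact = $(i + 1)
      sub(/^.*\//, "", artifact)   # drop the registry prefix, keep name:tag
    }
    /Description:/ {
      sub(/^.*Description:[ ]*/, "")
      print snapshot, artifact, "-", $0
    }
  '
}

# Sample input copied from the tutorial output; in practice you could pipe
# the output of the log command into summarize_snapshots instead.
sample_log='#177 Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:31dee35 creator #87
Fingerprint: 5d1c926530213dadd5c9fcbf59c8822da56e32a04b0f9c774d7cdde3cf6ba66d
Description: 1 instance stopped running (from 1 to 0).
Reported at: Tue, 06 Sep 2022 16:53:28 CEST
#176 Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:b7a5908 creator #89
Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a
Description: 1 instance started running (from 0 to 1).
Reported at: Tue, 06 Sep 2022 16:52:28 CEST'

printf '%s\n' "$sample_log" | summarize_snapshots
```

For the sample this prints one line for #177 (creator:31dee35 stopped) and one for #176 (creator:b7a5908 started), which makes the blue-green handover easy to spot.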
Dig into the artifact #
You are interested in #176, showing the newly running artifact, creator:b7a5908, with the fingerprint starting 860ad17.
Let's learn more about this artifact:
kosli get artifact creator@860ad17
Name: cyberdojo/creator:b7a5908
Flow: creator
Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a
Created on: Tue, 06 Sep 2022 16:48:07 CEST • 21 hours ago
Git commit: b7a590836cf140e17da3f01eadd5eca17d9efc65
Commit URL: https://github.com/cyber-dojo/creator/commit/b7a590836cf140e17da3f01eadd5eca17d9efc65
Build URL: https://github.com/cyber-dojo/creator/actions/runs/3001102984
State: COMPLIANT
History:
    Artifact created                             Tue, 06 Sep 2022 16:48:07 CEST
    Deployment #88 to aws-beta environment       Tue, 06 Sep 2022 16:49:59 CEST
    Deployment #89 to aws-prod environment       Tue, 06 Sep 2022 16:51:12 CEST
    Started running in aws-beta#196 environment  Tue, 06 Sep 2022 16:51:42 CEST
    Started running in aws-prod#176 environment  Tue, 06 Sep 2022 16:52:28 CEST
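If you want to pull a single value, such as the commit URL, out of these details in a script, a small filter is enough. This is a sketch under the assumption that each field appears as `Key: value` at the start of a line, as in the output above. `artifact_field` is a hypothetical helper, not a Kosli command, and the sample text is copied from this tutorial.

```shell
# Hypothetical helper: print the value of one "Key: value" field from
# artifact details read on stdin. $1 is the field name, e.g. "Commit URL".
artifact_field() {
  awk -v key="$1" '
    index($0, key ":") == 1 {
      # Take everything after "Key:" and trim leading whitespace.
      value = substr($0, length(key) + 2)
      sub(/^[ \t]+/, "", value)
      print value
      exit
    }
  '
}

# Sample input copied from the artifact details shown in this tutorial.
sample_details='Name: cyberdojo/creator:b7a5908
Flow: creator
Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a
Git commit: b7a590836cf140e17da3f01eadd5eca17d9efc65
Commit URL: https://github.com/cyber-dojo/creator/commit/b7a590836cf140e17da3f01eadd5eca17d9efc65
Build URL: https://github.com/cyber-dojo/creator/actions/runs/3001102984'

printf '%s\n' "$sample_details" | artifact_field "Commit URL"
```

The same helper works for any other field name in the listing, for example `artifact_field "Git commit"`.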
Follow to the commit #
You can follow the commit URL.
The incident was caused by a simple typo in the app.rb file!
Perhaps someone accidentally inserted the "s" while trying to save the file?
Either way, this is clearly the problem because the function is called respond_to, without the s.
You were able to trace the problem back to a specific commit without any access to cyber-dojo's aws-prod environment.