Tracing a production incident back to git commits

In this 5-minute tutorial you'll learn how Kosli can trace a production incident in cyber-dojo back to git commits.

Something has gone wrong and https://cyber-dojo.org is displaying a 500 error!

[Screenshot: prod cyber-dojo is down with a 500]

It was working an hour ago. What has happened in the last hour?

Start with the environment

https://cyber-dojo.org is running in an AWS environment that reports to Kosli as aws-prod.
Get a log of this environment's changes:

kosli env log aws-prod --long

You will see more than 177 snapshots because aws-prod has moved on since this incident: it was resolved with new commits, which created new deployments. To reproduce the output shown here, restrict the command to a snapshot interval:

kosli env log aws-prod --long 175..177
SNAPSHOT  EVENT                                                                          PIPELINE  DEPLOYMENTS
#177      Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:31dee35      creator   #87 
          Fingerprint: 5d1c926530213dadd5c9fcbf59c8822da56e32a04b0f9c774d7cdde3cf6ba66d             
          Description: 1 instance stopped running (from 1 to 0).                               
          Reported at: Tue, 06 Sep 2022 16:53:28 CEST                                          
                                                                                               
#176      Artifact: 274425519734.dkr.ecr.eu-central-1.amazonaws.com/creator:b7a5908      creator   #89 
          Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a             
          Description: 1 instance started running (from 0 to 1).                               
          Reported at: Tue, 06 Sep 2022 16:52:28 CEST
...

These two snapshots belong to the same blue-green deployment. You see artifact creator:b7a5908 starting in snapshot #176, and artifact creator:31dee35 exiting in snapshot #177.
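When a log covers many snapshots, it can help to condense it to one line per snapshot. The following is a minimal sketch, assuming the output of the command above has been saved to a file. The helper name summarize_env_log and the file name env-log.txt are hypothetical, and the parsing relies only on the text layout shown above, not on any documented output format.

```shell
# Sketch: condense saved `kosli env log --long` output to one line per snapshot.
# summarize_env_log is a hypothetical helper, not part of the Kosli CLI.
summarize_env_log() {
  awk '
    /^#[0-9]+/ {
      # A line starting with a snapshot number opens a new entry.
      snap = $1
      # Field 3 is the full image name; keep only the last path segment.
      n = split($3, parts, "/")
      art = parts[n]
    }
    $1 == "Fingerprint:" { fp = substr($2, 1, 7) }   # short fingerprint
    $1 == "Description:" {
      # Strip the label and surrounding whitespace, then print the summary.
      sub(/^[ \t]*Description:[ \t]*/, "")
      sub(/[ \t]+$/, "")
      printf "%s %s %s %s\n", snap, art, fp, $0
    }
  ' "$@"
}

# Usage (env-log.txt is a hypothetical file name):
#   kosli env log aws-prod --long 175..177 > env-log.txt
#   summarize_env_log env-log.txt
```

For the two snapshots above, this should print one line each, pairing the snapshot number with the artifact tag, the short fingerprint, and the description, which makes the blue-green handover easier to spot.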

Dig into the artifact

You are interested in snapshot #176, which shows the newly running artifact, creator:b7a5908, whose fingerprint starts with 860ad17.

Let's learn more about this artifact:

kosli artifact get creator@860ad17
Name:        cyberdojo/creator:b7a5908
Fingerprint: 860ad172ace5aee03e6a1e3492a88b3315ecac2a899d4f159f43ca7314290d5a
Created on:  Tue, 06 Sep 2022 16:48:07 CEST • 21 hours ago
Git commit:  b7a590836cf140e17da3f01eadd5eca17d9efc65
Commit URL:  https://github.com/cyber-dojo/creator/commit/b7a590836cf140e17da3f01eadd5eca17d9efc65
Build URL:   https://github.com/cyber-dojo/creator/actions/runs/3001102984
State:       COMPLIANT
History:  
    Artifact created                               Tue, 06 Sep 2022 16:48:07 CEST
    Deployment #88 to aws-beta environment         Tue, 06 Sep 2022 16:49:59 CEST
    Deployment #89 to aws-prod environment         Tue, 06 Sep 2022 16:51:12 CEST
    Started running in aws-beta#196 environment    Tue, 06 Sep 2022 16:51:42 CEST
    Started running in aws-prod#176 environment    Tue, 06 Sep 2022 16:52:28 CEST
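If you want to hand the commit on to other tools, a single field can be pulled out of this output. This is a minimal sketch, assuming the "Label:  value" layout shown above stays the same when the output is piped. The helper name artifact_field is hypothetical and is not part of the Kosli CLI.

```shell
# Sketch: extract one field from `kosli artifact get` output read on stdin.
# artifact_field is a hypothetical helper; it assumes "Label:  value" lines.
artifact_field() {
  # $1 = label without the trailing colon, for example "Git commit"
  awk -v label="$1:" '
    index($0, label) == 1 {
      # Take everything after the label and trim surrounding whitespace.
      value = substr($0, length(label) + 1)
      sub(/^[ \t]+/, "", value)
      sub(/[ \t]+$/, "", value)
      print value
      exit
    }
  '
}

# Usage:
#   kosli artifact get creator@860ad17 | artifact_field "Git commit"
#   kosli artifact get creator@860ad17 | artifact_field "Commit URL"
```

Matching on the label at the start of the line keeps "Git commit" and "Commit URL" from being confused with each other.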

Follow to the commit

Follow the Commit URL from the output above to see the change on GitHub.

[Screenshot: cyber-dojo GitHub diff]

The incident was caused by a simple typo in the app.rb file!

Perhaps someone accidentally inserted the "s" while trying to save the file? Either way, this is clearly the problem, because the function is called respond_to, without the "s".
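The same diff can also be inspected from a terminal. This is a minimal sketch, assuming git is installed and you have a local clone of the repository named in the Commit URL. The helper name show_commit is hypothetical, and the '*app.rb' pathspec is used because the exact location of app.rb inside the repository is not stated here.

```shell
# Sketch: show what a commit changed, optionally limited to some paths.
# show_commit is a hypothetical helper wrapping `git show`.
show_commit() {
  repo=$1   # path to a local clone
  sha=$2    # commit to inspect
  shift 2   # any remaining arguments are pathspecs
  git -C "$repo" show "$sha" -- "$@"
}

# Usage (the clone needs network access):
#   git clone https://github.com/cyber-dojo/creator
#   show_commit creator b7a590836cf140e17da3f01eadd5eca17d9efc65 '*app.rb'
```

Leaving out the pathspec shows the full commit, which is useful when you are not yet sure which file to look at.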

You were able to trace the problem back to a specific commit without any access to cyber-dojo's aws-prod environment.
