Do you want to know a secret?

Experiment reveals differences in secret leak detection on GitHub code repositories

UPDATED A new experiment by a Polish security researcher offers a fresh perspective on the well understood but still all too common problem of developers accidentally publishing secrets to code repositories.

Andrzej Dyjak recently ran an experiment to see how long it took before a secret (such as an API or cryptographic key) committed to a public repository was exploited.

An AWS key generated using the Thinkst Canary digital tripwire service was first compromised 11 minutes after being posted to GitHub, and was repeatedly attacked thereafter.

A similar dummy secret went 62 minutes before suffering its first and only compromise on GitLab.

Dyjak used GitGuardian, a Git secrets scanning technology, during his experiment.

In the case of GitHub, Dyjak received an alert of possible secret leakage within seven minutes, and therefore before the first compromise.

Time to pwnage

The time it takes for secrets accidentally committed to public repositories to be compromised has been the subject of previous, more extensive studies by North Carolina State University (PDF) and IBM Research (PDF), as well as a GitHub honeypot exercise put together by security researcher Bob Diachenko.

Dyjak said that although people have focused on the honeypot part of his experiment, he was more interested in a comparison between the secret detection features of GitHub and GitLab.

“GitHub won – they caught the secret, alerted me, informed the provider, and on top of all that they also alerted me about vulnerable dependencies,” Dyjak told The Daily Swig.


“GitLab lost – it does offer similar SAST [static application security testing] features but they require setup of Auto DevOps which is a bottleneck.”

DevSecOps response

GitLab welcomed Dyjak's research, adding it was actively working to improve its secret detection capabilities.

Taylor McCaslin, senior product manager of static analysis at GitLab, told The Daily Swig: "While we can’t comment on specifics for security and confidentiality reasons, GitLab does take some steps to mitigate automated searches for customer secrets in public GitLab projects. This likely explains why publicly leaked customer secrets in public GitLab projects were identified and used fewer times in Dyjak’s research."

"GitLab has chosen to embed Secret Detection in our CI/CD [Continuous Integration / Continuous Delivery] as it provides customers complete control of the experience and workflow around secret detection alongside other CI/CD jobs. We are considering moving Secret Detection outside of the pipeline to further expand access to the feature and lower barriers to using it. GitLab has no current plans to provide pre-commit filters outside of our existing push rules due to performance concerns," he added.

GitLab introduced secret detection last year, initially as a feature of its upper tier plan, before making it available to all its customers (including those on the free tier) in the middle of this year.

The code repository is "in the process of building a post processing step into our Secret Detection capabilities to enable features like secret revocation through external vendor’s workflows", McCaslin added.

In response to a query on the research, GitHub said that it scans every commit for potentially exposed secrets.
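
To give a rough sense of what pattern-based secret scanning involves, the following is a minimal sketch in Python. It is not GitHub's implementation, which has not been published in this article. The function name `scan_text` and the two regular expressions are assumptions for illustration, based on the widely documented `AKIA`/`ASIA` prefixes of AWS access key IDs and the `xox` prefixes of Slack tokens. Production scanners rely on formats supplied by each provider and on checks well beyond simple pattern matching.

```python
import re

# Illustrative patterns only (assumed for this sketch). Real scanners use
# provider-supplied token formats and additional validation.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[abprs]-[0-9A-Za-z-]{10,}\b"),
}


def scan_text(text):
    """Return a list of (line_number, pattern_name, match) tuples,
    one for each suspected secret found in the given text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings
```

Run against the contents of each pushed file, a check like this would flag candidate strings for follow-up, such as alerting the developer or notifying the issuing provider.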

"GitHub scans every 'git push' to a public repository for potentially exposed secrets from 30+ cloud service providers who have partnered with GitHub to keep their users safe," a GitHub spokesperson explains. "If we find any, we notify the provider and they take action.

"If Andrzej’s tokens had been real AWS or Slack credentials (rather than Thinkst Canary dummies), they would have been automatically revoked within seconds.

"At GitHub’s scale, scanning every 'git push' to a public repository translates to thousands of credential leaks stopped every day. We find and revoke over 100 tokens for GitHub’s own API every day alone," they added.

A tale of two secrets

Dyjak deliberately leaked two secret API keys, attached to AWS and Slack, respectively.

“While AWS was compromised in both cases (GH and GL),” Dyjak reports, “Slack was not compromised in neither.”

This finding provides anecdotal evidence that adversaries are actively and selectively hunting for leaked AWS secrets rather than any inadvertently exposed tokens or credentials within code repositories.

This story was updated to add comment from both GitHub and GitLab.
