
Trevor I. Lasn

Staff Software Engineer, Engineering Manager

Secure Your Repositories: Prevent Credential Leaks with Gitleaks

Automate security flows and ensure your team follows security best practices

Leaking secrets is like leaving your front door wide open. Most won’t notice, but it only takes one bad actor to walk in and cause havoc.

Even a tiny leak can lead to a massive data breach. Consider these real-world examples:

Uber breach (2016)

One of the most catastrophic data breaches due to leaked secrets was the Uber breach in 2016. Hackers exposed the personal data of 57 million customers and drivers, including names, email addresses, and phone numbers.

The breach happened because attackers accessed Uber’s GitHub repository, where sensitive information, including AWS credentials, was stored. With these credentials, they gained access to Uber’s cloud servers and found a data archive containing personal information.

Equifax breach (2017)

The Equifax data breach exposed the personal information of 147 million people, including Social Security numbers, birth dates, and addresses. The breach occurred due to a vulnerability in a web application that Equifax failed to patch.

Additionally, sensitive data was stored in plaintext, and administrative credentials were compromised, allowing attackers to access and steal data over several months.

Capital One breach (2019)

In 2019, Capital One experienced a breach that exposed the personal information of over 100 million customers. A former AWS employee exploited a misconfigured firewall in Capital One’s cloud infrastructure, gaining access to sensitive data stored in AWS S3.

The breach included Social Security numbers, bank account numbers, and credit scores. The incident highlighted the importance of securing cloud infrastructure and properly configuring access controls.

Adobe breach (2013)

Adobe experienced a massive data breach in 2013, affecting 38 million users. Hackers accessed Adobe’s servers and stole source code for several Adobe products, along with user information, including encrypted passwords and payment card details.

The breach occurred due to weak password storage practices and poor security controls. It demonstrated the risks of storing sensitive information without adequate encryption and access controls.

LinkedIn breach (2012)

In 2012, LinkedIn suffered a data breach that exposed the passwords of approximately 6.5 million users. The passwords were stored using a weak hashing algorithm without proper salt, making them easy to crack.

How do secrets leak?

  1. Hardcoded Values: Putting sensitive data directly in your code is risky. It happens more often than you’d think.
  2. Version Control: Even if you remove a secret, it might still exist in your Git history.
  3. Misconfigured Access: Accidentally exposing a private repository or granting too much access can spill your secrets.
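To see why deleting a secret isn’t enough, here is a minimal sketch in a throwaway repository. The file name config.env and the value FAKE_API_KEY_1234567890 are made-up placeholders. git log -S (the “pickaxe” search) lists the commits that added or removed a given string, and -p prints their diffs.

```shell
#!/bin/sh
# Demo: a secret deleted in a later commit is still in Git history.
# config.env and FAKE_API_KEY_1234567890 are illustrative placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a file containing the (fake) secret...
echo 'API_KEY=FAKE_API_KEY_1234567890' > config.env
git add config.env
git commit -q -m "Add config"

# ...then delete it in a follow-up commit.
git rm -q config.env
git commit -q -m "Remove config"

# The file is gone from the working tree...
[ ! -e config.env ] && echo "config.env no longer exists"

# ...but the pickaxe search still finds the commits that added and removed it,
# and -p prints their diffs, secret included.
git log -p -S 'FAKE_API_KEY' --format='%h %s' | grep 'FAKE_API_KEY'
```

Anyone who cloned the repository before the secret was removed also has a copy, so a secret that was ever pushed should be treated as exposed.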

Preventive measures

  1. Use Environment Variables: Keep sensitive information out of your codebase by storing it in environment variables.
  2. Regularly Scan Your Codebase: Regular scans can help catch potential leaks before they become a problem.
  3. Audit Your Git History: Regularly audit and clean up your Git history. Tools like git filter-repo can help remove sensitive data from past commits.
  4. Secure Access: Limit who can access your repositories and use two-factor authentication to add an extra layer of security.
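A minimal sketch of the environment-variable approach, assuming secrets are kept in a local .env file that Git is told to ignore. DB_PASSWORD and its value are hypothetical, and many frameworks load .env files for you, so the manual set -a / . ./.env step is only for illustration.

```shell
#!/bin/sh
# Sketch: keep secrets in an ignored .env file and read them from the environment.
# DB_PASSWORD and its value are hypothetical placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# 1. Tell Git never to track the .env file.
echo '.env' > .gitignore

# 2. Store the secret locally, outside version control.
echo 'DB_PASSWORD=example-only-not-real' > .env

# 3. Export every variable defined in .env into the current environment.
set -a
. ./.env
set +a

# 4. Code reads the value from the environment instead of hardcoding it.
echo "DB_PASSWORD is set (${#DB_PASSWORD} characters)"

# 5. Confirm Git ignores .env, so `git add .` will not stage it.
git check-ignore -q .env && echo ".env is ignored by Git"
git add .
git status --short
```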

Gitleaks

Gitleaks is a free, open-source tool for detecting secrets such as passwords, API keys, and tokens in Git repositories and files. I have no affiliation with it. It’s easy to install and use, which makes it a great choice for regular security checks.

Terminal window
# MacOS
brew install gitleaks
# Docker (DockerHub)
docker pull zricethezav/gitleaks:latest
docker run -v ${path_to_host_folder_to_scan}:/path zricethezav/gitleaks:latest [COMMAND] --source="/path" [OPTIONS]
# Docker (ghcr.io)
docker pull ghcr.io/gitleaks/gitleaks:latest
docker run -v ${path_to_host_folder_to_scan}:/path ghcr.io/gitleaks/gitleaks:latest [COMMAND] --source="/path" [OPTIONS]
# From Source (make sure `go` is installed)
git clone https://github.com/gitleaks/gitleaks.git
cd gitleaks
make build

Scanning Gitleaks repository with Gitleaks

To demonstrate Gitleaks’ effectiveness, I ran a scan on its own repository.

Terminal window
git clone git@github.com:gitleaks/gitleaks.git
Cloning into 'gitleaks'...
remote: Enumerating objects: 8623, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 8623 (delta 11), reused 17 (delta 5), pack-reused 8595
Receiving objects: 100% (8623/8623), 5.32 MiB | 5.44 MiB/s, done.
Resolving deltas: 100% (4688/4688), done.
gitleaks detect --source ./gitleaks
gitleaks
4:55PM INF 855 commits scanned.
4:55PM INF scan completed in 435ms
4:55PM WRN leaks found: 38

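Gitleaks reports 38 leaks in its own repository. You might be asking: how can a tool designed to find leaks have leaks itself? Most are likely false positives. The repository contains sample tokens used to test its detection rules, which is likely why the finding below points to a rule definition file (cmd/generate/config/rules/huggingface.go). The rules may need further tweaking to filter such matches out.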

Terminal window
gitleaks detect --source ./gitleaks -v
...
Finding: ...e-cli login --token ******************************`
Secret: ******************************
RuleID: huggingface-access-token
Entropy: 4.553652
File: cmd/generate/config/rules/huggingface.go
Line: 31
Commit: 9fb36b242d75aac1a2bf885724dfd9886db08ea7
Author: *********
Email: *********@users.noreply.github.com
Date: 2023-08-24T15:38:19Z
Fingerprint: 9fb36b242d75aac1a2bf885724dfd9886db08ea7:cmd/generate/config/rules/huggingface.go:huggingface-access-token:31
4:57PM INF 855 commits scanned.
4:57PM INF scan completed in 435ms
4:57PM WRN leaks found: 38

The -v (verbose) flag is what produces this full report for each finding.

Note: I replaced the actual secrets and personal details with asterisks ( * ) for privacy and security reasons.

What Gitleaks detected

The scan revealed multiple leaks, including API keys and tokens. For each finding, Gitleaks provided specific details:

  1. File: The file containing the sensitive information.
  2. Line Number: The exact line where the secret was found.
  3. Commit Hash: The unique identifier for the commit that introduced the secret.
  4. Author: The author of the commit.

These details are invaluable for understanding the context of the leak and taking corrective action. They help pinpoint when the leak occurred and who might be aware of it.
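The Fingerprint field packs the commit, file, rule ID, and line number into one colon-separated string, so it is enough to jump straight to the flagged line. The helper below, show_finding, is hypothetical (it is not part of Gitleaks) and assumes the commit:file:rule:line layout from the output above, with no colons in file paths. The demo builds a throwaway repository so it runs anywhere.

```shell
#!/bin/sh
# show_finding: print the line a Gitleaks fingerprint points at.
# Hypothetical helper; assumes commit:file:rule:line and no ':' in file paths.
show_finding() {
  fp=$1
  commit=${fp%%:*}   # everything before the first ':'
  rest=${fp#*:}      # drop the leading "commit:"
  file=${rest%%:*}   # the next field is the file path
  line=${fp##*:}     # everything after the last ':'
  git show "$commit:$file" | sed -n "${line}p"
}

# Demo in a throwaway repository with a fake token on line 2.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'first line\ntoken = "FAKE_TOKEN_abc123"\n' > app.cfg
git add app.cfg
git commit -q -m "Add app config"

fp="$(git rev-parse HEAD):app.cfg:generic-api-key:2"
show_finding "$fp"
```

Inside a real clone, you would pass a fingerprint straight from the report, for example show_finding 9fb36b242d75aac1a2bf885724dfd9886db08ea7:cmd/generate/config/rules/huggingface.go:huggingface-access-token:31 in the gitleaks checkout.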

Even well-maintained projects can have secrets accidentally committed. This demonstrates the importance of regularly scanning your repositories, regardless of the project’s size or reputation.

rust-lang/rust

Terminal window
git clone git@github.com:rust-lang/rust.git
Cloning into 'rust'...
remote: Enumerating objects: 2672711, done.
remote: Counting objects: 100% (1168/1168), done.
remote: Compressing objects: 100% (661/661), done.
remote: Total 2672711 (delta 712), reused 776 (delta 497), pack-reused 2671543
Receiving objects: 100% (2672711/2672711), 1.26 GiB | 5.28 MiB/s, done.
Resolving deltas: 100% (2061551/2061551), done.
Updating files: 100% (48974/48974), done.
gitleaks detect --source rust
gitleaks
5:43PM INF 185180 commits scanned.
5:43PM INF scan completed in 4m3s
5:43PM WRN leaks found: 2437

You read that right: Gitleaks reports 2,437 leaks in Rust, lol. Let’s be real here, though: the severity and legitimacy of these findings vary. For example, this is reported as a leak:

Terminal window
Finding: ...ESS_KEY_ID: ${{ env.CACHES_AWS_ACCESS_KEY_ID }}
Secret: CACHES_AWS_ACCESS_KEY_ID
RuleID:
Entropy: 3.188722
File: .github/workflows/ci.yml
Line: 196
Commit: 1ca92c085788f68ff9b23cf597da5c62924e3f37
Author: ************
Email: ************
Date: 2024-04-29T19:32:35Z
Fingerprint: 1ca92c085788f68ff9b23cf597da5c62924e3f37:.github/workflows/ci.yml::196

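This finding doesn’t show an actual value: the reported secret is just the variable name CACHES_AWS_ACCESS_KEY_ID. Even if the value were exposed, an AWS access key ID on its own isn’t valuable to attackers, because signing requests also requires the matching secret access key.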

This shows why it’s crucial to double-check the results from any tool. Not every reported leak is a real security issue. For example:

  • Over 2,000 “leaks” were DNA sequences made up entirely of the letters A, C, G, and T, found in files within the repository.
  • 14 “leaks” were non-sensitive IDs from AWS access key pairs, found in .travis.yaml.
  • 2 “leaks” were similar non-sensitive IDs in jobs.yml.

It’s crucial to customize tools like Gitleaks to reduce false positives and focus on genuine threats. When projects are this complex, reporting large numbers of potential leaks without proper analysis can lead to unnecessary alarm and confusion.
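When a scan returns thousands of findings, grouping them by rule is a quick way to spot one noisy pattern (like the DNA sequences above). A minimal sketch, assuming a JSON report written with --report-path whose findings carry a "RuleID" key, as Gitleaks v8 reports do. The three-finding sample report is invented for illustration.

```shell
#!/bin/sh
# Count Gitleaks findings per rule, most frequent first.
# Assumes a JSON report (from --report-path) whose findings have a "RuleID" key.
count_by_rule() {
  grep -o '"RuleID": *"[^"]*"' "$1" |  # pull out each "RuleID": "..." pair
    sed 's/.*"\([^"]*\)"$/\1/' |        # keep only the rule name
    sort | uniq -c | sort -rn           # tally, then sort by count
}

# Made-up sample report; replace with your own gitleaks-report.json.
REPORT=$(mktemp)
cat > "$REPORT" <<'EOF'
[
 {"RuleID": "generic-api-key", "File": "data/sample-a.txt"},
 {"RuleID": "generic-api-key", "File": "data/sample-b.txt"},
 {"RuleID": "aws-access-token", "File": "ci/config.yml"}
]
EOF

count_by_rule "$REPORT"
```

The grep/sed approach avoids extra dependencies; if jq is installed, jq -r '.[].RuleID' gitleaks-report.json | sort | uniq -c | sort -rn does the same job more robustly.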

Proper use of Gitleaks: A case study with the Rust repository

To effectively use Gitleaks and minimize false positives, it’s essential to tailor the tool to your project’s specific context.

Customizing the Gitleaks configuration

  1. Create a .gitleaks.toml configuration file.
  2. Define rules and exceptions to refine the scan.

For example, if certain patterns (like DNA sequences or specific IDs) are not sensitive, they can be excluded.

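# Keep Gitleaks' built-in rules and add exceptions on top of them.
# Note: a [[rules]] entry defines a pattern to detect, so exclusions
# belong in an [allowlist] (Gitleaks v8 syntax).
[extend]
useDefault = true

[allowlist]
description = "Known non-sensitive patterns"
regexes = [
  # DNA sequences made up only of A, C, G and T
  '''^[ACGT]{10,}$''',
  # Variable name reported in the CI workflow, not a key
  '''CACHES_AWS_ACCESS_KEY_ID''',
]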
  3. Run Gitleaks with the custom config: Use the --config flag to specify the custom configuration file.
Terminal window
gitleaks detect --source . --config .gitleaks.toml -v --report-path gitleaks-report.json
  • --source .: Scans the current directory, here the root of the Rust repository.
  • --config .gitleaks.toml: Points to the custom configuration file.
  • -v: Verbose mode for detailed output.
  • --report-path gitleaks-report.json: Specifies the output file for the report.

Analyzing the results

After running Gitleaks with a tailored configuration:

Terminal window
gitleaks detect --source . --config .gitleaks.toml
gitleaks
9:29PM INF 185180 commits scanned.
9:29PM INF scan completed in 3m59s
9:29PM WRN leaks found: 32

The customized rules help eliminate common false positives, like non-sensitive IDs or known patterns. Feel free to tweak the rules and experiment with the scanner.
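For individual findings that a pattern can’t describe cleanly, Gitleaks v8 also looks for a .gitleaksignore file at the root of the scanned source and skips any finding whose Fingerprint is listed there. A minimal sketch follows; ignore_finding is a hypothetical helper name, and the fingerprint is the CACHES_AWS_ACCESS_KEY_ID finding from the Rust scan. Committing the file lets teammates and CI skip the same reviewed finding.

```shell
#!/bin/sh
# Record a reviewed false positive in .gitleaksignore, one fingerprint per line.
# ignore_finding is a hypothetical helper, not a Gitleaks command.
set -e

ignore_finding() {
  touch .gitleaksignore
  # -x: match the whole line, -F: treat the fingerprint as a fixed string
  grep -qxF "$1" .gitleaksignore || echo "$1" >> .gitleaksignore
}

# Demo in a temporary directory; in practice, run this from your repository root.
cd "$(mktemp -d)"

# Adding the same fingerprint twice leaves a single entry.
ignore_finding '1ca92c085788f68ff9b23cf597da5c62924e3f37:.github/workflows/ci.yml::196'
ignore_finding '1ca92c085788f68ff9b23cf597da5c62924e3f37:.github/workflows/ci.yml::196'
cat .gitleaksignore
```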

Preventing leaks with Gitleaks pre-commit hook

A pre-commit hook lets you run scripts before code is committed. Here’s a quick setup for Gitleaks as a pre-commit hook:

  1. Install Gitleaks
Terminal window
brew install gitleaks
  2. Create and edit the hook script. From the root of your project, create the file:
Terminal window
touch .git/hooks/pre-commit

Then add the following to .git/hooks/pre-commit:

#!/bin/sh
# Scan only the changes staged for this commit.
# (Newer Gitleaks releases spell this: gitleaks git --pre-commit --staged)
if ! gitleaks protect --staged -v --report-path gitleaks-report.json; then
  echo "Gitleaks detected leaks. Check gitleaks-report.json."
  exit 1
fi
  • protect --staged: Scans the changes you have staged with git add, i.e. what the commit is about to record. By default, gitleaks detect scans commits already in the history, so it would not see a secret in the commit being made.

  • -v: Stands for verbose mode, which provides more detailed output during the scan process. It helps in understanding what the tool is doing and any findings it may encounter.

  • --report-path gitleaks-report.json: Specifies the path and filename where the scan results will be saved in JSON format. In this case, the report will be saved as gitleaks-report.json in the current directory.

  3. Make the script executable:
Terminal window
chmod +x .git/hooks/pre-commit

This setup scans the git changes for secrets before committing. If leaks are found, the commit is stopped, ensuring sensitive data stays out of your repository.

Terminal window
git add .
git commit -m "lezz goo"
gitleaks
10:31PM INF 2 commits scanned.
10:31PM INF scan completed in 4.02s
10:31PM INF no leaks found
[master 9c72b1ae] lezz goo
1 file changed, 1 insertion(+), 1 deletion(-)
cat gitleaks-report.json
[]
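Hooks in .git/hooks are local to each clone and aren’t committed, so teammates won’t get this check automatically. One option is Git’s core.hooksPath setting, sketched below under the assumption that your team keeps hooks in a versioned .githooks/ directory (a name chosen for illustration). The stand-in hook only prints a message; in a real project you would copy in the Gitleaks hook, and each developer still runs the git config line once per clone.

```shell
#!/bin/sh
# Share a pre-commit hook through the repository via core.hooksPath.
# .githooks/ is an illustrative directory name; the demo uses a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Stand-in hook; in a real project: cp .git/hooks/pre-commit .githooks/
mkdir -p .githooks
printf '#!/bin/sh\necho "pre-commit hook ran"\n' > .githooks/pre-commit
chmod +x .githooks/pre-commit

# Point Git at the versioned hooks directory (once per clone).
git config core.hooksPath .githooks

# Commit the hook so every clone gets it; the hook itself runs during this commit.
git add .githooks
git commit -q -m "Share pre-commit hook"
```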

How to evaluate tools for finding leaks

  1. Open-Source Nature: The tool should have its source code available. This lets the community check for unauthorized data transmissions.

  2. Network Monitoring: Monitor network traffic during scans. Ensure the tool doesn’t communicate with external servers.

Your code is only as safe as your secrets. Don’t let a tiny leak sink your ship.


This article was originally published on https://www.trevorlasn.com/blog/your-repo-is-a-leaky-ship-probably. It was written by a human and polished using grammar tools for clarity.