Secret Scanning Guide: Detecting API Keys, Tokens, and Credentials Before They Cause Breaches
Secret scanning is the automated practice of detecting API keys, passwords, tokens, and other credentials that have been accidentally committed to source code repositories. It is one of the highest-value, lowest-effort security controls available, and it is still widely neglected.
A single leaked AWS access key can result in a six-figure cloud bill within hours. A leaked Stripe secret key gives an attacker API access to your customers' payment records. A leaked database password can open your entire data tier to anyone who can reach the database. These are not theoretical scenarios; they happen daily to companies of every size.
This guide covers why secrets leak, how detection tools work, the major tools available (TruffleHog, Gitleaks, GitHub Secret Scanning), and how to build a layered defense.
Secure your deployed app too: Once your code is clean, run ZeriFlow to check that your web application is not exposing sensitive configuration through HTTP headers, paths, or misconfigured responses. It runs 80+ checks automatically.
Why Secrets End Up in Code: The Root Causes
Understanding why secrets leak helps you build effective prevention:
1. Local development convenience. Developers hardcode credentials to avoid setting up environment variables during rapid prototyping, then forget to remove them before committing.
2. Copy-paste accidents. A developer copies a config block from a working environment, including credentials, into a config file that gets committed.
3. Commit message leaks. Sometimes credentials appear in commit messages or comments, not just in code.
4. CI/CD configuration files. Pipeline configuration files checked into repositories sometimes contain secrets that 'should' come from environment variables.
5. Test files and fixtures. Test data files often contain the real credentials that were used to create the test data and are then committed as fixtures.
6. Git history. A secret committed and then deleted still lives in git history forever — and is easily retrievable with git log -p.
How Secret Scanning Tools Work
Secret scanners use two main detection techniques:
Pattern Matching (Regex)
Most scanners maintain libraries of regular expressions that match known secret formats:
AWS Access Key: AKIA[0-9A-Z]{16}
GitHub Token: ghp_[A-Za-z0-9]{36}
Stripe Secret Key: sk_(live|test)_[A-Za-z0-9]{24,}
Slack Token: xox[baprs]-([0-9a-zA-Z]{10,48})
This approach catches common formats reliably but misses custom credentials that do not follow known patterns.
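To make the pattern-matching idea concrete, here is a minimal Python sketch that applies the four regular expressions listed above to a block of text. The names SECRET_PATTERNS and scan_text are illustrative assumptions, not part of any scanner's API, and real tools ship far larger rule sets.

```python
import re

# Patterns copied from the list above. The dictionary and function names
# are hypothetical, chosen only for this illustration.
SECRET_PATTERNS = {
    "AWS Access Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub Token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "Stripe Secret Key": re.compile(r"sk_(live|test)_[A-Za-z0-9]{24,}"),
    "Slack Token": re.compile(r"xox[baprs]-([0-9a-zA-Z]{10,48})"),
}


def scan_text(text):
    """Return (line_number, pattern_name, matched_text) for every match."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_number, name, match.group(0)))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    for finding in scan_text(sample):
        print(finding)
```

A string that does not follow one of these shapes, such as an internally generated password, passes through this kind of check unnoticed, which is the limitation described above.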
High Entropy Detection
Random strings (like API keys and cryptographic secrets) have high Shannon entropy: their characters are far less predictable than those of natural language. Entropy-based scanners flag high-entropy strings that look like they could be credentials.
This catches custom credentials but also produces more false positives (random-looking test data, base64-encoded content, etc.).
The best tools combine both approaches.
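The entropy heuristic can be sketched in a few lines of Python. The function below computes Shannon entropy in bits per character; the threshold and minimum length are illustrative assumptions, not values taken from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical tuning values for this sketch; real scanners choose their own.
ENTROPY_THRESHOLD = 4.0
MIN_LENGTH = 20


def looks_like_secret(token):
    """Flag tokens that are both long enough and random-looking enough."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= ENTROPY_THRESHOLD
```

A repeated character scores 0 bits, while a string of 32 distinct characters scores 5 bits. Base64-encoded content and random test data also score high, which is where the false positives come from.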
Tool Deep-Dive: The Major Secret Scanners
TruffleHog
TruffleHog is one of the most comprehensive open-source secret scanners:
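# Install with Homebrew
brew install trufflehog
# or run the official container image (the PyPI package truffleHog3 is a separate project with a different CLI)
docker run --rm -v "$PWD:/pwd" trufflesecurity/trufflehog:latest filesystem /pwd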
# Scan a git repository (including full history)
trufflehog git https://github.com/your-org/your-repo
# Scan local filesystem
trufflehog filesystem /path/to/project
# Scan a GitHub organization
trufflehog github --org=your-org
# Scan for secrets in Docker images
trufflehog docker --image your-image:latest
TruffleHog v3 (the current version) uses a detector-based architecture with 700+ detectors that know how to verify secrets: it can actually test whether a detected credential is still valid against the respective API. This dramatically reduces false positives.
Key features:
- Git history scanning (finds deleted secrets)
- 700+ credential detectors
- Credential verification (live API checks)
- JSON output for CI/CD integration
- Supports GitHub, GitLab, S3, filesystem, Docker
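As a sketch of how the JSON output might feed a CI step, the Python below keeps only findings marked as verified. It assumes the scan was run with the --json flag and that each output line is a JSON object with DetectorName and Verified fields; treat those field names as assumptions to check against the output of your TruffleHog version. The function name verified_findings is hypothetical.

```python
import json


def verified_findings(json_lines):
    """Return the records whose 'Verified' field is true.

    `json_lines` is text with one JSON object per line, for example the
    captured standard output of a scan run with --json (assumed format).
    """
    results = []
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if isinstance(record, dict) and record.get("Verified") is True:
            results.append(record)
    return results
```

Failing the build only on verified findings is one possible policy. Unverified findings may still be real secrets whose provider could not be reached, so they deserve at least a manual review.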
Gitleaks
Gitleaks is a fast, lightweight secret scanner optimized for CI/CD use:
# Install
brew install gitleaks
# Scan current repository
gitleaks detect --source .
# Scan and generate SARIF report
gitleaks detect --source . --report-format sarif --report-path gitleaks.sarif
# Scan with verbose output
gitleaks detect -v
Gitleaks is preferred for CI/CD integration due to its speed and clean exit codes (exit 1 on findings, exit 0 on clean).
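The exit-code convention makes it straightforward to wrap Gitleaks in a script. The following Python sketch classifies a run by its return code. The function names are hypothetical, and treating any code other than 0 or 1 as a tool error is an assumption of this sketch rather than documented Gitleaks behavior.

```python
import subprocess


def classify_exit_code(code):
    """Map an exit code to a result, using the convention described above."""
    if code == 0:
        return "clean"
    if code == 1:
        return "findings"
    return "error"  # assumption: anything else is treated as a tool failure


def run_gitleaks(source=".", runner=subprocess.run):
    """Run `gitleaks detect` on `source` and classify the outcome.

    `runner` is injectable so the logic can be exercised without
    having gitleaks installed.
    """
    completed = runner(["gitleaks", "detect", "--source", source])
    return classify_exit_code(completed.returncode)
```

In a pipeline, a result of "findings" would typically fail the job, while "error" would be surfaced separately so that a broken scanner is not mistaken for a clean scan.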
Pre-commit hook integration:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
GitHub Secret Scanning
GitHub's built-in secret scanning is the lowest-friction option for GitHub repositories:
- Free for public repositories: GitHub automatically scans all public repos and notifies the issuing provider (AWS, Stripe, GitHub itself, and 200+ others).
- Available with GitHub Advanced Security: private repositories require a license.
- Push protection: blocks pushes containing known secret patterns before they reach the repository.
Enable via: Repository Settings → Security → Secret scanning → Enable.
GitHub Secret Scanning + Push Protection is a critical baseline that every GitHub organization should enable — it requires zero developer workflow changes and prevents the most common secret types from ever entering the repository.
detect-secrets (Yelp)
detect-secrets takes a different approach — it creates a baseline of all currently known 'acceptable' high-entropy strings and only flags new additions:
pip install detect-secrets
# Create baseline
detect-secrets scan > .secrets.baseline
# Audit baseline (review and mark false positives)
detect-secrets audit .secrets.baseline
# Use in pre-commit
detect-secrets-hook --baseline .secrets.baseline
This is particularly useful for legacy repositories with many existing false positives: you can baseline the current state and only alert on new additions.
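The baseline idea can be illustrated with a small Python sketch that compares two scan results and reports only what is new. The data shape used here, a results mapping from filename to entries that carry a hashed_secret, is a simplified assumption modeled loosely on a baseline file and is not the tool's full schema. The function names are hypothetical.

```python
def finding_keys(scan):
    """Collect (filename, hashed_secret) pairs from a scan's 'results' mapping."""
    keys = set()
    for filename, entries in scan.get("results", {}).items():
        for entry in entries:
            keys.add((filename, entry["hashed_secret"]))
    return keys


def new_findings(baseline, current):
    """Return findings present in the current scan but absent from the baseline."""
    return sorted(finding_keys(current) - finding_keys(baseline))
```

Comparing hashes rather than raw values means the baseline file itself never needs to contain a secret in plain text.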
Scanning Git History: Finding Secrets Already Committed
If you have not been scanning for secrets, secrets may already be in your git history. Scanning history is critical:
# TruffleHog: scan full git history
trufflehog git https://github.com/your-org/repo
# Gitleaks: detect in full history, including all branches
gitleaks detect --source . --log-opts="--all"
If you find a secret in git history:
1. Immediately revoke the credential. This is the most important step. Rotating a compromised key before an attacker uses it prevents the breach.
2. Remove from history using git-filter-repo. git filter-repo --path-glob '*.env' --invert-paths rewrites history.
3. Force-push the rewritten history. Coordinate with your team; everyone must re-clone.
4. Check audit logs. Examine the cloud provider or service logs to determine if the key was already used.
5. Notify affected parties if necessary.
Note: Simply deleting the file and committing does NOT remove the secret from history. Git history is permanent unless you rewrite it.
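To see why deletion alone is not enough, consider this Python sketch. It reads the text produced by git log -p and reports every added line that matches a pattern, together with the commit that introduced it. The function name is hypothetical and the parsing is deliberately simplified: it relies on commit headers starting with "commit " and added lines starting with "+".

```python
import re


def added_lines_matching(log_text, pattern):
    """Return (commit_hash, line) for each added diff line matching `pattern`.

    `log_text` is the output of `git log -p`; `pattern` is a compiled regex.
    """
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            content = line[1:]
            if pattern.search(content):
                findings.append((commit, content.strip()))
    return findings
```

A secret that was added in one commit and deleted in a later one still shows up as an added line in the earlier commit, so anyone with a clone of the repository can recover it. Dedicated scanners do the same walk across history with far more robust parsing.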
Building a Defense-in-Depth Secret Detection Strategy
Layer your defenses:
| Layer | Tool | When |
|---|---|---|
| IDE plugin | GitGuardian IDE, Snyk IDE | As developer types |
| Pre-commit hook | Gitleaks | Before commit |
| CI/CD gate | TruffleHog or Gitleaks | On every PR |
| Repository scanning | GitHub Secret Scanning | Continuously |
| Periodic history scan | TruffleHog | Monthly |
FAQ
Q: What should I do if I accidentally committed an API key?
A: Revoke the key immediately — before doing anything else. Do not wait to rewrite history; the key is potentially compromised the moment it is pushed. Then rewrite git history to remove it, update all systems using the old key with the new one, and check audit logs for unauthorized use.
Q: Can secret scanners find custom credential formats?
A: Entropy-based scanners (TruffleHog, detect-secrets) can flag high-entropy strings that do not match known patterns. You can also add custom regex patterns to most tools. For truly proprietary credential formats, give the secret an identifiable naming convention, such as a fixed prefix, and define a custom detector that matches it.
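As an illustration of the naming-convention idea, the Python sketch below issues tokens with a fixed prefix and defines the matching pattern alongside it. The prefix acme_live_, the token length, and the function names are all hypothetical choices made for this example.

```python
import re
import secrets

# Hypothetical in-house token format: fixed prefix plus 32 hex characters.
TOKEN_PREFIX = "acme_live_"
TOKEN_PATTERN = re.compile(r"acme_live_[0-9a-f]{32}")


def issue_token():
    """Generate a token that carries the identifiable prefix."""
    return TOKEN_PREFIX + secrets.token_hex(16)


def find_tokens(text):
    """Return every in-house token found in `text`."""
    return TOKEN_PATTERN.findall(text)
```

Because the format is unambiguous, the same regular expression can be registered as a custom rule in whichever scanner you use, with very little risk of false positives.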
Q: Does scanning slow down developer workflows?
A: A fast pre-commit hook like Gitleaks typically runs in 1-2 seconds — nearly imperceptible. CI-based scanning runs in parallel with other checks. The performance overhead is negligible compared to the risk of a leaked credential.
Q: What is the difference between secret scanning and ZeriFlow?
A: Secret scanning finds credentials in source code and git history — it operates at the code layer. ZeriFlow scans deployed web applications for configuration security issues — headers, TLS, exposed paths. They protect different layers: ZeriFlow catches what is visible to the outside world; secret scanners catch what is hidden in your codebase.
Q: Should I use TruffleHog or Gitleaks?
A: Use both. Gitleaks is faster and better suited for blocking commits in pre-commit hooks. TruffleHog's 700+ detectors and credential verification make it superior for comprehensive scanning and history audits. They are complementary, not competing.
Conclusion: Stop Secrets Before They Reach the Repository
Secret scanning is one of the cheapest and most effective security controls available. The tools are free, integration takes less than an hour, and the protection is immediate.
Start today: enable GitHub Secret Scanning on all your repositories (it is one checkbox), then add a Gitleaks pre-commit hook to your most sensitive projects. Schedule monthly TruffleHog history scans to find anything already committed.
And do not forget the other layer: run ZeriFlow on your deployed applications to make sure secrets are not leaking through HTTP responses, headers, or exposed configuration endpoints.