How to Remove Sensitive Data from Git History: 2 Tools Explained

I'm a curious Geek with an insatiable thirst to learn new technologies and enjoy the process every day. I aim to deliver high-quality services with the highest standards and cutting-edge DevOps technologies to make people's lives easier.
Scenario
Accidentally committing sensitive information, such as API keys, passwords, or personal data like phone numbers, to a Git-based version control system can happen to anyone.
Now, imagine a situation where you’ve pushed an API key that cannot be regenerated, a password that cannot be reset, or—worse—a personal phone number that’s now publicly accessible. No one wants to deal with the hassle of purchasing a new SIM card or facing potential security risks simply because of an oversight.
Fortunately, there are effective ways to address this issue and prevent sensitive information from being permanently exposed.
Important Note Before Proceeding:
⚠️ The following solutions rewrite commit history, which changes the hash of every rewritten commit and of every commit that descends from it. If your workflow depends on commit hashes, consider alternative approaches.
⚠️ For teams, rewriting history affects every teammate's local clone and any commits they have not yet pushed. Coordinate with your team before proceeding.
Solution 1: BFG Repo-Cleaner
The BFG Repo-Cleaner is a powerful Java-based tool that simplifies the process of removing sensitive information from your Git repository’s history. Below are two methods for setting it up:
The Automated Method
To streamline the setup, use the following script, which has been tested on Ubuntu 24.04:
# Written and tested on Ubuntu 24.04:
curl -s https://raw.githubusercontent.com/shahinam2/bfg-repo-cleaner-auto-install/main/auto-install.sh -o auto-install.sh && chmod +x auto-install.sh && ./auto-install.sh
You can find the repository for this script at: GitHub Repo.
The Manual Method
Download the BFG Repo-Cleaner:
- Visit the official website: BFG Repo-Cleaner.
Install Java:
- Download and install Java (version 8 or later).
This method provides more control over the installation process and is ideal if you prefer a manual setup.
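Before downloading BFG manually, it can help to confirm that the installed Java meets the "version 8 or later" requirement. The following is a minimal sketch; the helper names java_major_from_string and check_java_for_bfg are hypothetical and not part of BFG or Java, and the parsing assumes the usual quoted version string printed by java -version.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy "1.8.0_292" scheme (major = 8) and the modern "17.0.2" scheme.
java_major_from_string() {
  local version="$1" first rest
  first="${version%%.*}"
  if [ "$first" = "1" ]; then
    # Legacy scheme: 1.<major>.<minor>
    rest="${version#1.}"
    first="${rest%%.*}"
  fi
  # Drop any non-numeric suffix such as "-ea".
  echo "${first%%[!0-9]*}"
}

# Hypothetical helper: succeed if the installed Java is version 8 or later.
check_java_for_bfg() {
  local raw version major
  command -v java >/dev/null 2>&1 || return 1
  # `java -version` writes to stderr; the version is the first quoted token.
  raw="$(java -version 2>&1 | head -n 1)"
  version="$(printf '%s\n' "$raw" | sed -n 's/.*"\(.*\)".*/\1/p')"
  [ -n "$version" ] || return 1
  major="$(java_major_from_string "$version")"
  [ -n "$major" ] && [ "$major" -ge 8 ]
}
```

Calling check_java_for_bfg && echo "Java is recent enough for BFG" is one way to use it before running the jar.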
How to use BFG Repo-Cleaner
⚠️ Important: Before proceeding, ensure you have backed up your repository by cloning it into a separate directory. This will help prevent accidental data loss during the cleanup process.
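One possible way to take that backup is a timestamped mirror clone, which copies every branch and tag rather than only the checked-out branch. This is a sketch under assumptions: backup_repo is a hypothetical helper name, and the naming scheme for the backup directory is arbitrary.

```shell
# Hypothetical helper: create a timestamped mirror clone as a backup.
# Arguments: <path-to-repo> <directory-to-hold-the-backup>
backup_repo() {
  local src="$1" dest_parent="$2" name dest
  name="$(basename "$src")"
  dest="$dest_parent/$name-backup-$(date +%Y%m%d-%H%M%S).git"
  # --mirror copies all refs (branches, tags, remote-tracking refs).
  git clone --quiet --mirror "$src" "$dest" || return 1
  # Print the backup location so the caller can record it.
  echo "$dest"
}
```

For example, backup_repo /home/shahin/bfg-tool-test /home/shahin would print the path of the new backup. A mirror clone only contains committed work, so uncommitted changes in the working directory are not part of it.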
Suppose your repository contains a file named credentials that holds sensitive information, and this file was committed in commit number 2.


Remove the File
First, delete the file containing sensitive information from your local directory:
rm credentials
Stage and Commit the Deletion
Next, stage the change and create a new commit locally to remove the file:
git add .
git commit -m "remove the credentials file"
This ensures that the sensitive file is no longer part of your working directory or future commits.
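Because BFG leaves the latest commit untouched by default, it is worth confirming that the file is really gone from HEAD before rewriting history. The sketch below assumes it is run from the repository root; head_contains_path is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if <path> is still tracked in the HEAD commit.
# Run from the repository root so that paths are listed relative to it.
head_contains_path() {
  local path="$1"
  # List every file in HEAD and look for an exact, whole-line match.
  git ls-tree -r --name-only HEAD | grep -Fxq -- "$path"
}

# Example usage:
# if head_contains_path credentials; then
#   echo "credentials is still in HEAD; commit its deletion first" >&2
# fi
```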
Using BFG Repo-Cleaner
Assuming you are already in the directory where BFG Repo-Cleaner is located, use the following command to rewrite the repository history and remove the sensitive file:
java -jar bfg-1.14.0.jar /path/to/your/repo/ --delete-files file-to-remove
Example:
If the file to be removed is named credentials, the command would look like this:
java -jar bfg-1.14.0.jar /home/shahin/bfg-tool-test/ --delete-files credentials
Output:
Using repo : /home/shahin/bfg-tool-test/.git
Found 2 objects to protect
Found 3 commit-pointing refs : HEAD, refs/heads/main, refs/remotes/origin/main
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 7b439b14 (protected by 'HEAD')
Cleaning
--------
Found 4 commits
Cleaning commits: 100% (4/4)
Cleaning commits completed in 30 ms.
Updating 2 Refs
---------------
Ref Before After
----------------------------------------------
refs/heads/main | 7b439b14 | 1597a9c1
refs/remotes/origin/main | d611e7c6 | e063add1
Updating references: 100% (2/2)
...Ref update completed in 38 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
. D D m
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | ab6b164d | a11e11a6
Last dirty commit | d611e7c6 | e063add1
Deleted files
-------------
Filename Git id
------------------------------
credentials | df20d103 (52 B )
In total, 5 object ids were changed. Full details are logged here:
/home/shahin/bfg-tool-test.bfg-report/2025-01-11/23-01-56
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
What the Output Means:
Protected Commits:
BFG left the contents of commit 7b439b14 untouched because it is "protected by 'HEAD'". By default, BFG protects the latest commit so that the current state of your files is never altered. If the credentials file still exists in this protected commit, you need to remove it manually and then run BFG again.
Cleaning Results:
BFG scanned 4 commits and rewrote the ones that contained the credentials file. It then updated 2 references (refs/heads/main and refs/remotes/origin/main) to point to the rewritten history.
Commit Tree-Dirt History:
- Shows, from earliest to latest, which commits were changed.
- D (dirty commits) contained the file and had their file tree fixed by BFG; m (modified commits) only had their message or parents changed; . (clean commits) needed no changes to their file tree.
Deleted Files:
- BFG found the credentials file (df20d103 is its Git ID) and removed it from the rewritten commits.
Steps to Finalize and Verify
Run Garbage Collection:
After running BFG Repo-Cleaner, the process is not fully complete. To finalize the cleanup, you must remove any deleted objects from the Git repository by executing the garbage collection command provided in the output.
Run the following command to clean up your repository:
# Ensure you are in the root directory of your repository before executing this command
git reflog expire --expire=now --all && git gc --prune=now --aggressive
This permanently deletes the orphaned objects (e.g., the credentials file) from your local repository.
Verify Removal of Sensitive File:
Ensure that the sensitive file, such as credentials, has been completely removed from your Git history by running the following commands. The first lists every commit on any ref that touches the credentials path; the second searches the tracked files in your working directory for the string:
git log --all --oneline -- credentials
git grep "credentials"
If both commands return no results, it means the file has been successfully removed after running BFG and performing garbage collection.
Force Push to Remote: If your repository is hosted on a remote platform (e.g., GitHub), you’ll need to push the rewritten history using a force push:
git push origin --force
Expected output:
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (7/7), 612 bytes | 612.00 KiB/s, done.
Total 7 (delta 2), reused 7 (delta 2), pack-reused 0
remote: Resolving deltas: 100% (2/2), done.
To github.com:shahinam2/bfg-tool-test.git
+ d611e7c...1597a9c main -> main (forced update)
⚠️ Important: Force-pushing will overwrite the repository history on the remote server. Be sure to notify all collaborators, as they will need to re-clone the repository to avoid conflicts.
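After the force push, one way to confirm that the remote branch now points at the rewritten history is to compare the local and remote commit hashes. This is a sketch only: remote_matches_local is a hypothetical helper name, and it assumes the remote is called origin.

```shell
# Hypothetical helper: succeed if <branch> on origin is at the same commit as the local <branch>.
remote_matches_local() {
  local branch="$1" local_hash remote_hash
  local_hash="$(git rev-parse "refs/heads/$branch")" || return 1
  # ls-remote prints "<hash><TAB><ref>"; keep only the hash.
  remote_hash="$(git ls-remote origin "refs/heads/$branch" | cut -f1)"
  [ -n "$remote_hash" ] && [ "$local_hash" = "$remote_hash" ]
}

# Example usage:
# remote_matches_local main && echo "origin/main matches the rewritten history"
```

A match only shows that the branch pointer was updated. Other clones and forks still hold the old commits until they are re-cloned or cleaned as well.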
Before Cleanup:
The commit history shows commit 2, which contains the sensitive credentials file. The file's content is visible within the repository.


After Cleanup:
commit 2 has been rewritten to remove the sensitive content using BFG Repo-Cleaner. The sensitive credentials file is no longer visible in the repository history, as confirmed by the updated commit structure.


Solution 2: Git-Filter-Repo
git-filter-repo offers several advantages over BFG Repo-Cleaner, making it a more versatile and efficient choice for rewriting Git history:
Flexibility:
Provides comprehensive features for history rewriting, unlike BFG, which focuses on specific tasks like removing sensitive data.
No Java Dependency:
Python-based and lighter on resources, whereas BFG requires Java Runtime Environment.
No Protected Commits:
Can modify all commits, including the latest, ensuring complete data removal. BFG protects the latest commit by default, requiring manual cleanup.
Speed:
Optimized for performance and generally faster for large repositories compared to BFG.
Active Maintenance:
Regularly updated with detailed documentation, ensuring compatibility with modern Git features.
Customizable Outputs:
Generates detailed logs and mappings for auditing, unlike BFG’s simpler reporting.
Core Git Integration:
Relies on core Git functionality, making it portable and easier to integrate into workflows.
Broad Use Cases:
Versatile enough for various history-rewriting tasks, not limited to removing sensitive data.
How to Use Git-Filter-Repo
Install git-filter-repo:
If not already installed, use the following command:
pip install git-filter-repo
Backup Your Repository:
Before making changes, back up your repository to prevent data loss:
cp -r /path/to/your/repo /path/to/your/repo-backup
# Example:
cp -r /home/shahin/bfg-tool-test /home/shahin/bfg-tool-test-backup
Remove the File:
Run the following command to remove all instances of the credentials file from your Git history:
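# --path is interpreted relative to the repository root, so give the file's path from there
git filter-repo --sensitive-data-removal --invert-paths --path credentials
--path credentials: Targets the credentials file.
--invert-paths: Inverts the selection, so the targeted file is removed from the repository's history and everything else is kept. Without this flag, only the targeted path would be kept.
--sensitive-data-removal: Tells git-filter-repo that the rewrite is meant to remove sensitive data, so it reports extra information to help clean up other copies of the repository. This flag is only available in recent releases of git-filter-repo.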
This ensures the sensitive file is completely removed from the repository while keeping other data intact.
Force-Push the Updated Repository:
If your repository is hosted on a remote platform (e.g., GitHub), you need to push the rewritten history with a force push:
git push origin --force
Verify Removal:
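Run these commands to confirm the credentials file is no longer in the repository history. The first lists every commit on any ref that touches the credentials path; the second searches the tracked files in your working directory for the string:
git log --all --oneline -- credentials
git grep "credentials"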
No results indicate successful removal.
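The history check can also be wrapped in a small function for reuse in scripts. This is a minimal sketch: path_in_history is a hypothetical helper name, and it relies on Git's default history simplification, so treat a negative result as a strong hint, not a guarantee.

```shell
# Hypothetical helper: succeed (exit 0) if any commit reachable from any ref touches <path>.
path_in_history() {
  local path="$1"
  # Print the hashes of commits that changed the path; one line is enough to know it exists.
  [ -n "$(git log --all --format=%H -- "$path" | head -n 1)" ]
}

# Example usage:
# if path_in_history credentials; then
#   echo "credentials is still present in the history" >&2
# else
#   echo "no commit touches credentials"
# fi
```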
Before Cleanup: Commit history shows commit 2, which contains the sensitive credentials file, with its content visible in the repository.


After Cleanup:
After running the cleanup steps, commit 2 has been successfully removed from the history, ensuring the sensitive content is no longer accessible.

GitGuardian: Prevention Over Cure
Avoid the hassle of removing sensitive data from your repository by preventing it from being pushed in the first place. Tools like GitGuardian can monitor and block sensitive information from being added to Git-based platforms.
Visit GitGuardian: https://www.gitguardian.com/
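As a lightweight local complement to a dedicated scanner, a Git pre-commit hook can refuse commits that stage files with secret-looking names. The sketch below is not how GitGuardian works. The function names and the file-name patterns are illustrative assumptions, and the check looks only at file names, not file contents.

```shell
# Hypothetical check: list staged files whose names look like secrets.
# The patterns below are illustrative assumptions, not a complete list.
staged_secret_files() {
  # Only consider files that are Added, Copied, or Modified in the index.
  git diff --cached --name-only --diff-filter=ACM |
    grep -E '(^|/)(credentials|\.env|id_rsa|[^/]*\.pem)$'
}

# Hypothetical hook body: return 1 (block the commit) if any such file is staged.
pre_commit_check() {
  local found
  found="$(staged_secret_files)"
  if [ -n "$found" ]; then
    printf 'Refusing to commit files that look like secrets:\n%s\n' "$found" >&2
    return 1
  fi
  return 0
}
```

To try it, these functions could be saved in an executable .git/hooks/pre-commit file that ends with the line pre_commit_check || exit 1.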
Sources
BFG Repo-Cleaner
Git-Filter-Repo
Photo by Katarina Humajova on Unsplash.


