How to Remove Sensitive Data from Git History: 2 Tools Explained


I'm a curious Geek with an insatiable thirst to learn new technologies and enjoy the process every day. I aim to deliver high-quality services with the highest standards and cutting-edge DevOps technologies to make people's lives easier.

Scenario

Accidentally committing sensitive information, such as API keys, passwords, or personal data like phone numbers, to a Git-based version control system can happen to anyone.

Now, imagine a situation where you’ve pushed an API key that cannot be regenerated, a password that cannot be reset, or—worse—a personal phone number that’s now publicly accessible. No one wants to deal with the hassle of purchasing a new SIM card or facing potential security risks simply because of an oversight.

💡
Disclaimer: If you have accidentally exposed a credential such as an API key, password, or access token, the first step should always be to revoke and regenerate it immediately, which cuts off unauthorized access. This article focuses on cases where the sensitive data cannot be changed (e.g., personal identifiers, non-regenerable keys, or phone numbers) and must be removed from a public repository as quickly as possible to limit the risk. Following these steps will help ensure your repository no longer retains that data in its history.

Fortunately, there are effective ways to address this issue and prevent sensitive information from being permanently exposed.

Important Note Before Proceeding:

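⚠️ The following solutions rewrite commit history, which changes the hash of every rewritten commit and of all commits that descend from it. If your workflow depends on commit hashes (for example, links to specific commits or pinned revisions), consider alternative approaches.

⚠️ For teams, rewriting history affects every teammate's local clone: branches and unpushed commits based on the old history will no longer line up with the remote. Coordinate with your team before proceeding.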

Solution 1: BFG Repo-Cleaner

The BFG Repo-Cleaner is a powerful Java-based tool that simplifies the process of removing sensitive information from your Git repository’s history. Below are two methods for setting it up:

The Automated Method

To streamline the setup, use the following script, which has been tested on Ubuntu 24.04:

# The following script has been written and tested on ubuntu 24.04:
curl -s https://raw.githubusercontent.com/shahinam2/bfg-repo-cleaner-auto-install/main/auto-install.sh -o auto-install.sh && chmod +x auto-install.sh && ./auto-install.sh

You can find the repository for this script at: https://github.com/shahinam2/bfg-repo-cleaner-auto-install

The Manual Method

  1. Download the BFG Repo-Cleaner:

    • Download the BFG JAR file (the examples below use bfg-1.14.0.jar).

  2. Install Java:

    • Download and install a Java runtime (version 8 or later).

This method provides more control over the installation process and is ideal if you prefer a manual setup.

How to use BFG Repo-Cleaner

⚠️ Important: Before proceeding, ensure you have backed up your repository by cloning it into a separate directory. This will help prevent accidental data loss during the cleanup process.
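One possible way to take that backup is a mirror clone, which copies every branch, tag, and other ref rather than only the checked-out branch. The sketch below is a minimal illustration under stated assumptions: the throwaway repository, the directory names, and the demo identity are all made up, and you would substitute the path of your real project.

```shell
#!/usr/bin/env bash
# Sketch: back up a repository (all branches, tags, and refs) before rewriting history.
# The throwaway repo built here is a hypothetical stand-in for your real project.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Build a tiny demo repository.
git init -q "$workdir/repo"
cd "$workdir/repo"
echo "hello" > README.md
git add README.md
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "initial commit"

# A mirror clone copies every ref, so the original history can be restored if needed.
git clone -q --mirror "$workdir/repo" "$workdir/repo-backup.git"

# The backup should point at the same commit as the original.
original_head="$(git rev-parse HEAD)"
backup_head="$(git --git-dir="$workdir/repo-backup.git" rev-parse HEAD)"
echo "original: $original_head"
echo "backup:   $backup_head"
```

Keep the backup somewhere private until the cleanup is finished, since it still contains the sensitive file.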

Suppose your repository contains a file named credentials that holds sensitive information, and this file was committed in commit number 2.

Remove the File

First, delete the file containing sensitive information from your local directory:

rm credentials

Stage and Commit the Deletion

Next, stage the change and create a new commit locally to remove the file:

git add .
git commit -m "remove the credentials file"

This ensures that the sensitive file is no longer part of your working directory or future commits.
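Deleting the file this way does not touch earlier commits, which is why the history rewrite in the next step is still needed. The sketch below reproduces the situation in a throwaway repository (the file contents, commit messages, and demo identity are invented for illustration) and shows that the deleted file can still be read back from the older commit.

```shell
#!/usr/bin/env bash
# Sketch: a file deleted in a new commit is still readable from older commits.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/repo"
cd "$workdir/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "commit 1: add README"

echo "API_KEY=not-a-real-key" > credentials
git add credentials
git commit -q -m "commit 2: add credentials"
leaky_commit="$(git rev-parse HEAD)"

rm credentials
git add -A
git commit -q -m "remove the credentials file"

# The working tree is clean, but history still lists the commits that touched the file...
commits_touching_file="$(git log --all --oneline -- credentials | wc -l | tr -d ' ')"
echo "commits touching credentials: $commits_touching_file"

# ...and the old content can still be printed from the earlier commit.
recovered="$(git show "$leaky_commit:credentials")"
echo "recovered content: $recovered"
```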

Using BFG Repo-Cleaner

Assuming you are already in the directory where BFG Repo-Cleaner is located, use the following command to rewrite the repository history and remove the sensitive file:

java -jar bfg-1.14.0.jar /path/to/your/repo/ --delete-files file-to-remove

Example:

If the file to be removed is named credentials, the command would look like this:

java -jar bfg-1.14.0.jar /home/shahin/bfg-tool-test/ --delete-files credentials

Output:

Using repo : /home/shahin/bfg-tool-test/.git

Found 2 objects to protect
Found 3 commit-pointing refs : HEAD, refs/heads/main, refs/remotes/origin/main

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 7b439b14 (protected by 'HEAD')

Cleaning
--------

Found 4 commits
Cleaning commits:       100% (4/4)
Cleaning commits completed in 30 ms.

Updating 2 Refs
---------------

        Ref                        Before     After   
        ----------------------------------------------
        refs/heads/main          | 7b439b14 | 1597a9c1
        refs/remotes/origin/main | d611e7c6 | e063add1

Updating references:    100% (2/2)
...Ref update completed in 38 ms.

Commit Tree-Dirt History
------------------------

        Earliest      Latest
        |                  |
          .    D    D    m  

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After   
        -------------------------------------------
        First modified commit | ab6b164d | a11e11a6
        Last dirty commit     | d611e7c6 | e063add1

Deleted files
-------------

        Filename      Git id          
        ------------------------------
        credentials | df20d103 (52 B )


In total, 5 object ids were changed. Full details are logged here:

        /home/shahin/bfg-tool-test.bfg-report/2025-01-11/23-01-56

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

What the Output Means:

Protected Commits:

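  • BFG preserved the commit 7b439b14 because it was "protected by 'HEAD'." By default, BFG treats the commit that HEAD points to (normally your latest commit) as protected and does not alter its file contents, on the assumption that your latest commit already reflects the state you want to keep.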

  • If the credentials file exists in this protected commit, you need to clean it manually before running BFG again.

Cleaning Results:

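  • BFG scanned all 4 commits in the repository. As the Commit Tree-Dirt History shows, 2 of them contained the credentials file and were cleaned, and 1 later commit was rewritten only because its parent changed.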

  • It updated 2 references (refs/heads/main and refs/remotes/origin/main) to point to the rewritten history.

Commit Tree-Dirt History:

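  • Shows, from earliest to latest, what happened to each commit: D (dirty) commits contained the file and had their file tree fixed, m (modified) commits had only their commit message or parents changed, and . (clean) commits needed no changes to their file tree.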

Deleted Files:

  • BFG successfully found and flagged the credentials file (df20d103 is its Git ID). This indicates the file was removed from the Git history for the rewritten commits.

Steps to Finalize and Verify

Run Garbage Collection:

After running BFG Repo-Cleaner, the process is not fully complete. To finalize the cleanup, you must remove any deleted objects from the Git repository by executing the garbage collection command provided in the output.

Run the following command to clean up your repository:

# Ensure you are in the root directory of your repository before executing this command
git reflog expire --expire=now --all && git gc --prune=now --aggressive

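Example output (this capture appears to come from the force push described below, as its final "remote:" and "To github.com:..." lines show; git gc on its own prints only the local object-counting and compression lines):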

Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (7/7), 612 bytes | 612.00 KiB/s, done.
Total 7 (delta 2), reused 7 (delta 2), pack-reused 0
remote: Resolving deltas: 100% (2/2), done.
To github.com:shahinam2/bfg-tool-test.git
 + d611e7c...1597a9c main -> main (forced update)

This will permanently delete the orphaned objects (e.g., the credentials file) from the repository.
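To see why both halves of that command matter, the following sketch uses a throwaway repository in which a commit amend stands in for the BFG rewrite (an assumption made so the example needs only Git). The leaked blob stays in the object store after the rewrite and disappears only once the reflog is expired and garbage collection prunes it.

```shell
#!/usr/bin/env bash
# Sketch: unreachable objects survive until the reflog is expired and gc prunes them.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/repo"
cd "$workdir/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README.md
echo "API_KEY=not-a-real-key" > credentials
git add README.md credentials
git commit -q -m "initial commit"
blob_id="$(git rev-parse HEAD:credentials)"

# Stand-in for a history rewrite: amend the commit so it no longer contains the file.
git rm -q credentials
git commit -q --amend -m "initial commit"

# The old blob is no longer referenced by any branch, yet it is still in the object store.
if git cat-file -e "$blob_id" 2>/dev/null; then before_gc="present"; else before_gc="gone"; fi

# Drop reflog entries that still point at the old commit, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$blob_id" 2>/dev/null; then after_gc="present"; else after_gc="gone"; fi
echo "before gc: $before_gc, after gc: $after_gc"
```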

Verify Removal of Sensitive File:

Ensure that the sensitive file, such as credentials, has been completely removed from your Git history by running the following commands:

# List every commit on any ref that touched the path
git log --all --oneline -- credentials
# List every object still reachable from any ref and look for the file name
git rev-list --all --objects | grep credentials

If both commands return no results, the file is no longer reachable from any branch or tag after running BFG and performing garbage collection.

Force Push to Remote: If your repository is hosted on a remote platform (e.g., GitHub), you’ll need to push the rewritten history using a force push:

git push origin --force

⚠️ Important: Force-pushing will overwrite the repository history on the remote server. Be sure to notify all collaborators, as they will need to re-clone the repository to avoid conflicts.
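For a collaborator who has no local work to keep, an alternative to re-cloning is to fetch the rewritten history and hard-reset the local branch onto it. The sketch below simulates this with a local bare repository standing in for the hosted remote and a commit amend standing in for the rewrite; all paths and names are hypothetical. Note that `git reset --hard` discards local changes, so it only suits clones with nothing to preserve.

```shell
#!/usr/bin/env bash
# Sketch: a collaborator with no local work to keep re-syncs after a force push.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Maintainer repository with a leaked file in its only commit.
git init -q "$workdir/maintainer"
cd "$workdir/maintainer"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > README.md
echo "API_KEY=not-a-real-key" > credentials
git add README.md credentials
git commit -q -m "initial commit"
branch="$(git symbolic-ref --short HEAD)"

# A local bare repository stands in for the hosted remote; a second clone is the collaborator.
git clone -q --bare "$workdir/maintainer" "$workdir/remote.git"
git remote add origin "$workdir/remote.git"
git clone -q "$workdir/remote.git" "$workdir/collaborator"

# Stand-in for the history rewrite, followed by the force push.
git rm -q credentials
git commit -q --amend -m "initial commit"
git push -q --force origin "$branch"
rewritten_head="$(git rev-parse HEAD)"

# Collaborator: fetch the rewritten history and move the local branch onto it.
cd "$workdir/collaborator"
git fetch -q origin
git reset -q --hard "origin/$branch"
collaborator_head="$(git rev-parse HEAD)"
echo "maintainer:   $rewritten_head"
echo "collaborator: $collaborator_head"
```

The collaborator's clone may still hold the old objects locally until they are expired and garbage-collected, which is one reason a fresh clone remains the simpler option.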

Before Cleanup:

  • The commit history shows commit 2, which contains the sensitive credentials file.

  • The file's content is visible within the repository.

After Cleanup:

  • commit 2 has been rewritten to remove the sensitive content using BFG Repo-Cleaner.

  • The sensitive credentials file is no longer visible in the repository history, as confirmed by the updated commit structure.


Solution 2: Git-Filter-Repo

git-filter-repo offers several advantages over BFG Repo-Cleaner, making it a more versatile and efficient choice for rewriting Git history:

  1. Flexibility:

    Provides comprehensive features for history rewriting, unlike BFG, which focuses on specific tasks like removing sensitive data.

  2. No Java Dependency:

    Python-based and lighter on resources, whereas BFG requires Java Runtime Environment.

  3. No Protected Commits:

    Can modify all commits, including the latest, ensuring complete data removal. BFG protects the latest commit by default, requiring manual cleanup.

  4. Speed:

    Optimized for performance and fast even on large repositories.

  5. Active Maintenance:

    Regularly updated with detailed documentation, ensuring compatibility with modern Git features.

  6. Customizable Outputs:

    Generates detailed logs and mappings for auditing, unlike BFG’s simpler reporting.

  7. Core Git Integration:

    Relies on core Git functionality, making it portable and easier to integrate into workflows.

  8. Broad Use Cases:

    Versatile enough for various history-rewriting tasks, not limited to removing sensitive data.

How to Use Git-Filter-Repo

Install git-filter-repo:

If not already installed, use the following command:

pip install git-filter-repo

Backup Your Repository:

Before making changes, back up your repository to prevent data loss:

cp -r /path/to/your/repo /path/to/your/repo-backup

# Example:
cp -r /home/shahin/bfg-tool-test /home/shahin/bfg-tool-test-backup

Remove the File:

Run the following command to remove all instances of the credentials file from your Git history:

# --path is interpreted relative to the repository root; if the file is not at the root,
# give its path within the repository (e.g. config/credentials)
git filter-repo --sensitive-data-removal --invert-paths --path credentials

--path credentials: Selects the credentials file.

--invert-paths: Inverts the selection, so the selected file is removed from the history and everything else is kept.

This ensures the sensitive file is completely removed from the repository while keeping other data intact.
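Before running the command on a real project, it can help to rehearse it in a throwaway repository. The sketch below does that under a few assumptions: git-filter-repo is on your PATH (the rewrite is skipped otherwise), the demo files and commits are invented, `--force` is passed only because the demo repository is not a fresh clone, and `--sensitive-data-removal` is left out so the sketch also runs on older releases that may lack it.

```shell
#!/usr/bin/env bash
# Sketch: try the path-removal command in a throwaway repository first.
# Assumes git-filter-repo is on PATH; otherwise the rewrite is skipped.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/repo"
cd "$workdir/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "commit 1: add README"
echo "API_KEY=not-a-real-key" > credentials
git add credentials
git commit -q -m "commit 2: add credentials"
echo "more docs" >> README.md
git add README.md
git commit -q -m "commit 3: update README"

status="skipped"
if command -v git-filter-repo >/dev/null 2>&1; then
  # --force is needed only because this demo repo is not a fresh clone.
  git filter-repo --force --invert-paths --path credentials >/dev/null 2>&1
  remaining="$(git log --all --oneline -- credentials | wc -l | tr -d ' ')"
  status="rewritten, commits still touching credentials: $remaining"
fi
echo "$status"
```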

Force-Push the Updated Repository:

If your repository is hosted on a remote platform (e.g., GitHub), you need to push the rewritten history with a force push:

git push origin --force

Verify Removal:

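Run these commands to confirm the credentials file is no longer in the repository history:

# List every commit on any ref that touched the path
git log --all --oneline -- credentials
# List every object still reachable from any ref and look for the file name
git rev-list --all --objects | grep credentials

No results indicate successful removal.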

Before Cleanup: Commit history shows commit 2, which contains the sensitive credentials file, with its content visible in the repository.

After Cleanup:

After running the cleanup steps, commit 2 no longer appears in the history: git-filter-repo drops commits that become empty once the file is stripped out, so the sensitive content is no longer accessible.


GitGuardian: Prevention Over Cure

Avoid the hassle of removing sensitive data from your repository by preventing it from being pushed in the first place. Tools like GitGuardian can monitor and block sensitive information from being added to Git-based platforms.

Visit GitGuardian: https://www.gitguardian.com/
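To illustrate the general idea of blocking a leak before it is committed, here is a minimal hand-rolled pre-commit hook. It is a sketch, not how GitGuardian works: it only rejects a staged file named exactly credentials at the repository root, and the repository, file names, and messages are invented for the demo. A dedicated scanner detects far more than a single file name.

```shell
#!/usr/bin/env bash
# Sketch: a local pre-commit hook that rejects commits staging a file named "credentials".
# This is a hand-rolled illustration, not how GitGuardian works.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/repo"
cd "$workdir/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

# Install the hook: fail if any staged path is called "credentials".
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'HOOK'
#!/bin/sh
if git diff --cached --name-only | grep -qx 'credentials'; then
  echo "pre-commit: refusing to commit a file named 'credentials'" >&2
  exit 1
fi
HOOK
chmod +x .git/hooks/pre-commit

# A commit without the sensitive file goes through.
echo "hello" > README.md
git add README.md
git commit -q -m "add README"

# A commit that stages the sensitive file is rejected by the hook.
echo "API_KEY=not-a-real-key" > credentials
git add credentials
if git commit -q -m "add credentials" 2>/dev/null; then
  blocked="no"
else
  blocked="yes"
fi
echo "commit with credentials blocked: $blocked"
```

Hooks like this live only in the local clone, so each contributor has to install them; that limitation is one reason hosted scanning tools exist.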


Sources

BFG Repo-Cleaner

Git-Filter-Repo

Photo by Katarina Humajova on Unsplash.
