February 1, 2023

How to Remove Sensitive Data From a Git History


You’re not alone if you’re concerned about accidentally exposing sensitive files. Nowadays, sensitive data may include encryption keys, deployment config files, SSH keys, API keys, authorization tokens, and connection strings. These secrets are considered compromised once they are pushed to a remote Git repository, even a private one.

Note that an accidentally leaked Split by Harness Admin API key or SDK key can be instantly revoked in the Web Console, in Admin settings under API keys. An Admin can also easily create a replacement key from the same Web Console page.

If you have a GitHub repository containing credentials that cannot be revoked, you can contact GitHub Support to permanently remove cached views of your GitHub-hosted repository and references to sensitive data in pull requests on github.com.

Additionally, you can fully purge sensitive files or secrets from your entire Git repository commit history using the following tools:

  • git-filter-repo
  • BFG Repo-Cleaner

To prepare to remove sensitive strings from your Git history with either of these tools, first create a text file, which we’ll name replacements.txt. Each line in the file should contain one sensitive string to be removed, optionally followed by an arrow (==>) and the replacement text, as shown below. Strings without a replacement are replaced with ***REMOVED*** by default.

password001
cigfkkdmgnl6jrfmbkqd0luaho54l9bbs==>process.env.SDK_KEY
bob@split.io==>support@split.io
password002==>[PASSWORD]
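If you create the file from a shell, one option is a quoted heredoc. Quoting the delimiter ('EOF') stops the shell from expanding $, backtick, or backslash sequences that real secrets may contain. The values below are only the examples above, not real credentials. Because the file lists your secrets, this sketch also restricts its permissions. Don’t commit the file.

```shell
# Write replacements.txt with a quoted heredoc ('EOF') so the shell does
# not expand $, backticks, or backslashes inside the secrets.
# These values are the illustrative examples from this article.
cat > replacements.txt <<'EOF'
password001
cigfkkdmgnl6jrfmbkqd0luaho54l9bbs==>process.env.SDK_KEY
bob@split.io==>support@split.io
password002==>[PASSWORD]
EOF

# The file now lists your secrets, so make it readable only by you.
chmod 600 replacements.txt
```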

You will need to rewrite every Git commit in your repository history that contains sensitive data, and then prune any empty commits that result. Rewriting a commit gives it a new SHA-1 ID. Each commit ID also depends on its parent, so every commit that descends from a rewritten commit gets a new ID as well. Two methods that accomplish this are described below.
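Before rewriting anything, it can help to see which commits are affected. The sketch below defines a hypothetical shell function, find_secret_commits (the name is ours, not part of either tool). It reads a replacements file in the simple format above and treats each line as a literal string. For each string, it uses git log’s -S option to list every commit, on any branch or tag, that adds or removes that string. It prints commit IDs and subjects, not the secrets themselves.

```shell
# Hypothetical helper: list commits (on any ref) that add or remove each
# literal string in a replacements file. Assumes the simple "secret" or
# "secret==>replacement" line format; each line is treated literally.
find_secret_commits() {
  while IFS= read -r line || [ -n "$line" ]; do
    secret=${line%%"==>"*}          # keep only the text before "==>"
    [ -n "$secret" ] || continue    # skip blank lines
    # -S (the "pickaxe") selects commits that change the number of
    # occurrences of the string; --all searches every ref, not just HEAD.
    git --no-pager log --all --format='%h %s' -S"$secret"
  done < "$1"
}
```

Run it as find_secret_commits replacements.txt before the rewrite to see what will change, and again afterwards. Empty output suggests that none of the strings appear in any commit reachable from your local refs.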

Method One: Using the git-filter-repo tool

The git-filter-repo tool can remove sensitive information and large files (blobs) from your entire Git repository history, not just your last commit. It is a very flexible, open source tool hosted on GitHub and the recommended replacement for git-filter-branch.

git-filter-repo is a single Python script that you can download and place on your $PATH. It requires Python 3 to be installed on your system. You can also install it with a package manager such as pip.

The git-filter-repo user manual is exhaustive, but for our purposes, these are the key steps:

  1. Start with a fresh local clone of your Git repository and create a backup of it, so that you don’t lose your original history or the settings (such as remotes) stored in its .git/config file.
  2. Check out all your branches.
  3. Remove sensitive data in your entire Git repository history with the following command:
git filter-repo --replace-text replacements.txt --replace-refs delete-no-add

You can also remove a file with sensitive data from your commit history with the command:

git filter-repo --invert-paths --path <path-to-sensitive-file> --replace-refs delete-no-add

The --replace-refs delete-no-add option tells git-filter-repo to delete any existing replace refs and not to add new ones that map the old commit IDs to the rewritten ones.

  4. Force-push to your remote repository using the --all flag to update all your branches, and then again using the --tags flag to update all your tagged releases.
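The steps above can be sketched as a few shell functions. This is a minimal sketch. It assumes you run it from the root of a fresh clone whose remote is named origin. The function names and the ../&lt;repo&gt;-backup location are hypothetical. It uses only the git filter-repo options shown in this article plus standard git commands. Re-adding origin is a precaution, based on the assumption that git-filter-repo may remove that remote after rewriting.

```shell
# Hypothetical helpers for the git-filter-repo steps; run them from the
# root of a fresh clone whose remote is named origin.

# Step 1: copy the whole clone, including .git/config, next to it.
backup_clone() {
  cp -R . "../$(basename "$PWD")-backup"
}

# Step 2: create a local tracking branch for every branch on origin that
# does not already have one.
checkout_all_branches() {
  git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin |
  while IFS= read -r branch; do
    [ "$branch" = HEAD ] && continue   # skip the origin/HEAD pointer
    if ! git show-ref --verify --quiet "refs/heads/$branch"; then
      git branch --quiet --track "$branch" "origin/$branch"
    fi
  done
}

# Steps 3 and 4: rewrite history using a replacements file ($1), then
# force-push all branches and all tags. The origin URL is saved first and
# the remote is re-added if it no longer exists after the rewrite.
rewrite_and_push() {
  origin_url=$(git remote get-url origin) || return 1
  git filter-repo --replace-text "$1" --replace-refs delete-no-add || return 1
  if ! git remote get-url origin >/dev/null 2>&1; then
    git remote add origin "$origin_url"
  fi
  git push --force --all origin && git push --force --tags origin
}
```

A typical run might be backup_clone && checkout_all_branches && rewrite_and_push ../replacements.txt. Keeping replacements.txt outside the clone (an assumption here) avoids leaving an untracked file in the working tree.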

These git-filter-repo steps can also be accomplished using BFG Repo-Cleaner.

Method Two: Using the BFG Repo-Cleaner tool

BFG Repo-Cleaner is also an open source tool hosted on GitHub. It is a simpler, faster alternative to git filter-repo (or git filter-branch) for removing large files and sensitive data from Git repositories.

BFG Repo-Cleaner is distributed as a downloadable JAR file and requires a Java runtime environment to run on the command line.

The full instructions are on the main documentation page, but a quick overview of the usage is below.

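  1. Clean up sensitive data from your source code and commit your changes to ensure that your latest commit is clean. This matters because, by default, BFG treats your latest commit as protected and does not modify its contents.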
  2. Make a bare clone of your repository with the git clone --mirror command, and create a backup of it. The --mirror flag tells Git to copy all of the repository’s references, not just its branches.
  3. Remove sensitive data in your entire Git repository history with the following command:
java -jar bfg.jar --replace-text replacements.txt repo.git

You can also purge sensitive files from your commit history with the command:

java -jar bfg.jar --delete-files <filename> repo.git
  4. Push your changes to your remote repository with git push to update your commits and references.
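The BFG steps above can be sketched the same way. This is a minimal sketch under stated assumptions. The function names, the repo-backup.git copy, and the BFG_JAR variable (which defaults to bfg.jar in the current directory) are hypothetical, and java must be on your PATH. Following the usage example in the BFG documentation, it expires the reflog and runs git gc inside the mirror before pushing. In a mirror clone, a plain git push updates every ref.

```shell
# Hypothetical helpers for the BFG steps. BFG_JAR is an assumed variable
# pointing at the downloaded jar (default: ./bfg.jar).

# Step 2: bare mirror clone of every ref from $1, plus a backup copy.
mirror_with_backup() {
  git clone --quiet --mirror "$1" repo.git &&
  cp -R repo.git repo-backup.git
}

# Steps 3 and 4: rewrite history with a replacements file ($1), clean up
# the old objects, and push. In a mirror clone, git push updates all refs.
bfg_replace_and_push() {
  java -jar "${BFG_JAR:-bfg.jar}" --replace-text "$1" repo.git || return 1
  (
    cd repo.git &&
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive &&
    git push
  )
}
```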

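Following this clean-up of the remote Git repository, your teammates can use the git pull command to pull the clean commits to their local repositories. To update their tagged commits locally, they can use git pull --tags, or git fetch --tags --force if the old tags remain, because recent versions of Git will not overwrite an existing local tag without --force. Instead of merging, your colleagues should rebase their working branches with git rebase. A merge would bring the old commits, and the secrets they contain, back into the history.

Follow-Up Steps in Git

Git still references the deleted commits in the reflog and retains them in its object database as dangling commits for a time. These can be removed manually using the following commands: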

git reflog expire --expire=now --all
git gc --prune=now --aggressive

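The first command expires every entry in your local repository’s reflogs, the records of where HEAD and each branch previously pointed. The second command runs garbage collection, which immediately prunes the now-unreachable (dangling) commits and their contents from your repository’s object database.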
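To see what the two commands above accomplish, this sketch wraps them in a hypothetical purge_old_objects function. It also defines a small object_exists check using git cat-file -e. Both names are ours. The idea is that a commit no longer reachable from any ref stops existing locally once the reflog is expired and garbage is collected.

```shell
# Expire every reflog entry and immediately prune unreachable objects.
# Run inside the repository whose history you rewrote.
purge_old_objects() {
  git reflog expire --expire=now --all &&
  git gc --quiet --prune=now --aggressive
}

# Succeeds only if an object with the given ID still exists locally.
object_exists() {
  git cat-file -e "$1" 2>/dev/null
}
```

For example, note an old commit ID before rewriting, run purge_old_objects afterwards, and object_exists "$old_id" should then fail. This holds only if nothing else, such as a tag, another branch, or a replace ref, still points to that commit.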

Generally, it is best practice to add sensitive file paths to a .gitignore file at the root of your Git repository. This prevents untracked sensitive files from being added to the Git index, the staging area Git uses to track file changes. Files matching .gitignore entries are not staged by the git add command unless you use the --force flag. However, a .gitignore entry has no effect on a file that is already tracked. To stop tracking such a file, remove it from the index with the git rm --cached command. This leaves the file on disk and applies to the current branch. A follow-up git commit is then required, and the file is untracked from that commit forward. Earlier commits still contain the file, which is why rewriting the history as described above is still necessary.
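The .gitignore and git rm --cached steps above can be combined into one small helper. In this sketch, untrack_and_ignore is a hypothetical name. It assumes you run it from the repository root with a root-relative path. It anchors the ignore pattern with a leading slash so that only that exact path is ignored. It leaves the file on disk and commits the change on the current branch.

```shell
# Hypothetical helper: stop tracking a file ($1, relative to the repo
# root) on the current branch and ignore it from now on. Older commits
# still contain the file until history is rewritten.
untrack_and_ignore() {
  pattern="/$1"   # leading slash anchors the pattern at the repository root
  if ! grep -qxF "$pattern" .gitignore 2>/dev/null; then
    # Make sure an existing .gitignore ends with a newline before appending.
    if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
      echo >> .gitignore
    fi
    printf '%s\n' "$pattern" >> .gitignore
  fi
  # Remove the file from the index only; the working copy is kept.
  git rm --cached --quiet -- "$1" &&
  git add .gitignore &&
  git commit --quiet -m "Stop tracking $1"
}
```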
