Reduce repository size
Git repositories become larger over time. When large files are added to a Git repository:
- Fetching the repository becomes slower because everyone must download the files.
- They take up a large amount of storage space on the server.
- Git repository storage limits can be reached.
Rewriting a repository can remove unwanted history to make the repository smaller.
We recommend git filter-repo over git filter-branch and BFG.
Purge files from repository history
To reduce the size of your repository in GitLab, you must first remove references to large files from branches, tags, and other internal references (refs) that are automatically created by GitLab. These refs include:
- refs/merge-requests/* for merge requests.
- refs/pipelines/* for pipelines.
- refs/environments/* for environments.
- refs/keep-around/* are created as hidden refs to prevent commits referenced in the database from being removed.
These refs are not automatically downloaded and hidden refs are not advertised, but we can remove these refs using a project export.
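As a rough way to see how many of these internal refs a repository carries, the sketch below counts refs per namespace. The helper name count_internal_refs is hypothetical, and it assumes you feed it ref names from a mirror clone (for example, the one created from the project export in the steps that follow).

```shell
# Hypothetical helper: reads ref names on stdin (one per line) and prints
# how many refs fall under each GitLab-internal namespace.
count_internal_refs() {
  awk -F/ '
    $1 == "refs" && ($2 == "merge-requests" || $2 == "pipelines" ||
                     $2 == "environments"   || $2 == "keep-around") {
      count[$2]++
    }
    END {
      for (ns in count) printf "%s %d\n", ns, count[ns]
    }
  ' | sort
}

# Usage, run inside a mirror clone of the repository:
#   git for-each-ref --format='%(refname)' | count_internal_refs
```

Branches and tags are ignored on purpose, because only the internal namespaces listed above need the export-based workflow.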
To purge files from a GitLab repository:
- Install git filter-repo and, optionally, git-sizer using a supported package manager or from source.
- Generate a fresh export from the project and download it. This project export contains a backup copy of your repository and refs we can use to purge files from your repository.
- Decompress the backup using tar:
  tar xzf project-backup.tar.gz
  This contains a project.bundle file, which was created by git bundle.
- Clone a fresh copy of the repository from the bundle using --bare and --mirror options:
  git clone --bare --mirror /path/to/project.bundle
- Go to the project.git directory:
  cd project.git
- Because cloning from a bundle file sets the origin remote to the local bundle file, change it to the URL of your repository:
  git remote set-url origin https://gitlab.example.com/<namespace>/<project_name>.git
- Using either git filter-repo or git-sizer, analyze your repository and review the results to determine which items you want to purge:
  # Using git filter-repo
  git filter-repo --analyze
  head filter-repo/analysis/*-{all,deleted}-sizes.txt
  # Using git-sizer
  git-sizer
- Purge the history of your repository using relevant git filter-repo options. Two common options are:
  - --path and --invert-paths to purge specific files:
    git filter-repo --path path/to/file.ext --invert-paths
  - --strip-blobs-bigger-than to purge all files larger than for example 10M:
    git filter-repo --strip-blobs-bigger-than 10M
  See the git filter-repo documentation for more examples and the complete documentation.
- Because you are trying to remove internal refs, you need the commit-map files produced by each run to tell you which internal refs to remove. Every git filter-repo run creates a new commit-map, and overwrites the commit-map from the previous run. You can use the following command to back up each commit-map file:
  cp filter-repo/commit-map ./_filter_repo_commit_map_$(date +%s)
  Repeat this step and all following steps (including the repository cleanup step) every time you run any git filter-repo command.
- Force push your changes to overwrite all branches on GitLab:
  git push origin --force 'refs/heads/*'
  Protected branches cause this to fail. To proceed, you must remove branch protection, push, and then re-enable protected branches.
- To remove large files from tagged releases, force push your changes to all tags on GitLab:
  git push origin --force 'refs/tags/*'
  Protected tags cause this to fail. To proceed, you must remove tag protection, push, and then re-enable protected tags.
- To prevent dead links to commits that no longer exist, push the refs/replace created by git filter-repo:
  git push origin --force 'refs/replace/*'
  Refer to the Git replace documentation for information on how this works.
- Wait at least 30 minutes, because the repository cleanup process only processes objects older than 30 minutes.
- Run repository cleanup.
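Before uploading a commit-map for cleanup, it can help to check how many commits a git filter-repo run actually rewrote. The following is a minimal sketch, not part of git filter-repo itself: the helper name summarize_commit_map is hypothetical, and it assumes the commit-map consists of one header line followed by "<old-sha> <new-sha>" pairs.

```shell
# Hypothetical helper: summarize a commit-map file.
# Assumes one header line followed by "<old-sha> <new-sha>" pairs.
summarize_commit_map() {
  awk '
    NR == 1 { next }                 # skip the header line
    NF == 2 {
      total++
      if ($1 != $2) rewritten++      # commit ID changed or commit was dropped
    }
    END { printf "total=%d rewritten=%d\n", total, rewritten }
  ' "$1"
}

# Usage, run inside the project.git directory after a filter-repo run:
#   summarize_commit_map filter-repo/commit-map
```

If the rewritten count is zero, the run did not change any commits, and uploading that commit-map is unlikely to free any space.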
Repository cleanup
Introduced in GitLab 11.6.
Repository cleanup allows you to upload a text file of objects and GitLab removes internal Git
references to these objects. You can use git filter-repo to produce a list of objects (in a
commit-map file) that can be used with repository cleanup.
In GitLab 13.6 and later,
safely cleaning the repository requires it to be made read-only for the duration
of the operation. This happens automatically, but submitting the cleanup request
fails if any writes are ongoing, so cancel any outstanding git push
operations before continuing.
To clean up a repository:
- Go to the project for the repository.
- Go to Settings > Repository.
- Upload a list of objects. For example, a commit-map file created by git filter-repo which is located in the filter-repo directory.
  If your commit-map file is larger than about 250 KB or 3000 lines, the file can be split and uploaded piece by piece:
  split -l 3000 filter-repo/commit-map filter-repo/commit-map-
- Select Start cleanup.
This:
- Removes any internal Git references to old commits.
- Runs git gc --prune=30.minutes.ago against the repository to remove unreferenced objects. Repacking your repository temporarily causes the size of your repository to increase significantly, because the old pack files are not removed until the new pack files have been created.
- Unlinks any unused LFS objects attached to your project, freeing up storage space.
- Recalculates the size of your repository on disk.
GitLab sends an email notification with the recalculated repository size after the cleanup has completed.
If the repository size does not decrease, this may be caused by loose objects being kept around because they were referenced in a Git operation that happened in the last 30 minutes. Try re-running these steps after the repository has been dormant for at least 30 minutes.
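For the upload step above, when a commit-map has to be split into pieces, a quick check that no lines were lost can avoid a partial cleanup. This is a sketch only: the helper name split_commit_map is hypothetical, and it assumes no older chunk files with the same prefix are left in the directory.

```shell
# Hypothetical helper: split a commit-map into chunks of at most N lines
# (default 3000) and confirm that the chunks together contain every line.
split_commit_map() {
  src=$1
  max_lines=${2:-3000}
  # Chunks are written next to the source file as <src>-aa, <src>-ab, ...
  split -l "$max_lines" "$src" "$src-" || return 1
  original=$(wc -l < "$src")
  chunked=$(cat "$src"-* | wc -l)
  # Succeeds only when the line counts match.
  [ "$original" -eq "$chunked" ]
}

# Usage:
#   split_commit_map filter-repo/commit-map && ls filter-repo/commit-map-*
```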
When using repository cleanup, note:
- Project statistics are cached. You may need to wait 5-10 minutes to see a reduction in storage utilization.
- The cleanup prunes loose objects older than 30 minutes. This means objects added or referenced in the last 30 minutes are not removed immediately. If you have access to the Gitaly server, you may skip that delay and run git gc --prune=now to prune all loose objects immediately.
- This process removes some copies of the rewritten commits from the GitLab cache and database, but there are still numerous gaps in coverage and some of the copies may persist indefinitely. Clearing the instance cache may help to remove some of them, but it should not be depended on for security purposes!
Storage limits
Repository size limits:
- Can be set by an administrator on self-managed instances.
- Are set for GitLab.com.
When a project has reached its size limit, you cannot:
- Push to the project.
- Create a new merge request.
- Merge existing merge requests.
- Upload LFS objects.
You can still:
- Create new issues.
- Clone the project.
If you exceed the repository size limit, you might try to:
- Remove some data.
- Make a new commit.
- Push back to the repository.
If these actions are insufficient, you can also:
- Move some blobs to LFS.
- Remove some old dependency updates from history.
Unfortunately, this workflow doesn’t work. Deleting files in a commit doesn’t actually reduce the
size of the repository, because the earlier commits and blobs still exist. Instead, you must rewrite
history. We recommend the open-source community-maintained tool git filter-repo.
Until git gc runs on the GitLab side, the “removed” commits and blobs still exist. You also
must be able to push the rewritten history to GitLab, which may be impossible if you’ve already
exceeded the maximum size limit.
To lift these restrictions, the administrator of the self-managed GitLab instance must increase the limit on the particular project that exceeded it. Therefore, it’s always better to proactively stay underneath the limit. If you hit the limit, and can’t have it temporarily increased, your only option is to:
- Prune all the unneeded stuff locally.
- Create a new project on GitLab and start using that instead.
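To decide what to prune locally, one option is to list the largest blobs in the repository history using standard Git plumbing commands. The sketch below is an alternative to git filter-repo --analyze, not something the workflow above prescribes; the helper name largest_blobs is hypothetical.

```shell
# Hypothetical helper: reads "<type> <sha> <size> <path>" lines on stdin
# and prints the N largest blobs (default 10) as "<size> <path>",
# largest first. Paths containing spaces are kept intact.
largest_blobs() {
  awk '$1 == "blob" {
         size = $3
         sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")   # strip type, SHA, and size
         print size, $0
       }' | sort -k1,1nr | head -n "${1:-10}"
}

# Usage, run inside the local repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 20
```

Sizes are reported in bytes of uncompressed object content, so they indicate which files to target rather than the exact space a purge would reclaim.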
Troubleshooting
Incorrect repository statistics shown in the GUI
If the displayed size or commit number is different from the exported .tar.gz
or local repository,
you can ask a GitLab administrator to force an update.
Using the Rails console:
p = Project.find_by_full_path('<namespace>/<project>')
pp p.statistics
p.statistics.refresh!
pp p.statistics
# compare with earlier values
# An alternate method to clear project statistics
p.repository.expire_all_method_caches
UpdateProjectStatisticsWorker.perform_async(p.id, ["commit_count","repository_size","storage_size","lfs_objects_size"])
# check the total artifact storage space separately
builds_with_artifacts = p.builds.with_downloadable_artifacts.all
artifact_storage = 0
builds_with_artifacts.find_each do |build|
  artifact_storage += build.artifacts_size
end
puts "#{artifact_storage} bytes"