
How to Handle Big Repositories with Git?

Last Updated : 28 Feb, 2022

Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. It is built around distributed development: several developers can each hold a full copy of an application's source code, change it, and share those changes with one another. In this article, we will learn how to handle big repositories with Git. There are two types of big repositories:

  1. One with a large commit history
  2. Another with a large number of binary files

Handling repositories with large commit history:

  1. Using a shallow clone
  2. Using git filter-branch
  3. Cloning a single branch

1. Using a shallow clone

This is a comparatively fast solution in which we download only the most recent commits of the repository's history. Imagine a repository with 1 GB of data and more than 35,000 commits. A full clone of such a repository generally takes a long time, but fetching only the latest n commits can cut both the download time and the disk space considerably. To make a shallow clone, add the --depth option to the clone command:

git clone --depth [n] [url]
Here n specifies how many of the most recent commits to fetch
url specifies the remote URL of the repository
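
As a minimal sketch of the effect of --depth, the script below builds a throwaway repository with five commits and then makes a shallow clone of it. The temporary paths, file name, and demo identity are all assumptions made for illustration; in practice the URL would point at your real remote.

```shell
#!/bin/sh
set -e

# Hypothetical scratch directory; nothing here touches an existing repository.
work=$(mktemp -d)

# Placeholder identity so the demo commits work on a machine without Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a small source repository with five commits.
git init -q "$work/origin"
for i in 1 2 3 4 5; do
    echo "change $i" > "$work/origin/file.txt"
    git -C "$work/origin" add file.txt
    git -C "$work/origin" commit -q -m "commit $i"
done

# Git ignores --depth for plain local paths, so the clone uses a file:// URL.
git clone -q --depth 2 "file://$work/origin" "$work/shallow"

# Compare how many commits each copy can see.
full_count=$(git -C "$work/origin" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full_count commits, shallow clone: $shallow_count commits"
```

If more history is needed later, a shallow clone can be deepened with git fetch --deepen [n] or converted to a full clone with git fetch --unshallow.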

2. Using git filter-branch

Here we can walk through the entire project history and modify, filter, or drop content as needed. This is generally used when the history contains a large number of binary files and we need only some of them. To do this, we use the git filter-branch command:

git filter-branch --tree-filter 'rm -rf [path-to-asset]'
path-to-asset signifies the path to the binary asset in your repository

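Although powerful, this approach has a drawback: git filter-branch rewrites history, so the IDs of the affected commits change. Anyone who has already cloned the repository will need to re-clone it afterwards, so coordinate with your team before running it. Note also that the Git documentation itself now discourages git filter-branch and points to git filter-repo as a replacement.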
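
The following sketch shows the rewrite on a scratch repository, assuming a hypothetical assets directory that stands in for [path-to-asset]. The file names and demo identity are illustrative only. It records the HEAD commit ID before and after, so the ID change described above is visible.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repository; do not run this against real work without a backup.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git versions warn and pause before filter-branch runs; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q "$work/repo"
cd "$work/repo"

# The first commit adds a stand-in binary asset, the second adds source code.
mkdir assets
printf 'pretend this is a large binary' > assets/big.bin
git add assets
git commit -q -m "add asset"
echo 'print("hello")' > main.py
git add main.py
git commit -q -m "add code"

old_head=$(git rev-parse HEAD)

# Rewrite every commit on every ref, deleting the asset directory from each tree.
git filter-branch -f --tree-filter 'rm -rf assets' -- --all >/dev/null 2>&1

new_head=$(git rev-parse HEAD)
# Count commits in the rewritten history that still touch the asset path.
remaining=$(git log --oneline -- assets | wc -l | tr -d ' ')

echo "HEAD before: $old_head"
echo "HEAD after:  $new_head"
echo "commits touching assets after rewrite: $remaining"
```

The old commits stay reachable through refs/original and the reflog until those are cleaned up, so the repository does not shrink on disk immediately after the rewrite.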

3. Cloning a single branch

This technique is useful when a repository has many branches but we want to work with only one of them. To clone a single branch, we can use the following command:

git clone [url] --branch [branch_name] --single-branch
url specifies the remote url of the repository
branch_name specifies the name of the branch you want to clone
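
To illustrate, the sketch below creates a scratch repository with three branches and clones only one of them. The branch names main, feature, and experiment, like the temporary paths, are assumptions chosen for the demo.

```shell
#!/bin/sh
set -e

# Hypothetical scratch directory and placeholder identity for the demo commits.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/origin"
cd "$work/origin"
echo base > file.txt
git add file.txt
git commit -q -m "base"

# Name the initial branch explicitly so the demo does not depend on init.defaultBranch.
git branch -M main
git branch feature
git branch experiment

# Fetch only the feature branch; main and experiment are not downloaded.
git clone -q --branch feature --single-branch "file://$work/origin" "$work/single"

# Count the remote-tracking branches, ignoring any symbolic origin/HEAD entry.
branches=$(git -C "$work/single" branch -r | grep -v -- '->' | wc -l | tr -d ' ')
current=$(git -C "$work/single" rev-parse --abbrev-ref HEAD)
echo "checked out: $current, remote-tracking branches: $branches"
```

This option combines with --depth, so a clone can be limited to the recent history of one branch.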

Handling repositories with a large number of binary files:

  1. We can use submodules, i.e. a repository inside another repository. The inner repository holds all the binary files. This gives us modularity: the parent repository keeps only the code, and later changes made inside the submodule do not touch the parent repository's code.
  2. We can use third-party extensions like Git LFS, a Git extension that replaces large and binary files in the repository with small text pointers and stores the actual file contents on a separate server.
  3. We can run garbage collection with git gc, which packs many loose objects into a single packfile and cleans up unnecessary files.
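
As a rough illustration of the third option, the sketch below creates a scratch repository, then compares the number of loose objects before and after git gc. The paths and demo identity are assumptions, and the exact counts will vary from one repository to another.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repository and placeholder identity for the demo commits.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"

# Each commit writes a blob, a tree, and a commit object as separate loose files.
for i in 1 2 3 4 5; do
    echo "change $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# The "count" line of count-objects -v is the number of loose objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack the loose objects and clean up.
git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')
echo "loose objects: $loose_before -> $loose_after, packfiles: $packs"
```

Note that git gc only reorganizes what is already stored; large binary files that are still referenced in history remain in the packfile.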

Conclusion

Of the three solutions above for repositories with many binary files, using a third-party extension such as Git LFS is the most recommended.

