
Git – Packfiles

Last Updated : 31 Oct, 2022

A packfile is a single file that holds the contents of all the loose objects that Git removed from your disk when it packed them. The index is a companion file that stores offsets into that packfile so that a particular object can be located quickly. In this example, the objects on disk collectively took up about 15K before git gc was run, while the resulting packfile is only 7K, so packing has cut disk usage roughly in half. How does Git achieve this? When Git packs objects, it looks for files with similar names and sizes and stores only the differences (deltas) between one version of a file and the next. You can inspect the packfile to see what Git did to save space: the git verify-pack plumbing command shows what was packed.
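To make the delta information easier to inspect, here is a minimal Ruby sketch that runs git verify-pack -v and parses its object lines. It assumes the documented line layout (object name, type, size, size in packfile, offset in packfile, plus depth and base object name for deltified objects). The names PackEntry, parse_verify_pack_line and verify_pack are hypothetical helpers introduced for illustration, not part of Git.

```ruby
# One object line printed by `git verify-pack -v`.
# depth and base_oid are nil for objects that are not stored as deltas.
PackEntry = Struct.new(:oid, :type, :size, :size_in_pack, :offset, :depth, :base_oid)

# Object name, type, three integers, then an optional depth and base name.
LINE_RE = /\A(\h{40,64})\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\h{40,64}))?\s*\z/

def parse_verify_pack_line(line)
  m = LINE_RE.match(line)
  return nil unless m # summary lines at the end of the output do not match
  PackEntry.new(m[1], m[2], m[3].to_i, m[4].to_i, m[5].to_i, m[6]&.to_i, m[7])
end

# idx_path is the path to a pack index (.idx) file in your repository.
def verify_pack(idx_path)
  IO.popen(["git", "verify-pack", "-v", idx_path]) do |io|
    io.each_line.map { |line| parse_verify_pack_line(line.chomp) }.compact
  end
end
```

Entries whose base_oid is set are stored as deltas against that base object, which is where most of the space saving comes from.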

Format of Git-Packfiles

At the top, there is a header that reads as follows:

  • 4-byte signature: The signature is: {‘P’, ‘A’, ‘C’, ‘K’}
  • 4-byte version number (network byte order): Git currently accepts version number 2 or 3 but generates version 2 only.
  • 4-byte number of objects contained in the pack (network byte order)

Observation: Because both fields are 32 bits wide, a pack cannot have more than 4G versions 😉 or more than 4G objects.

The header is followed by that number of object entries, each of which looks like this:

  • (undeltified representation): n-byte type and length (3-bit type, (n-1)*7+4-bit length), followed by the compressed data.
  • (deltified representation): n-byte type and length (3-bit type, (n-1)*7+4-bit length); then the base object name if this is an OBJ_REF_DELTA object, or a negative relative offset from the delta object’s position in the pack if this is an OBJ_OFS_DELTA object; then the compressed delta data.
  • Observation: the length of each object is encoded in a variable-length format and is not constrained to 32 bits.
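The n-byte type and length field described above can be decoded with a short loop. The sketch below is a minimal illustration, assuming the usual layout: the high bit of every byte is a continuation flag, the first byte carries the 3-bit type and the lowest 4 bits of the length, and each following byte contributes 7 more length bits, least significant first. The names OBJECT_TYPES and read_type_and_size are hypothetical.

```ruby
require "stringio"

# Object type numbers used in pack entries (5 is reserved).
OBJECT_TYPES = {
  1 => :commit, 2 => :tree, 3 => :blob, 4 => :tag,
  6 => :ofs_delta, 7 => :ref_delta
}.freeze

# Reads one entry header from io and returns [type, length].
def read_type_and_size(io)
  byte  = io.readbyte
  type  = (byte >> 4) & 0x07 # bits 6..4: object type
  size  = byte & 0x0f        # bits 3..0: lowest 4 bits of the length
  shift = 4
  while (byte & 0x80) != 0   # high bit set: another length byte follows
    byte = io.readbyte
    size |= (byte & 0x7f) << shift
    shift += 7
  end
  [OBJECT_TYPES.fetch(type), size]
end

read_type_and_size(StringIO.new("\x95\x0a".b)) # => [:commit, 165]
```

Because Ruby integers grow as needed, the loop handles lengths beyond 32 bits without any special casing.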

Packfiles live in the .git/objects/pack/ directory. This directory is likely to be empty for a brand-new project, because Git initially stores everything as non-packed, or loose, objects. It does so because you are quite likely to rewrite files (blobs) and directories (trees) repeatedly as you make changes before committing. In fact, whenever you stage a file with git add, a new loose object is written for its contents.

The Packfile Format

The format of the packfile is quite straightforward. It consists of a header, a series of packed objects (each with its own header and body), and a checksum trailer. The first four bytes are the string “PACK”, which serves as a sanity check that you are reading from the start of a packfile. They are followed by a 4-byte packfile version number and a 4-byte count of the entries in the file. You might read the header information in Ruby like this:
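A minimal sketch of such a header reader, assuming the 12-byte layout described above (signature, version and object count, the latter two in network byte order). PackHeader and read_pack_header are hypothetical names chosen for this example.

```ruby
require "stringio"

PackHeader = Struct.new(:version, :object_count)

# Reads and validates the 12-byte header at the start of a packfile.
def read_pack_header(io)
  raw = io.read(12)
  raise ArgumentError, "truncated pack header" if raw.nil? || raw.bytesize < 12

  # "a4" = 4 raw bytes, "N" = 32-bit unsigned integer in network byte order
  signature, version, count = raw.unpack("a4NN")
  raise ArgumentError, "not a packfile" unless signature == "PACK"
  raise ArgumentError, "unsupported pack version #{version}" unless [2, 3].include?(version)

  PackHeader.new(version, count)
end
```

With a real packfile you would call it as File.open(path, "rb") { |f| read_pack_header(f) }, where path points at a .pack file under .git/objects/pack/.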


It’s important to remember that the size recorded in each object’s header is the expanded (uncompressed) size of the data, not the size of the compressed data that actually follows. Without more information you would have to inflate each object just to find out where the next header begins, which is why the offsets in the packfile index are so useful.
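The following Ruby sketch illustrates the point, assuming the entry body is an ordinary zlib stream. It inflates the data, checks the result against the size from the header, and reports how many compressed bytes were consumed, which is the only way to find the next entry without the index. The helper name inflate_entry is hypothetical.

```ruby
require "zlib"

# Inflates the zlib stream at the start of data.
# Returns [inflated_bytes, compressed_bytes_consumed].
def inflate_entry(data, expected_size)
  z = Zlib::Inflate.new
  body = z.inflate(data) # stops at the end of the zlib stream
  raise "size mismatch" unless body.bytesize == expected_size
  [body, z.total_in]     # total_in counts only the compressed bytes read
ensure
  z&.close
end
```

The second return value is the length of the compressed data, so adding it to the position where the stream began gives the offset of the next entry header.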

