The Git object directory contains a pack directory containing packfiles (with suffix ".pack") and pack-indexes (with suffix ".idx"). The pack-indexes provide a way to look up objects and navigate to their offset within the pack, but they must come in pairs with the packfiles. This pairing depends on the file names, as a pack-index differs from its packfile only in its suffix. While the pack-indexes provide fast lookup per packfile, this performance degrades as the number of packfiles increases, because abbreviations need to inspect every packfile and we are more likely to have a miss on our most-recently-used packfile. For some large repositories, repacking into a single packfile is not feasible due to storage space or excessive repack times.
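To make this cost concrete, here is a minimal Python sketch (not Git's implementation) that pairs each ".pack" with its ".idx" by file name and then finds an object by searching every pack-index in turn, trying the most-recently-used pack first. The function names, the short hex object IDs, and the in-memory (name, sorted_oids, offsets) tuples are hypothetical stand-ins for real pack-index files.

```python
from bisect import bisect_left


def pair_packs(filenames):
    """Map each stem to its (".pack", ".idx") names; unpaired files are skipped."""
    packs = {f[: -len(".pack")] for f in filenames if f.endswith(".pack")}
    idxs = {f[: -len(".idx")] for f in filenames if f.endswith(".idx")}
    # A packfile is only usable if a pack-index with the same stem exists.
    return {stem: (stem + ".pack", stem + ".idx") for stem in sorted(packs & idxs)}


def lookup_across_packs(pack_indexes, oid):
    """Search each (name, sorted_oids, offsets) entry in list order.

    A miss scans every pack-index, so the worst case is one binary
    search per packfile. On a hit, that pack is moved to the front so
    the next lookup tries the most-recently-used pack first.
    """
    for pos, (name, oids, offsets) in enumerate(pack_indexes):
        i = bisect_left(oids, oid)
        if i < len(oids) and oids[i] == oid:
            pack_indexes.insert(0, pack_indexes.pop(pos))
            return name, offsets[i]
    return None
```

Each probe is a cheap binary search, but a lookup that misses the most-recently-used pack may probe every pack, so the work grows with the number of packfiles P, roughly O(P log n) for n objects per pack.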

The multi-pack-index (MIDX for short) stores a list of objects and their offsets into multiple packfiles. It contains:

  • A list of packfile names.

  • A sorted list of object IDs.

  • A list of metadata for the ith object ID including:

      ◦ A value j referring to the jth packfile.

      ◦ An offset within the jth packfile for the object.

  • If large offsets are required, we use another list of large offsets similar to version 2 pack-indexes.
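The contents listed above can be sketched as an in-memory structure. This is a minimal Python sketch, not the on-disk MIDX format: the class and function names are hypothetical, object IDs are short hex strings, and when an object appears in several packs the sketch simply keeps the first pack in name order (Git's real tie-breaking rule is not described here). Offsets follow the version 2 pack-index convention: a 32-bit entry with its most significant bit set holds an index into the list of large offsets.

```python
from dataclasses import dataclass, field

LARGE_OFFSET_FLAG = 0x80000000  # MSB of a 32-bit offset entry


@dataclass
class MultiPackIndex:
    pack_names: list  # packfile names; position j is the pack's number
    oids: list        # sorted object IDs
    pack_ids: list    # pack_ids[i] = j for the ith object ID
    offsets32: list   # offset, or LARGE_OFFSET_FLAG | index into large_offsets
    large_offsets: list = field(default_factory=list)

    def offset(self, i):
        """Return the real offset of the ith object in its packfile."""
        raw = self.offsets32[i]
        if raw & LARGE_OFFSET_FLAG:
            return self.large_offsets[raw & 0x7FFFFFFF]
        return raw


def build_midx(packs):
    """packs: dict mapping pack name -> dict of oid -> offset (hypothetical input)."""
    pack_names = sorted(packs)
    chosen = {}
    for j, name in enumerate(pack_names):
        for oid, off in packs[name].items():
            # Hypothetical tie-break: keep the first pack that has the object.
            chosen.setdefault(oid, (j, off))
    midx = MultiPackIndex(pack_names, [], [], [])
    for oid in sorted(chosen):
        j, off = chosen[oid]
        midx.oids.append(oid)
        midx.pack_ids.append(j)
        if off >= LARGE_OFFSET_FLAG:
            # Does not fit in 31 bits: store it in the large-offset list.
            midx.offsets32.append(LARGE_OFFSET_FLAG | len(midx.large_offsets))
            midx.large_offsets.append(off)
        else:
            midx.offsets32.append(off)
    return midx
```

Storing most offsets in 32 bits and only spilling the rare large ones keeps the common case compact, which is the same trade-off version 2 pack-indexes make.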

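Thus, we can provide O(log N) lookup time for any number of packfiles, where N is the total number of objects in the multi-pack-index: a single binary search over the sorted object IDs replaces one search per packfile.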
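A lookup against this layout needs only one binary search, however many packfiles there are. The sketch below uses hypothetical function names and parallel Python lists standing in for the MIDX tables. It also shows why abbreviations get cheaper: all IDs sharing a prefix form a contiguous run in the sorted list, so one search finds them all instead of one search per packfile.

```python
from bisect import bisect_left


def midx_lookup(oids, pack_ids, offsets, pack_names, oid):
    """Find (pack name, offset) for oid with one binary search over all objects."""
    i = bisect_left(oids, oid)
    if i < len(oids) and oids[i] == oid:
        return pack_names[pack_ids[i]], offsets[i]
    return None


def midx_abbrev(oids, prefix):
    """Return every object ID that starts with the hex prefix.

    Matches form a contiguous run beginning at the insertion point
    of the prefix in the sorted list.
    """
    matches = []
    i = bisect_left(oids, prefix)
    while i < len(oids) and oids[i].startswith(prefix):
        matches.append(oids[i])
        i += 1
    return matches
```

If midx_abbrev returns exactly one ID, the abbreviation is unambiguous; more than one means it needs more hex digits.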

Design Details

Future Work
