- 
                Notifications
    You must be signed in to change notification settings 
- Fork 157
[RFC] extensions.refFormat and packed-refs v2 file format #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] extensions.refFormat and packed-refs v2 file format #1408
Conversation
The hashfile API is useful for generating files that include a trailing hash of the file's contents up to that point. Using such a hash is helpful for verifying the file for corruption-at-rest, such as a faulty drive causing flipped bits. Since the commit-graph and multi-pack-index files both use this trailing hash, the chunk-format API uses a 'struct hashfile' to handle the I/O to the file. This was very convenient to allow using the hashfile methods during these operations. However, hashing the file contents during write comes at a performance penalty. It's slower to hash the bytes on their way to the disk than without that step. If we wish to use the chunk-format API to upgrade other file types, then this hashing is a performance penalty that might not be worth the benefit of a trailing hash. For example, if we create a chunk-format version of the packed-refs file, then the file format could shrink by using raw object IDs instead of hexadecimal representations in ASCII. That reduction in size is not enough to counteract the performance penalty of hashing the file contents. In cases such as deleting a reference that appears in the packed-refs file, that write-time performance is critical. This is in contrast to the commit-graph and multi-pack-index files which are mainly updated in non-critical paths such as background maintenance. One way to allow future chunked formats to not suffer this penalty would be to create an abstraction layer around the 'struct hashfile' using a vtable of function pointers. This would allow placing a different representation in place of the hashfile. This option would be cumbersome for a few reasons. First, the hashfile's buffered writes are already highly optimized and would need to be duplicated in another code path. The second is that the chunk-format API calls the chunk_write_fn pointers using a hashfile. If we change that to an abstraction layer, then those that _do_ use the hashfile API would need to change all of their instances of hashwrite(), hashwrite_be32(), and others to use the new abstraction layer. Instead, this change opts for a simpler change. Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the update_fn and final_fn members of the_hash_algo are skipped. When finalizing the hashfile, the trailing hash is replaced with the null hash. This use of a trailing null hash would be desireable in either case, since we do not want to special case a file format to have a different length depending on whether it was hashed or not. When the final bytes of a file are all zero, we can infer that it was written without hashing, and thus that verification is not available as a check for file consistency. This also means that we could easily toggle hashing for any file format we desire. For the commit-graph and multi-pack-index file, it may be possible to allow the null hash without incrementing the file format version, since it technically fits the structure of the file format. The only issue is that older versions would trigger a failure during 'git fsck'. For these file formats, we may want to delay such a change until it is justified. However, the index file is written in critical paths. It is also frequently updated, so corruption at rest is less likely to be an issue than in those other file formats. This could be a good candidate to create an option that skips the hashing operation. A version of this patch has existed in the microsoft/git fork since 2017 [1] (the linked commit was rebased in 2018, but the original dates back to January 2017). Here, the change to make the index use this fast path is delayed until a later change. [1] microsoft@21fed2d Co-authored-by: Kevin Willford <[email protected]> Signed-off-by: Kevin Willford <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path. One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster. Following a similar approach to one used in the microsoft/git fork [1], add a new config option that allows disabling this hashing during the index write. The cost is that we can no longer validate the contents for corruption-at-rest using the trailing hash. [1] microsoft@21fed2d While older Git versions will not recognize the null hash as a special case, the file format itself is still being met in terms of its structure. Using this null hash will still allow Git operations to function across older versions. The one exception is 'git fsck' which checks the hash of the index file. Here, we disable this check if the trailing hash is all zeroes. We add a warning to the config option that this may cause undesirable behavior with older Git versions. As a quick comparison, I tested 'git update-index --force-write' with and without index.computHash=false on a copy of the Linux kernel repository. Benchmark 1: with hash Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms] Range (min … max): 34.3 ms … 79.1 ms 82 runs Benchmark 2: without hash Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms] Range (min … max): 16.3 ms … 42.0 ms 69 runs Summary 'without hash' ran 1.78 ± 0.76 times faster than 'with hash' These performance benefits are substantial enough to allow users the ability to opt-in to this feature, even with the potential confusion with older 'git fsck' versions. Signed-off-by: Derrick Stolee <[email protected]>
Git's reference storage is critical to its function. Creating new storage formats for references requires adding an extension. This prevents third-party tools that do not understand that format from operating incorrectly on the repository. This makes updating ref formats more difficult than other optional indexes, such as the commit-graph or multi-pack-index. However, there are a number of potential ref storage enhancements that are underway or could be created. Git needs an established mechanism for coordinating between these different options. The first obvious format update is the reftable format as documented in Documentation/technical/reftable.txt. This format has much of its implementation already in Git, but its connection as a ref backend is not complete. This change is similar to some changes within one of the patches intended for the reftable effort [1]. [1] https://lore.kernel.org/git/[email protected]/ However, this change makes a distinct strategy change from the one recommended by reftable. Here, the extensions.refFormat extension is provided as a multi-valued list. In the reftable RFC, the extension has a single value, "files" or "reftable" and explicitly states that this should not change after 'git init' or 'git clone'. The single-valued approach has some major drawbacks, including the idea that the "files" backend cannot coexist with the "reftable" backend at the same time. In this way, it would not be possible to create a repository that can write loose references and combine them into a reftable in the background. With the multi-valued approach, we could integrate reftable as a drop-in replacement for the packed-refs file and allow that to be a faster way to do the integration since the test suite would only need updates when the test is explicitly testing packed-refs. When upgrading a repository from the "files" backend to the "reftable" backend, it can help to have a transition period where both are present, then finally removing the "files" backend after all loose refs are collected into the reftable. But the reftable is not the only approach available. One obvious improvement could be a new file format version for the packed-refs file. Its current plaintext-based format is inefficient due to storing object IDs as hexadecimal representations instead of in their raw format. This extra cost will get worse with SHA-256. In addition, binary searches need to guess a position and scan to find newlines for a refname entry. A structured binary format could allow for more compact representation and faster access. Adding such a format could be seen as "files-v2", but it is really "packed-v2". The reftable approach has a concept of a "stack" of reftable files. This idea would also work for a stack of packed-refs files (in v1 or v2 format). It would be helpful to describe that the refs could be stored in a stack of packed-ref files independently of whether that is in file format v1 or v2. Even in these two options, it might be helpful to indicate whether or not loose ref files are present. That is one reason to not make them appear as "files-v2" or "files-v3" options in a single-valued extension. Even as "packed-v2" or "packed-v3" options, this approach would require third-party tools to understand the "v2" version if they want to support the "v3" options. Instead, by splitting the format from the layout, we can allow third-party tools to integrate only with the most-desired format options. For these reasons, this change is defining the extensions.refFormat extension as well as how the two existing values interact. By default, Git will assume "files" and "packed" in the list. If any other value is provided, then the extension is marked as unrecognized. Add tests that check the behavior of extensions.refFormat, both in that it requires core.repositoryFormatVersion=1, and Git will refuse to work with an unknown value of the extension. There is a gap in the current implementation, though. What happens if exactly one of "files" or "packed" is provided? The presence of only one would imply that the other is not available. A later change can communicate the list contents to the repository struct and then the reference backend could ignore one of these two layers. Specifically, having only "files" would mean that Git should not read or write the packed-refs file and instead only read and write loose ref files. By contrast, having only "packed" would mean that Git should not read or write loose ref files and instead always update the packed-refs file on every ref update. Signed-off-by: Derrick Stolee <[email protected]>
The documentation for 'extensions.worktreeConfig' includes a bulletted list describing certain config values that need to be moved into the worktree config instead of the repository config file. However, since we are already in a bulletted list, the documentation tools do not know when that inner list is complete. Paragraphs intended to not be part of that inner list are rendered as part of the last bullet. Modify the format to match a similar doubly-nested list from the 'column.ui' config documentation. Reword the descriptions slightly to make the config keys appear as their own heading in the inner list. Signed-off-by: Derrick Stolee <[email protected]>
The previous change introduced the extensions.refFormat config option. It is a multi-valued config option that currently understands "files" and "packed", with both values assumed by default. If any value is provided explicitly, this default is ignored and the provided settings are used instead. The multi-valued nature of this extension presents a way to allow a user to specify that they never want a packed-refs file (only use "files") or that they never want loose reference files (only use "packed"). However, that functionality is not currently connected. Before actually modifying the files backend to understand these extension settings, do the basic wiring that connects the extensions.refFormat parsing to the creation of the ref backend. A future change will actually change the ref backend initialization based on these settings, but this communication of the extension is sufficiently complicated to be worth an isolated change. For now, also forbid the setting of only "packed". This is done by redirecting the choice of backend to the packed backend when that selection is made. A later change will make the "files"-only extension value ignore the packed backend. Signed-off-by: Derrick Stolee <[email protected]>
The extensions.refFormat extension is a multi-valued config that specifies which ref formats are available to the current repository. By default, Git assumes the list of "files" and "packed", unless there is at least one of these extensions specified. With the current values, it is possible for a user to specify only "files" or only "packed". The only-"packed" option was already ruled as invalid since Git's current code has too many places that require a loose reference. This could change in the future. However, we can now allow the user to specify extensions.refFormat=files alone, making it impossible to create a packed-refs file (or to read one that might exist). Signed-off-by: Derrick Stolee <[email protected]>
Even though the commit-graph and multi-pack-index file formats specify a number of chunks in their header information, this is optional. The table of contents terminates with a null chunk ID, which can be used instead. The extra value is helpful for some checks, but is ultimately not necessary for the format. This will be important in some future formats. Signed-off-by: Derrick Stolee <[email protected]>
It will be helpful to allow a trailing table of contents when writing some file types with the chunk-format API. The main reason is that it allows dynamically computing the chunk sizes while writing the file. This can use fewer resources than precomputing all chunk sizes in advance. Signed-off-by: Derrick Stolee <[email protected]>
As a preparatory step to allowing trailing table of contents, store the offsets of each chunk as we write them. This replaces an existing use of a local variable, but the stored value will be used in the next change. Signed-off-by: Derrick Stolee <[email protected]>
The existing chunk formats use the table of contents at the beginning of the file. This is intended as a way to speed up the initial loading of the file, but comes at a cost during writes. Each example needs to fully compute how big each chunk will be in advance, which usually requires storing the full file contents in memory. Future file formats may want to use the chunk format API in cases where the writing stage is critical to performance, so we may want to stream updates from an existing file and then only write the table of contents at the end. Add a new 'flags' parameter to write_chunkfile() that allows this behavior. When this is specified, the defensive programming that checks that the chunks are written with the precomputed sizes is disabled. Then, the table of contents is written in reverse order at the end of the hashfile, so a parser can read the chunk list starting from the end of the file (minus the hash). The parsing of these table of contents will come in a later change. Signed-off-by: Derrick Stolee <[email protected]>
The new read_trailing_table_of_contents() mimics read_table_of_contents() except that it reads the table of contents in reverse from the end of the given hashfile. The file is given as a memory-mapped section of memory and a size. Automatically calculate the start of the trailing hash and read the table of contents in revers from that position. The errors come along from those in read_table_of_contents(). The one exception is that the chunk_offset cannot be checked as going into the table of contents since we do not have that length automatically. That may have some surprising results for some narrow forms of corruption. However, we do still limit the size to the size of the file plus the part of the table of contents read so far. At minimum, the given sizes can be used to limit parsing within the file itself. Signed-off-by: Derrick Stolee <[email protected]>
In preparation for adding a new packed-refs file format, extract all code from refs/packed-backend.c that involves knowledge of the plaintext file format. This includes any parsing logic that cares about the header, plaintext lines of the form "<oid> <ref>" or "^<peeled>", and the error messages when there is an issue in the file. This also includes the writing logic that writes the header or the individual references. Future changes will perform more refactoring to abstract away more of the writing process to be more generic, but this is enough of a chunk of code movement. Signed-off-by: Derrick Stolee <[email protected]>
The write_with_updates() method uses a write_error label to jump to code that adds an error message before exiting with an error. This appears both when the packed-refs file header is written, but also when a ref line is written to the packed-refs file. A future change will abstract the loop that writes the refs out of write_with_updates(), making the goto an inconvenient pattern. For now, remove the distinction between "goto write_error" and "goto error" by adding the message in-line using the new static method add_write_error(). This is functionally equivalent, but will make the next step easier. Signed-off-by: Derrick Stolee <[email protected]>
TBD Signed-off-by: Derrick Stolee <[email protected]>
The packed-refs file is a plaintext file format that starts with a header line, then each ref is given as one or two lines (two if there is a peeled value). These lines are written as part of a sequence of updates which are merged with the existing ref iterator in merge_iterator_and_updates(). That method is currently tied directly to write_packed_entry_v1(). When creating a new version of the packed-file format, it would be valuable to use this merging logic in an identical way. Create a new function pointer type, write_ref_fn, and use that type in merge_iterator_and_updates(). Notably, the function pointer type no longer depends on a FILE pointer, but instead takes an arbitrary "void *write_data" parameter. This flexibility will be critical in the future, since the planned v2 format will use the chunk-format API and need a more complicated structure than the output FILE. Signed-off-by: Derrick Stolee <[email protected]>
When updating the file format version for something as critical as ref storage, the file format version must come with an extension change. The extensions.refFormat config value is a multi-valued config value that defaults to the pair "files" and "packed". Add "packed-v2" as a possible value to extensions.refFormat. This value specifies that the packed-refs file may exist in the version 2 format. (If the "packed" value does not exist, then the packed-refs file must exist in version 2, not version 1.) In order to select version 2 for writing, the user will have two options. First, the user could remove "packed" and add "packed-v2" to the extensions.refFormat list. This would imply that version 2 is the only format available. However, this also means that version 1 files would be ignored at read time, so this does not allow users to upgrade repositories with existing packed-refs files. Add a new refs.packedRefsVersion config option which allows specifying which version to use during writes. Thus, when both "packed" and "packed-v2" are in the extensions.refFormat list, the user can upgrade from version 1 to version 2, or downgrade from 2 to 1. Currently, the implementation does not use refs.packedRefsVersion, as that is delayed until we have the code to write that file format version. However, we can add the necessary enum values and flag constants to communicate the presence of "packed-v2" in the extensions.refFormat list. Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
TODO: add writing tests. Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
Tests already cover that we will start reading these prefixes. TODO: discuss time and space savings over typical approach. Signed-off-by: Derrick Stolee <[email protected]>
When set, this will create a default value for the packed-refs file version on writes. When set to "2", it will automatically add the "packed-v2" value to extensions.refFormat. Not all tests pass with GIT_TEST_PACKED_REFS_VERSION=2 because they care specifically about the content of the packed-refs file. These tests will be updated in following changes. To start, though, disable the GIT_TEST_PACKED_REFS_VERSION environment variable in t3212-ref-formats.sh, since that script already tests both versions, including upgrade scenarios. Signed-off-by: Derrick Stolee <[email protected]>
t1409-avoid-packing-refs.sh seeks to test that the packed-refs file is not modified unnecessarily. One way it does this is by creating a packed-refs file, then munging its contents and verifying that the munged data remains after other commands. For packed-refs v1, it suffices to add a line that is similar to a comment. For packed-refs v2, we cannot even add to the file without messing up the trailing table of contents of its chunked format. However, we can manipulate the last bytes that are within the trailing hash and use 'tail -c 4' to read them. This makes t1409 pass with GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee <[email protected]>
One test in t5312 uses 'grep' to detect that a ref is written in the packed-refs file instead of a loose object. This does not work when the packed-refs file is in v2 format, such as when GIT_TEST_PACKED_REFS_VERSION=2. Since the test already checks that the loose ref is missing, it suffices to check that 'git rev-parse' succeeds. Signed-off-by: Derrick Stolee <[email protected]>
The last test in t5502-quickfetch.sh exploits the packed-refs v1 file format by appending 1000 lines to the packed-refs file. If the packed-refs file is in the v2 format, this corrupts the file as unreadable. Instead of making the test slower, let's ignore it when GIT_TEST_PACKED_REFS_VERSION=2. The test is really about 'git fetch', not the packed-refs format. Create a prerequisite in case we want to use this technique again in the future. An alternative would be to write those 1000 refs using a different mechanism, but let's opt for the simpler case for now. Signed-off-by: Derrick Stolee <[email protected]>
Three tests in t3210-pack-refs.sh corrupt a packed-refs file to test that Git properly discovers and handles those failures. These tests assume that the file is in the v1 format, so add the PACKED_REFS_V1 prereq to skip these tests when GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee <[email protected]>
The GIT_TEST_PACKED_REFS_VERSION=2 environment variable helps us test the packed-refs file format in its v2 version. This variable makes the Git process act as if the extensions.refFormat config key has "packed-v2" in its list. This means that if the environment variable is removed, the repository is in a bad state. This is sufficient for most test cases. However, tests that fetch over HTTP appear to lose this environment variable when executed through the HTTP server. Since the repositories are created via Git commands in the tests, the packed-refs files end up in the v2 format, but the server processes do not understand this and start serving empty payloads since they do not recognize any refs. The preferred long-term solution would be to ensure that the GIT_TEST_* environment variable persists into the HTTP server. However, these tests are not exercising any particularly tricky parts of the packed-refs file format. It may not be worth the effort to pass the environment variable and instead we can unset the environment variable (with a comment explaining why) in these tests. Signed-off-by: Derrick Stolee <[email protected]>
The linux-TEST-vars CI build helps us check that certain opt-in features are still exercised in at least one environment. The new GIT_TEST_PACKED_REFS_VERSION environment variable now passes the test suite when set to "2", so add this to that list of variables. This provides nearly the same coverage of the v2 format as we had in the v1 format. Signed-off-by: Derrick Stolee <[email protected]>
| On the Git mailing list, Phillip Wood wrote (reply to this): Hi Stolee
On 30/11/2022 15:16, Derrick Stolee wrote:
> On 11/28/2022 1:56 PM, Han-Wen Nienhuys wrote:
>> * Worktrees and the main repository have a separate view of the ref
>> namespace. This is not explicit in the ref backend API, and there is a
>> technical limitation that the packed-refs file cannot be in a
>> worktree. This means that worktrees will always continue to use
>> loose-ref storage if you only extend the packed-refs backend.
> > If I'm understanding it correctly [1], only the special refs (like HEAD or
> REBASE_HEAD) are worktree-specific, and all refs under "refs/*" are
> repository-scoped. I don't actually think of those special refs as "loose"
> refs and thus they should still work under the "only packed-refs" value
> for extensions.refFormat. I should definitely cover this in the
> documentation, though. Also, [1] probably needs updating because it calls
> HEAD a pseudo ref even though it explicitly is not [2].
>
> [1] https://git-scm.com/docs/git-worktree#_refs
> [2] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpseudorefapseudoref
Unfortunately I think it is a little messier than that (see refs.c:is_per_worktree_ref()). refs/bisect/*, refs/rewritten/* and refs/worktree/* are all worktree specific.
Best Wishes
Phillip | 
| User  | 
| On the Git mailing list, Taylor Blau wrote (reply to this): On Wed, Nov 30, 2022 at 10:16:52AM -0500, Derrick Stolee wrote:
> > Do you mean 'enumerate refs' ? Why would you want to count refs by prefix?
>
> * There was a GitHub feature that counted refs and tags, but wanted to
>   ignore internal ref prefixes (outside of refs/heads/* or refs/tags/*).
>   It turns out that we didn't actually need the full count but an
>   existence indicator, but it would be helpful to quickly identify how
>   many branches or tags are in a repository at a glance. Packed-refs v1
>   requires scanning the whole file while packed-refs v2 does a fixed
>   number of binary searches followed by a subtraction of row indexes.
True. On the surface, it seemed odd to use a function which returns
something like:
    { "refs/heads/*" => NNNN, "refs/tags/*" => MMMM }
only to check whether or not NNNN and MMMM are zero or non-zero.
But there's a little more to the story. That emptiness check does occur
at the beginning of many page loads. But when it responds "non-empty",
we then care about how many branches and tags there actually are.
So calling count_refs() (the name of the internal RPC that powers all of
this) was an optimization written under the assumption that we actually
are going to ask about the exact number of branches/tags very shortly
after querying for emptiness.
It turns out that empirically it's faster to do something like:
    $ git show-ref --[heads|tags] | head -n 1
to check if there are any branches and tags at all[^1], and then a
follow up 'git show-ref --heads | wc -l' to check how many there are.
But it would be nice to do both operations quickly without having
actually scan all of the entries in each prefix.
Thanks,
Taylor
[^1]: Some may remember my series in
  https://lore.kernel.org/git/[email protected]/
  which replaced '| head -n 1' with a '--count=1' option. This matches
  what GitHub runs in production where piping one command to another
  from Ruby is unfortunately quite complicated. | 
| On the Git mailing list, Han-Wen Nienhuys wrote (reply to this): On Wed, Nov 30, 2022 at 4:16 PM Derrick Stolee <[email protected]> wrote:
> > * Symrefs are refs too, but for some reason the packed-refs file
> > doesn't support them. Does packed-refs v2 support symrefs too?  If you
> > want to snapshot the state of refs, do you want to snapshot the value
> > of HEAD too?
>
> I forgot that loose refs under .git/refs/ can be symrefs. This definitely
> is a limitation that I should mention. Again, pseudorefs like HEAD are not
> included and are stored separately, but symrefs within refs/* are not
> available in packed-refs (v1 or v2). That should be explicitly called out
> in the extensions.refFormat docs.
>
> I imagine that such symrefs are uncommon, and users can make their own
> evaluation of whether that use is worth keeping loose refs or not. We can
> still have the {files, packed[-v2]} extension value while having a
> writing strategy that writes as much as possible into the packed layer.
To be honest, I don't understand why symrefs are such a generic
concept; I've only ever seen them used for HEAD.
> > * By not changing reflogs, you are making things simpler. (if a
> > transaction updates the branch that HEAD points to, the reflog for
> > HEAD has to be updated too. Because reftable updates the reflog
> > transactionally, this was some extra work)
> > Then again, I feel the current way that reflogs work are a bit messy,
> > because directory/file conflicts force reflogs to be deleted at times
> > that don't make sense from a user-perspective.
>
> I agree that reflogs are messy. I also think that reflogs have different
> needs than the ref storage, so separating their needs is valuable.
If the reflog records the history of the ref database, then ideally,
an update of a ref should be transactional across the ref database and
the reflog. I think you can never make this work unless you tie the
storage of both together.
I can't judge how many hosting providers really care about this. At
google, we really care, but we keep the ref database and the refllog
in a global Spanner database. Reftable is only used for per-datacenter
serving. (I discovered some bugs in the JGit reflog code when I ported
it to local filesystem repos, because it was never exercised at
Google)
> > * There are a lot of commands that store SHA1s in files under .git/,
> > and access them as if they are a ref (for example: rebase-apply/ ,
> > CHERRY_PICK_HEAD etc.).
>
> Yes, I think these pseudorefs are stored differently from usual refs, and
> hence the {packed[-v2]} extension value would still work, but I'll confirm
> this with more testing.
They will work as long as you keep support for loose refs, because
there is no distinction between "a entry in the ref database" and "any
file randomly written into .git/ ".
> >> In this RFC, I propose a different model that allows for more customization
> >> and incremental updates. The extensions.refFormat config key is multi-valued
> >> and defaults to the list of files and packed. In the context of this RFC,
> >> the intention is to be able to add packed-v2 so the list of all three values
> >> would allow Git to write and read either file format version (v1 or v2). In
> >> the larger scheme, the extension could allow restricting to only loose refs
> >> (just files) or only packed-refs (just packed) or even later when reftable
> >> is complete, files and reftable could mean that loose refs are the primary
> >> ref storage, but the reftable format serves as a drop-in replacement for the
> >> packed-refs file. Not all combinations need to be understood by Git, but
> >
> > I'm not sure how feasible this is. reftable also holds reflog data. A
> > setting {files,reftable} would either not work, or necessitate hairy
> > merging of data to get the reflogs working correctly.
>
> In this setup, would it be possible to continue using the "loose reflog"
> format while using reftable as the packed layer? I personally think this
> combination of formats to be critical to upgrading existing repositories
> to reftable.
I suppose so? If you only store refs and tags (and don't handle
reflogs, symrefs or use the inverse object mapping) then the reftable
file format is just a highly souped-up version of packed-refs.
> (Note: there is a strategy that doesn't need this approach, but it's a bit
> complicated. It would involve rotating all replicas to new repositories
> that are configured to use reftable upon creation, getting the refs from
> other replicas via fetches. In my opinion, this is prohibitively
> expensive.)
I'm not sure I understand the problem. Any deletion of a ref (that is
in packed-refs) today already requires rewriting the entire
packed-refs file ("all or nothing" operation). Whether you write a
packed-refs or reftable is roughly equally expensive.
Are you looking for a way to upgrade a repo, while concurrent git
process may write updates into the repository during the update? That
may be hard to pull off, because you probably need to rename more than
one file atomically. If you accept momentarily failed writes, you
could do
* rename refs/ to refs.old/ (loose ref writes will fail now)
* collect loose refs under refs.old/ , put into packed-refs
* populate the reftable/ dir
* set refFormat extension.
* rename refs.old/ to refs/ with a refs/heads a file (as described in
the reftable spec.)
See also https://gerrit.googlesource.com/jgit/+/ca166a0c62af2ea87fdedf2728ac19cb59a12601/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/FileRepository.java#734
> >> I mentioned earlier that I had considered using reftable as a way to achieve
> >> the stated goals. With the current state of that work, I'm not confident
> >> that it is the right approach here.
> >>
> >> My main worry is that the reftable is more complicated than we need for a
> >> typical Git repository that is based on a typical filesystem. This makes
> >> testing the format very critical, and we seem to not be near reaching that
> >> approach.
> >
> > I think the base code of reading and writing the reftable format is
> > exercised quite exhaustively tested in unit tests. You say 'seem', but
> > do you have anything concrete to say?
>
> Our test suite is focused on integration tests at the command level. While
> unit tests are helpful, I'm not sure if all of the corner cases would be
> covered by tests that check Git commands only.
It's actually easier to test all of the nooks of the format through
unittests, because you can tweak parameters (eg. blocksize) that
aren't normally available in the command-line
> >> That might even help the integration process to allow the reftable format to
> >> be tested at least by some subset of tests instead of waiting for a full
> >> test suite update.
> >
> > I don't understand this comment. In the current state,
> > https://github.com/git/git/pull/1215 already passes 922 of the 968
> > test files if you set GIT_TEST_REFTABLE=1.
> >
> > See https://github.com/git/git/pull/1215#issuecomment-1329579459 for
> > details. As you can see, for most test files, it's just a few
> > individual test cases that fail.
>
> My point is that to get those remaining tests passing requires a
> significant update to the test suite. I imagined that the complexity of
> that update was the blocker to completing the reftable work.
>
> It seems that my estimation of that complexity was overly high compared to
> what you appear to be describing.
To be honest, i'm not quite sure how significant the work is: for
things like worktrees, it wasn't that obvious to me how things should
work in the first place. That makes it hard to make estimates. I
thought there might be a month of full-time work left, but these days
I can barely make a couple of hours of time per week to work on
reftable  if at all.
> > For deleting refs quickly, it seems that you only need to support
> > $ZEROID in packed-refs and then implement a ref database as a stack of
> > packed-ref files? If you're going for minimal effort and minimal
> > disruption wouldn't that be the place to start?
>
> I disagree that jumping straight to stacked packed-refs is minimal effort
> or minimal disruption.
>
> Creating the stack approach does require changing the semantics of the
> packed-refs format to include $ZEROID, which will modify some meanings in
> the iteration code. The use of a stack, as well as how layers are combined
> during a ref write or also during maintenance, adds complications to the
> locking semantics that are decently complicated.
>
> By contrast, the v2 format is isolated to the on-disk format. None of the
> writing or reading semantics are changed in terms of which files to look
> at or write in which order. Instead, it's relatively simple to see from
> the format exactly how it reduces the file size but otherwise has exactly
> the same read/write behavior. In fact, since the refs and OIDs are all
> located in the same chunk in a similar order to the v1 file, we can even
> deduce that page cache semantics will only improve in the new format.
>
> The reason to start with this step is that the benefits and risks are
> clearly understood, which can motivate us to establish the mechanism for
> changing the ref format by defining the extension.
I believe that the v2 format is a safe change with performance
improvements, but it's a backward incompatible format change with only
modest payoff. I also don't understand how it will help you do a stack
of tables,
which you need for your primary goal (ie. transactions/deletions
writing only the delta, rather than rewriting the whole file?).
> > You're concerned about the reftable file format (and maybe rightly
> > so), but if you're changing the file format anyway and you're not
> > picking reftable, why not create a block-based, indexed format that
> > can support storing reflog entries at some point in the future too,
> > rather than build on (the limitations) of packed-refs?
>
> My personal feeling is that storing ref tips and storing the history of a
> ref are sufficiently different problems that should have their own data
> structures. Even if they could be combined by a common format, I don't
> think it is safe to transition every part of every ref operation to a new
> format all at once.
>
> Looking at reftable from the perspective of a hosting provider, I'm very
> hesitant to recommend transitioning to it because of how it is an "all or
> nothing" switch. It does not fit with my expectations for safe deployment
> practices.
You'd have to consult with your SRE team, how to do this best, but
here's my $.02. If you are a hosting provider, I assume you have 3 or
5 copies of each repo in diffrent datacenters for
redundancy/availability. You could have one of the datacenters use the
new format for while, and see if there are any errors or discrepancies
(both in terms of data consistency and latency metrics)
> * The reftable work needs its refs backend implemented, but your draft PR
>   has a prototype of this and some basic test suite integration. There are
>   54 test files that have one or more failing tests, and likely these just
>   need to be adjusted to not care about loose references.
>
> * The reftable is currently fundamentally different enough that it could
>   not be used as a replacement for the packed-refs file underneath loose
>   refs (primarily due to its integration with the reflog). Doing so would
>   require significant work on top of your prototype.
It could, but I don't see the point.
> I'm going to take the following actions on my end to better understand the
> situation:
>
> 1. I'll take your draft PR branch and do some performance evaluations on
>    the speed of ref updates compared to loose refs and my prototype of a
>    two-stack packed-ref where the second layer of the stack is only for
>    deleted refs.
(tangent) - wouldn't that design perform poorly once the number of
deletions gets large? You'd basically have to rewrite the
deleted-packed-refs file all the time.
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Liana Sebastian | 
| On the Git mailing list, Sean Allred wrote (reply to this): Han-Wen Nienhuys <[email protected]> writes:
> To be honest, I don't understand why symrefs are such a generic
> concept; I've only ever seen them used for HEAD.
I've been only lurking in this thread (and loosely following along,
even!) but I do want to call out that I have recently considered perhaps
abusing symrefs to point to normal feature branches. In our workflow, we
have documentation records identified by a numeric ID -- the code
changes corresponding to that documentation (testing instructions, etc.)
use formulaic branch names like `feature/123456`.
It is sometimes beneficial for two or more of these documentation
records to perform their work on the same code branch. There are myriad
reasons for this, some better than others, but I want to avoid getting
mired in whether or not this is a good idea. It does happen and is
sometimes even the best way to do it.
In these scenarios, I've considered having `feature/2` be a symref to
`feature/1` so that both features can always 'know' what to call their
branch for operations like checkout. I've done this on a smaller scale
in the past to great effect.
Nothing is set in stone here for us, but I did want to call this out as
a potential real-world use case.
--
Sean Allred | 
| User  | 
| On the Git mailing list, Junio C Hamano wrote (reply to this): Derrick Stolee <[email protected]> writes:
> I do want to say that while I admire
> JGit's dedication to being compatible with repositories created by Git, I
> don't think the reverse is a goal of the Git project.
The world works better if cross-pollination happens both ways,
though.
>> * Symrefs are refs too, but for some reason the packed-refs file
>> doesn't support them. Does packed-refs v2 support symrefs too?  If you
>> want to snapshot the state of refs, do you want to snapshot the value
>> of HEAD too?
>
> I forgot that loose refs under .git/refs/ can be symrefs. This definitely
> is a limitation that I should mention. Again, pseudorefs like HEAD are not
> included and are stored separately, but symrefs within refs/* are not
> available in packed-refs (v1 or v2). That should be explicitly called out
> in the extensions.refFormat docs.
I expect that, in a typical individual-contributor repository, there
are at least two symbolic refs, e.g.
    .git/HEAD
    .git/refs/remotes/origin/HEAD
Having to fall back on the loose ref hierarchy only to be able to
store the latter is a bit of shame---as long as you are revamping
the format, the design should allow us to eventually migrate all
refs to the new format without having to do the "check there, and if
there isn't then check this other place", which is what the current
loose + packed combination do, I would think. | 
| This patch series was integrated into seen via git@380fa25. | 
| This patch series was integrated into seen via git@119dc17. | 
| This patch series was integrated into seen via git@29ec2c5. | 
| This patch series was integrated into seen via git@f35f1d1. | 
| There was a status update in the "Cooking" section about the branch  Waiting for review. source: <[email protected]> | 
| On the Git mailing list, Derrick Stolee wrote (reply to this): On 11/30/2022 1:30 PM, Han-Wen Nienhuys wrote:
> On Wed, Nov 30, 2022 at 4:16 PM Derrick Stolee <[email protected]> wrote:
>> (Note: there is a strategy that doesn't need this approach, but it's a bit
>> complicated. It would involve rotating all replicas to new repositories
>> that are configured to use reftable upon creation, getting the refs from
>> other replicas via fetches. In my opinion, this is prohibitively
>> expensive.)
> 
> I'm not sure I understand the problem. Any deletion of a ref (that is
> in packed-refs) today already requires rewriting the entire
> packed-refs file ("all or nothing" operation). Whether you write a
> packed-refs or reftable is roughly equally expensive.
> 
> Are you looking for a way to upgrade a repo, while concurrent git
> process may write updates into the repository during the update? That
> may be hard to pull off, because you probably need to rename more than
> one file atomically. If you accept momentarily failed writes, you
> could do
> 
> * rename refs/ to refs.old/ (loose ref writes will fail now)
> * collect loose refs under refs.old/ , put into packed-refs
> * populate the reftable/ dir
> * set refFormat extension.
> * rename refs.old/ to refs/ with a refs/heads a file (as described in
> the reftable spec.)
>
> See also https://gerrit.googlesource.com/jgit/+/ca166a0c62af2ea87fdedf2728ac19cb59a12601/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/FileRepository.java#734
Yes, I would ideally like for the repository to "upgrade" its ref
storage mechanism during routine maintenance in a non-blocking way
while other writes and reads continue as normal.
After discussing it a bit internally, we _could_ avoid the "rotate
the replicas" solution if there was a "git upgrade-ref-format"
command that could switch from one to another, but it would still
involve pulling that replica out of the rotation and then having
it catch up to the other replicas after that is complete. If I'm
reading your draft correctly, that is not currently available in
your work, but we could add it after the fact.
Requiring pulling replicas out of rotation is still a bit heavy-
handed for my liking, but it's much less expensive than moving
all of the Git data.
>> The reason to start with this step is that the benefits and risks are
>> clearly understood, which can motivate us to establish the mechanism for
>> changing the ref format by defining the extension.
> 
> I believe that the v2 format is a safe change with performance
> improvements, but it's a backward incompatible format change with only
> modest payoff. I also don't understand how it will help you do a stack
> of tables,
> which you need for your primary goal (ie. transactions/deletions
> writing only the delta, rather than rewriting the whole file?).
The v2 format doesn't help me on its own, but it has other benefits
in terms of size and speed, as well as the "ref count" functionality.
The important thing is that the definition of extensions.refFormat
that I'm proposing in this RFC establishes a way to make incremental
progress on the ref format, allowing the stacked format to come in
later with less friction.
 
>> * The reftable is currently fundamentally different enough that it could
>>   not be used as a replacement for the packed-refs file underneath loose
>>   refs (primarily due to its integration with the reflog). Doing so would
>>   require significant work on top of your prototype.
> 
> It could, but I don't see the point.
My point is that we can upgrade repositories by replacing packed-refs
with reftable during routine maintenance instead of the heavier
approaches discussed earlier.
* Step 1: replace packed-refs with reftable.
* Step 2: stop writing loose refs, only update reftable (but still read loose refs).
* Step 3: collapse all loose refs into reftable, stop reading or writing loose refs.
 
>> I'm going to take the following actions on my end to better understand the
>> situation:
>>
>> 1. I'll take your draft PR branch and do some performance evaluations on
>>    the speed of ref updates compared to loose refs and my prototype of a
>>    two-stack packed-ref where the second layer of the stack is only for
>>    deleted refs.
> 
> (tangent) - wouldn't that design perform poorly once the number of
> deletions gets large? You'd basically have to rewrite the
> deleted-packed-refs file all the time.
 
We have regular maintenance that is triggered by pushes that rewrites
the packed-refs file frequently, anyway. The maintenance currently is
blocked on the amount of time spent repacking object data, so a large
number of ref updates can come in during this process. (That maintenance
step would collapse the deleted-refs layer into the base layer.)
I've tested a simple version of this stack that shows that rewriting the
file with 1,000 deletions is still within 2x the cost of updating a loose
ref, so it solves the immediate problem using a much simpler stack model,
at least in the most-common case where ref deletions are less frequent
than other updates. Even if the size outgrew the 2x cost limit, the
deleted file is still going to be much smaller than the base packed-refs
file, which is currently rewritten for every deletion, so it is still an
improvement.
The more complicated stack model would be required to funnel all ref
updates into that structure and away from loose refs.
Thanks,
-Stolee | 
| This patch series was integrated into seen via git@c18b6a0. | 
| This patch series was integrated into seen via git@a4cb0c9. | 
| On the Git mailing list, Han-Wen Nienhuys wrote (reply to this): On Thu, Dec 1, 2022 at 9:19 PM Derrick Stolee <[email protected]> wrote:
> >> The reason to start with this step is that the benefits and risks are
> >> clearly understood, which can motivate us to establish the mechanism for
> >> changing the ref format by defining the extension.
> >
> > I believe that the v2 format is a safe change with performance
> > improvements, but it's a backward incompatible format change with only
> > modest payoff. I also don't understand how it will help you do a stack
> > of tables,
> > which you need for your primary goal (ie. transactions/deletions
> > writing only the delta, rather than rewriting the whole file?).
>
> The v2 format doesn't help me on its own, but it has other benefits
> in terms of size and speed, as well as the "ref count" functionality.
>
> The important thing is that the definition of extensions.refFormat
> that I'm proposing in this RFC establishes a way to make incremental
> progress on the ref format, allowing the stacked format to come in
> later with less friction.
I guess you want to move the read/write stack under the loose storage
(packed backend), and introduce (read loose/packed + write packed
only) mode that is transitional?
Before you embark on this incremental route, I think it would be best
to think through carefully how an online upgrade would work in detail
(I think it's currently not specified?) If ultimately it's not
feasible to do incrementally, then the added complexity of the
incremental approach will be for naught.
The incremental mode would only be of interest to hosting providers.
It will only be used transitionally. It is inherently going to be
complex, because it has to consider both storage modes at the same
time, and because it is transitional, it will get less real life
testing. At the same time, the ref database is comparatively small, so
the availability blip that converting the storage offline will impair
is going to be small. So, the incremental approach is rather expensive
for a comparatively small benefit.
I also thought a bit about how you could make the transition seamless,
but I can't see a good way: you have to coordinate between tables.list
(the list of reftables active or whatever file signals the presence of
a stack) and files under refs/heads/. I don't know how to do
transactions across multiple files without cooperative locking.
If you assume you can use filesystem locks, then you could do
something simpler: if a git repository is marked 'transitional', git
processes take an FS read lock on .git/ .  The process that converts
the storage can take an exclusive (write) lock on .git/, so it knows
nobody will interfere. I think this only works if the repo is on local
disk rather than NFS, though.
> * Step 1: replace packed-refs with reftable.
> * Step 2: stop writing loose refs, only update reftable (but still read loose refs).
Does that work? A long running process might not notice the switch in
step 2, so it could still write a ref as loose, while another process
racing might write a different value to the same ref through reftable.
PS. I'll be away from work until Jan 9th.
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Liana Sebastian | 
| On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this): On Fri, Dec 02 2022, Han-Wen Nienhuys wrote:
> On Thu, Dec 1, 2022 at 9:19 PM Derrick Stolee <[email protected]> wrote:
>> >> The reason to start with this step is that the benefits and risks are
>> >> clearly understood, which can motivate us to establish the mechanism for
>> >> changing the ref format by defining the extension.
>> >
>> > I believe that the v2 format is a safe change with performance
>> > improvements, but it's a backward incompatible format change with only
>> > modest payoff. I also don't understand how it will help you do a stack
>> > of tables,
>> > which you need for your primary goal (ie. transactions/deletions
>> > writing only the delta, rather than rewriting the whole file?).
>>
>> The v2 format doesn't help me on its own, but it has other benefits
>> in terms of size and speed, as well as the "ref count" functionality.
>>
>> The important thing is that the definition of extensions.refFormat
>> that I'm proposing in this RFC establishes a way to make incremental
>> progress on the ref format, allowing the stacked format to come in
>> later with less friction.
>
> I guess you want to move the read/write stack under the loose storage
> (packed backend), and introduce (read loose/packed + write packed
> only) mode that is transitional?
>
> Before you embark on this incremental route, I think it would be best
> to think through carefully how an online upgrade would work in detail
> (I think it's currently not specified?) If ultimately it's not
> feasible to do incrementally, then the added complexity of the
> incremental approach will be for naught.
>
> The incremental mode would only be of interest to hosting providers.
> It will only be used transitionally. It is inherently going to be
> complex, because it has to consider both storage modes at the same
> time, and because it is transitional, it will get less real life
> testing. At the same time, the ref database is comparatively small, so
> the availability blip that converting the storage offline will impair
> is going to be small. So, the incremental approach is rather expensive
> for a comparatively small benefit.
>
> I also thought a bit about how you could make the transition seamless,
> but I can't see a good way: you have to coordinate between tables.list
> (the list of reftables active or whatever file signals the presence of
> a stack) and files under refs/heads/. I don't know how to do
> transactions across multiple files without cooperative locking.
A multi-backend transaction would be hard to do at the best of times,
but we'd also presumably run into the issue that not all ref operations
currently use the transaction mechanism (e.g. branch copying/moving). So
if one or the other fails there all bets are off as far as getting back
to a consistent state.
Perhaps a more doable & interesting approach would be to have a "slave"
backend that would follow along, i.e. we'd replay all operations from
"master" to "slave" (as with DB replication, just within a single
repository).
We might get out of sync, but as the "master" is always the source of
truth presumably we could run some one-off re-exporting of the refspace
get back up-to-date, and hopefully not get out of sync again.
Then once we're ready, we could flip the switch indicating what becomes
the canonical backend.
For reftable the FS layout under .git/* is incompatible, so we'd also
need to support writing to some alternate directory to make such a thing
work... | 
| User  | 
| This patch series was integrated into seen via git@8bfc518. | 
| This patch series was integrated into seen via git@3b269f1. | 
| There was a status update in the "Cooking" section about the branch  Waiting for review. Seems to break CI. linux-TEST-vars cf. https://github.com/git/git/actions/runs/3598165328 source: <[email protected]> | 
| This patch series was integrated into seen via git@5ed86a4. | 
| This patch series was integrated into seen via git@31ce5f4. | 
| This patch series was integrated into seen via git@58abe16. | 
| This patch series was integrated into seen via git@5306ab7. | 
| This patch series was integrated into seen via git@4974a40. | 
| This patch series was integrated into seen via git@54681ef. | 
| There was a status update in the "Cooking" section about the branch  Waiting for review. Seems to break CI. linux-TEST-vars cf. https://github.com/git/git/actions/runs/3598165328 source: <[email protected]> | 
| There was a status update in the "Cooking" section about the branch  Waiting for review. Seems to break CI. linux-TEST-vars cf. https://github.com/git/git/actions/runs/3598165328 source: <[email protected]> | 
Introduction
I became interested in our packed-ref format based on the asymmetry between ref updates and ref deletions: if we delete a packed ref, then the packed-refs file needs to be rewritten. Compared to writing a loose ref, this is an O(N) cost instead of O(1).
In this way, I set out with some goals:
I also had one major non-goal to keep things focused:
After carefully considering several options, it seemed that there are two solutions that can solve this effectively:
The reftable work seems currently dormant. The format is pretty complicated and I have a difficult time seeing a way forward for it to be fully integrated into Git. Personally, I'd prefer a more incremental approach with formats that are built for a basic filesystem. During the process, we can create APIs within Git that can benefit other file formats within Git.
Further, there is a simpler model that satisfies my primary goal without the complication required for the secondary goal. Suppose we create a stacked packed-refs file but only have two layers: the first (base) layer is created when
git pack-refscollapses the full stack and adds the loose ref updates to the packed-refs file; the second (top) layer contains only ref deletions (allowing null OIDs to indicate a deleted ref). Then, ref deletions would only need to rewrite that top layer, making ref deletions take O(deletions) time instead of O(all refs) time. With a reasonable schedule to squash the packed-refs stack, this would be a dramatic improvement. (A prototype implementation showed that updating a layer of 1,000 deletions takes only twice the time as writing a single loose ref.)If we want to satisfy the secondary goal of passing all ref updates through the packed storage, then more complicated layering would be necessary. The point of bringing this up is that we have incremental goals along the way to that final state that give us good stopping points to test the benefits of each step.
Stacking the packed-refs format introduces several interesting strategy points that are complicated to resolve. Before we can do that, we first need to establish a way to modify the ref format of a Git repository. Hence, we need a new extension for the ref formats.
To simplify the first update to the ref formats, it seemed better to add a new file format version to the existing packed-refs file format. This format has the exact lock/write/rename mechanics of the current packed-refs format, but uses a file format that structures the information in a more compact way. It uses the chunk-format API, with some tweaks. This format update is useful to the final goal of a stacked packed-refs API, since each layer will have faster reads and writes. The main reason to do this first is that it is much simpler to understand the value-add (smaller files means faster performance).
RFC Organization
This RFC is quite long, but the length seemed necessary to actually provide and end-to-end implementation that demonstrates the packed-refs v2 format along with test coverage (via the new GIT_TEST_PACKED_REFS_VERSION variable).
For convenience, I've broken each section of the full RFC into parts, which resembles how I intend to submit the pieces for full review. These parts are available as pull requests in my fork, but here is a breakdown:
Part I: Optionally hash the index
[1] derrickstolee#23
Packed-refs v2 Part I: Optionally hash the index
(Patches 1-2)
The chunk-format API uses the hashfile API as a buffered write, but also all existing formats that use the chunk-format API also have a trailing hash as part of the format. Since the packed-refs file has a critical path involving its write speed (deleting a packed ref), it seemed important to allow apples-to-apples comparison between the v1 and v2 format by skipping the hashing. This is later toggled by a config option.
In this part, the focus is on allowing the hashfile API to ignore updating the hash during the buffered writes. We've been using this in microsoft/git to optionally speed up index writes, which patch 2 introduces here. The file format instead writes a null OID which would look like a corrupt file to an older 'git fsck'. Before submitting a full version, I would update 'git fsck' to ignore a null OID in all of our file formats that include a trailing hash. Since the index is more short-lived than other formats (such as pack-files) this trailing hash is less useful. The write time is also critical as the performance tests demonstrate.
Part II: Create
extensions.refFormat[2] derrickstolee#24
Packed-refs v2 Part II: create extensions.refFormat
(Patches 3-7)
This part is a critical concept that has yet to be defined in the Git codebase. We have no way to incrementally modify the ref format. Since refs are so critical, we cannot add an optionally-understood layer on top (like we did with the multi-pack-index and commit-graph files). The reftable draft [6] proposes the same extension name (
extensions.refFormat) but focuses instead on only a single value. This means that the reftable must be defined atgit initorgit clonetime and cannot be upgraded from the files backend.In this RFC, I propose a different model that allows for more customization and incremental updates. The
extensions.refFormatconfig key is multi-valued and defaults to the list offilesandpacked. In the context of this RFC, the intention is to be able to addpacked-v2so the list of all three values would allow Git to write and read either file format version (v1 or v2). In the larger scheme, the extension could allow restricting to only loose refs (justfiles) or only packed-refs (justpacked) or even later when reftable is complete,filesandreftablecould mean that loose refs are the primary ref storage, but the reftable format serves as a drop-in replacement for thepacked-refsfile. Not all combinations need to be understood by Git, but having them available as an option could be useful for flexibility, especially when trying to upgrade existing repositories to new formats.In the future, beyond the scope of this RFC, it would be good to add a
stackedvalue that allows a stack of files in packed-refs format (whose version is specified by thepackedorpacked-v2values) so we can further speed up writes to the packed layer. Depending on how well that works, we could focus on speeding up ref deletions or sending all ref writes straight to the packed-refs layer. With the option to keep the loose refs storage, we have flexibility to explore that space incrementally when we have time to get to it.Part III: Allow a trailing table-of-contents in the chunk-format API
[3] derrickstolee#25
Packed-refs v2 Part III: trailing table of contents in chunk-format
(Patches 8-17)
In order to optimize the write speed of the packed-refs v2 file format, we want to write immediately to the file as we stream existing refs from the current refs. The current chunk-format API requires computing the chunk lengths in advance, which can slow down the write and take more memory than necessary. Using a trailing table of contents solves this problem, and was recommended earlier [7]. We just didn't have enough evidence to justify the work to update the existing chunk formats. Here, we update the API in advance of using in the packed-refs v2 format.
We could consider updating the commit-graph and multi-pack-index formats to use trailing table of contents, but it requires a version bump. That might be worth it in the case of the commit-graph where computing the size of the changed-path Bloom filters chunk requires a lot of memory at the moment. After this chunk-format API update is reviewed and merged, we can pursue those directions more closely. We would want to investigate the formats more carefully to see if we want to update the chunks themselves as well as some header information.
Part IV: Abstract some parts of the v1 file format
[4] derrickstolee#26
Packed-refs v2 Part IV: abstract some parts of the v1 file format
(Patches 18-21)
These patches move the part of the refs/packed-backend.c file that deal with the specifics of the packed-refs v1 file format into a new file: refs/packed-format-v1.c. This also creates an abstraction layer that will allow inserting the v2 format more easily.
One thing that doesn't exist currently is a documentation file describing the packed-refs file format. I would add that file in this part before submitting it for full review. (I also haven't written the file format doc for the packed-refs v2 format, either.)
Part V: Implement the v2 file format
[5] derrickstolee#27
Packed-refs v2 Part V: the v2 file format
(Patches 22-35)
This is the real meat of the work. Perhaps there are ways to split it further, but for now this is what I have ready. The very last patch does a complete performance comparison for a repo with many refs.
The format is not yet documented, but is broken up into these pieces:
The refs data chunk stores the same data as the packed-refs file, but each ref is broken down as follows: the ref name (with trailing zero), the OID for the ref in its raw bytes, and (if necessary) the peeled OID for the ref in its raw bytes. The refs are sorted lexicographically.
The ref offsets chunk is a single column of 64-bit offsets into the refs chunk indicating where each ref starts. The most-significant bit of that value indicates whether or not there is a peeled OID.
The prefix data chunk lists a set of ref prefixes (currently writes only allow depth-2 prefixes, such as
refs/heads/andrefs/tags/). When present, these prefixes are written in this chunk and not in the refs data chunk. The prefixes are sorted lexicographically.The prefix offset chunk has two 32-bit integer columns. The first column stores the offset within the prefix data chunk to the start of the prefix string. The second column points to the row position for the first ref that has name greater than this prefix (the 0th prefix is assumed to start at row 0, so we can interpret the prefix range from
row[i-1]androw[i]).Between using raw OIDs and storing the depth-2 prefixes only once, this format compresses the file to ~60% of its v1 size. (The format allows not writing the prefix chunks, and the prefix chunks are implemented after the basics of the ref chunks are complete.)
The write times are reduced in a similar fraction to the size difference. Reads are sped up somewhat, and we have the potential to do a ref count by prefix much faster by doing a binary search for the start and end of the prefix and then subtracting the row positions instead of scanning the file between to count refs.
Relationship to Reftable
I mentioned earlier that I had considered using reftable as a way to achieve the stated goals. With the current state of that work, I'm not confident that it is the right approach here.
My main worry is that the reftable is more complicated than we need for a typical Git repository that is based on a typical filesystem. This makes testing the format very critical, and we seem to not be near reaching that approach. The v2 format here is very similar to existing Git file formats since it uses the chunk-format API. This means that the amount of code custom to just the v2 format is quite small.
As mentioned, the current extension plan [6] only allows
reftableorfilesand does not allow for a mix of both. This RFC introduces the possibility that both could co-exist. Using that multi-valued approach means that I'm able to test the v2 packed-refs file format almost as well as the v1 file format within this RFC. (More tests need to be added that are specific to this format, but I'm waiting for confirmation that this is an acceptable direction.) At the very least, this multi-valued approach could be used as a way to allow using the reftable format as a drop-in replacement for the packed-refs file, as well as upgrading an existing repo to use reftable. That might even help the integration process to allow the reftable format to be tested at least by some subset of tests instead of waiting for a full test suite update.I'm interested to hear from people more involved in the reftable work to see the status of that project and how it matches or differs from my perspective.
The one thing I can say is that if the reftable work had not already begun, then this is RFC is how I would have approached a new ref format.
I look forward to your feedback!
Thanks,
[6] https://github.com/git/git/pull/1215/files#diff-a30f88b458b1f01e7a67e72576584b5b77ddb0362e40da6f7bf4a9ddf79db7b8R41-R48
The draft version of
extensions.refFormatfor reftable.[7] https://lore.kernel.org/git/[email protected]/
Re: [PATCH 00/15] Refactor chunk-format into an API
(where René recommends a trailing table of contents)
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: Elijah Newren [email protected]
cc: Phillip Wood [email protected]
cc: Sean Allred [email protected]
cc: Ævar Arnfjörð Bjarmason [email protected]