-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Prepare for v2.51.0(2) #5840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepare for v2.51.0(2) #5840
Conversation
With 974cdca (doc: introduce a synopsis typesetting, 2024-09-24) we have introduced a new synopsis type that simplifies the rules for typesetting a command's synopsis. Convert the git-reflog(1) documentation to use it. While at it, convert the list of options to use backticks. This is done to appease an upcoming new linter that mandates the use of backticks when using the synopsis type. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The way subcommands of git-reflog(1) are laid out does not make any immediate sense. Reorder them such that read-only subcommands precede writing commands for a bit more structure. Furthermore, move the "expire" subcommand last. This prepares for a subsequent change where we are about to introduce a new "write" command to append reflog entries. Like this, the writing subcommands are ordered such that those affecting a single reflog come before those spanning across all reflogs. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
In a subsequent commit we'll add another user that wants to write reflog entries. This requires them to call `ref_transaction_update_reflog()`, but that function is local to "refs.c". Export the function to prepare for the change. While at it, drop the `flags` field, as all callers are for now expected to use the same flags anyway. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
While we provide a couple of subcommands in git-reflog(1) to remove reflog entries, we don't provide any to write new entries. Obviously this is not an operation that really would be needed for many use cases out there, or otherwise people would have complained that such a command does not exist yet. But the introduction of the "reftable" backend changes the picture a bit, as it is now basically impossible to manually append a reflog entry if one wanted to do so due to the binary format. Plug this gap by introducing a simple "write" subcommand. For now, all this command does is to append a single new reflog entry with the given object IDs and message to the reflog. More specifically, it is not yet possible to: - Write multiple reflog entries at once. - Insert reflog entries at arbitrary indices. - Specify the date of the reflog entry. - Insert reflog entries that refer to nonexistent objects. If required, those features can be added at a future point in time. For now though, the new command aims to fulfill the most basic use cases while being as strict as possible when it comes to verifying parameters. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The last parameter in `split_ident_line()` is the length of the line passed in by the caller. As such, most callers pass in either the result of `strlen()`, `struct strbuf::len` or a pointer diff, all of which are expected to be positive numbers. Regardless of that, the function accepts a signed integer, which is somewhat confusing. Fix the function signature to instead accept a `size_t`. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When migrating reflog entries between different storage formats we must reconstruct the identity of reflog entries. This is done by passing the committer passed to the `migrate_one_reflog_entry()` callback function to `fmt_ident()`. This results in an invalid identity though: `fmt_ident()` expects the caller to provide both name and mail of the author, but we pass the full identity as mail. This leads to an identity like: pks <Patrick Steinhardt [email protected]> Fix the bug by splitting the identity line first. This allows us to extract both the name and mail so that we can pass them to `fmt_ident()` separately. This commit does not yet add any tests as there is another bug in the reflog migration that will be fixed in a subsequent commit. Once that bug is fixed we'll make the reflog verification in t1450 stricter, and that will catch both this bug here and the other bug. Note that we also add two new `name` and `mail` string buffers to the callback structures and splice them through to the callbacks. This is done so that we can avoid allocating a new buffer every time we compute the committer information. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When updating a reference that is being pointed to HEAD we don't only write a reflog message for that particular reference, but also generate one for HEAD. This logic is handled by `split_head_update()`, where we: 1. Verify that the condition actually triggered. This is done by reading HEAD at the start of the transaction so that we can then check whether a given reference update refers to its target. 2. Queue a new log-only update for HEAD in case it did. But the logic is unfortunately not free of races, as we do not lock the HEAD reference after we have read its target. This can lead to the following two scenarios: - HEAD gets concurrently updated to point to one of the references we have already processed. This causes us not writing a reflog message even though we should have done so. - HEAD gets concurrently updated to no longer point to a reference anymore that we have already processed. This causes us to write a reflog message even though we should _not_ have done so. Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that is specific to the "files" backend. If set, we will double check that the HEAD reference still points to the reference that we are creating the reflog entry for after we have locked HEAD. Furthermore, instead of manually resolving the old object ID of that entry, we now use the same old state as for the parent update. If we detect such a racy update we abort the transaction. This is a bit heavy-handed: the user didn't even ask us to write a reflog update for "HEAD", so it might be surprising if we abort the transaction. That being said: - Normal users wouldn't typically hit this case as we only hit the relevant code when committing to a branch that is being pointed to by "HEAD" directly. Commands like git-commit(1) typically commit to "HEAD" itself though. - Scripted users that use git-update-ref(1) and related plumbing commands are unlikely to hit this case either, as they would have to update the pointed-to-branch at the same as "HEAD" is being updated, which is an exceedingly rare event. The alternative would be to instead drop the log-only update completely, but that would require more logic that is hard to verify without adding infrastructure specific for such a test. So we rather do the pragmatic thing and don't worry too much about an edge case that is very unlikely to happen. Unfortunately, this change only helps with the second race. We cannot reliably plug the first race without locking the HEAD reference at the start of the transaction. Locking HEAD unconditionally would effectively serialize all writes though, and that doesn't seem like an option. Also, double checking its value at the end of the transaction is not an option either, as its target may have flip-flopped during the transaction. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old object ID set. If so, the value of that field is used to verify whether the current state of the reference matches this expected state. It is thus an important part of mitigating races with a concurrent process that updates the same set of references. When writing reflogs though we explicitly unset that flag. This is a sensible thing to do: the old state of reflog entry updates may not necessarily match the current on-disk state of its accompanying ref, but it's only intended to signal what old object ID we want to write into the new reflog entry. For example when migrating refs we end up writing many reflog entries for a single reference, and most likely those reflog entries will have many different old object IDs. But unsetting this flag also removes a useful signal, namely that the caller _did_ provide an old object ID for a given reflog entry. This signal will become useful in a subsequent commit, where we add a new flag that tells the transaction to use the provided old and new object IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a signal to verify that the caller really did provide an old object ID. Stop unsetting the flag so that we can use it as this described signal in a subsequent commit. Skip checking the old object ID for log-only updates so that we don't expect it to match the current on-disk state. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When migrating reflog entries between different storage formats we end up with invalid old object IDs for the migrated entries: instead of writing the old object ID of the to-be-migrated entry, we end up with the all-zeroes object ID. The root cause of this issue is that we don't know to use the old object ID provided by the caller. Instead, we manually resolve the old object ID by resolving the current value of its matching reference. But as that reference does not yet exist in the target ref storage we always end up resolving it to all-zeroes. This issue got unnoticed as there is no user-facing command that would even show the old object ID. While `git log -g` knows to show the new object ID, we don't have any formatting directive to show the old object ID. Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If set, backends are instructed to use the old and new object IDs provided by the caller, without doing any manual resolving. Set this flag in `ref_transaction_update_reflog()`. Amend our tests in t1460-refs-migrate to use our test tool to read reflog entries. This test tool prints out both old and new object ID of each reflog entry, which fixes the test gap. Furthermore it also prints the full identity used to write the reflog, which provides test coverage for the previous commit in this patch series that fixed the identity for migrated reflogs. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
4c063c8 (rebase -i: improve error message when picking merge, 2024-05-30) added advice texts for cases when a merge commit is passed as argument of sequencer command that cannot operate with a merge commit. However, it forgot about the 'drop' command, so that in this case the BUG() in the default branch is reached. Handle 'drop' like 'merge', i.e., permit it without a message. Signed-off-by: Johannes Sixt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
In commit 919df31 (Collect merge-related tests to t64xx, 2020-08-10), merge related tests were moved from t60xx to t64xx. Some comments in merge-ort relating to some tricky code referenced specific testcases within certain testfiles for additional information, but referred to their historical testfile names; update the testfile names to mention their modern location. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
check_for_directory_rename() had a weirdly coded check for whether a strmap contained a certain key. Replace the temporary variable and call to strmap_get_entry() with the more natural strmap_contains() call. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When commit 98a1a00 (t6423: add a testcase causing a failed assertion in process_renames, 2025-03-06) was added, I tweaked the commit message, and moved the test into t6423. However, that still left two other things missing that made this test unlike the others in the same testfile: * It didn't have an English description of the test setup like all other tests in t6423 * It didn't check that the right number of files were present at the end The former issue is a minor detail that isn't that critical, but the latter feels more important. If it had been done, I might have noticed another bug. In particular, this testcase involves Side A: rename world -> tools/world and Side B: rename tools/ -> <the toplevel> Side B: remove world The tools/ -> <toplevel> rename turns the world -> tools/world rename into world -> world, i.e. a rename-to-self case. But, it's a path conflict because merge.directoryRenames defaults to false. There's no content conflict because Side A didn't modify world, so we should just take the content of world from Side B -- i.e. delete it. So, we have a conflict on the path, but not on its content. We could consider letting the content trump since it is unconflicted, but if we are going to leave a conflict, it should certainly represent that 'world' existed both in the base version and on Side A. Currently it doesn't. Add a description of this test, add some checking of the number of entries in the index at the end of the merge, and mark the test as expecting to fail for now. A subsequent commit will fix this bug. While at it, I found another related bug from a nearly identical setup but setting merge.directoryRenames=true. Copy testcase 12n into 12n2, changing it to use merge instead of cherry-pick, and turn on directory renames for this test. In this case, since there is no content conflict and no path conflict, it should be okay to delete the file. Unfortunately, the code resolves without conflict but silently leaves world despite the fact it should be deleted. It might also be okay if the code spuriously thought there was a modify/delete conflict here; that would at least notify users to look closer and then when they notice there was no change since the base version, they can easily resolve. A conflict notice is much better than silently providing the wrong resolution. Cover this with the 12n2 testcase, which for now is marked as expecting to fail as well. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Commit 806f832 (t6423: test directory renames causing rename-to-self, 2021-06-30) introduced testcase 12i-12k but omitted staging one of the files and copy-pasted that mistake to the other tests. This means the merge runs with an unstaged change, even though that isn't related to what is being tested and makes the test look more complicated than it is. The cover letter for the series associated with the above commit (see Message-ID: [email protected]) noted that these testcases triggered two bugs in merge-recursive but only one in merge-ort; in merge-recursive these testcases also triggered a silent deletion of the file in question when it shouldn't be deleted. What I didn't realize at the time was that the deletion bug in merge-ort was merely being sidestepped by the "relevant renames" optimization but can actually be triggered. A subsequent commit will deal with that additional bug, but it was complicated by the mistaken forgotten staging, so this commit first fixes that issue. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Because merge-ort is dealing with potentially all the pathnames in the repository, it sometimes needs to do an awful lot of string comparisons. Because of this, struct merge_options_internal's path member was envisioned from the beginning to contain an interned value for every path in order to allow us to compare strings via pointer comparison instead of using strcmp. See * 5b59c3d (merge-ort: setup basic internal data structures, 2020-12-13) * f591c47 (merge-ort: copy and adapt merge_3way() from merge-recursive.c, 2021-01-01) for some of the early comments. However, the original comment was slightly misleading when it switched from mentioning paths to only mentioning directories. Fix that, and while at it also point to an example in the code which applies the extra needed care to permit the pointer comparison optimization. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
We have multiple bugs here -- accidental silent file deletion, accidental silent file retention for files that should be deleted, and incorrect number of entries left in the index. The series merged at commit d3b88be (Merge branch 'en/merge-dir-rename-corner-case-fix', 2021-07-16) introduced testcase 12i-12k in t6423 which checked for rename-to-self cases, and fixed bugs that merge-ort and merge-recursive had with these testcases. At the time, I noted that merge-ort had one bug for these cases, while merge-recursive had two. It turns out that merge-ort did in fact have another bug, but the "relevant renames" optimizations were masking it. If we modify testcase 12i from t6423 to modify the file in the commit that renames it (but only modify it enough that it can still be detected as a rename), then we can trigger silent deletion of the file. Tweak testcase 12i slightly to make the file in question have more than one line in it. This leaves the testcase intact other than changing the initial contents of this one file. The purpose of this tweak is to minimize the changes between this testcase and a new one that we want to add. Then duplicate testcase 12i as 12i2, changing it so that it adds a single line to the file in question when it is renamed; testcase 12i2 then serves as a testcase for this merge-ort bug that I previously overlooked. Further, commit 98a1a00 (t6423: add a testcase causing a failed assertion in process_renames, 2025-03-06), fixed an issue with rename-to-self but added a new testcase, 12n, that only checked for whether the merge ran to completion. A few commits ago, we modified this test to check for the number of entries in the index -- but noted that the number was wrong. And we also noted a silently-keep-instead-of-delete bug at the same time in the new testcase 12n2. In summary, we have the following bugs with rename-to-self cases: * silent deletion of file expected to be kept (t6423 testcase 12i2) * silent retention of file expected to be removed (t6423 testcase 12n2) * wrong number of extries left in the index (t6423 testcase 12n) All of these bugs arise because in a rename-to-self case, when we have a rename A->B, both A and B name the same file. The code in process_renames() assumes A & B are different, and tries to move the higher order stages and file contents so that they are associated just with the new path, but the assumptions of A & B being different can cause A to be deleted when it's not supposed to be or mark B as resolved and kept in place when it's supposed to be deleted. Since A & B are already the same path in the rename-to-self case, simply skip the steps in process_renames() for such files to fix these bugs. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
At GitHub, we've got a real-world repository that has been triggering failures of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. which comes from the line: VERIFY_CI(newinfo); Unfortunately, this one has been quite complex to unravel, and is a bit complex to explain. So, I'm going to carefully try to explain each relevant piece needed to understand the fix, then carefully build up from a simple testcase to some of the relevant testcases. == New special case we need to consider == Rename pairs in the diffcore machinery connect the source path of a rename with the destination path of a rename. Since we have rename pairs to consider on both sides of history since the merge base, merging has to consider a few special cases of possible overlap: A) two rename pairs having the same target path B) two rename pairs having the same source path C) the source path of one rename pair being the target path of a different rename pair Some of these came up often enough that we gave them names: A) a rename/rename(2to1) conflict (looks similar to an add/add conflict) B) a rename/rename(1to2) conflict, which represents the same path being renamed differently on the two sides of history C) not yet named merge-ort is well-prepared to handle cases (A) and (B), as was merge-recursive (which was merge-ort's predecessor). Case (C) was briefly considered during the years of merge-recursive maintenance, but the full extent of support it got was a few FIXME/TODO comments littered around the code highlighting some of the places that would probably need to be fixed to support it. When I wrote merge-ort I ignored case (C) entirely, since I believed that case (C) was only possible if we were to support break detection during merges. Not only had break detection never been supported by any merge algorithm, I thought break detection wasn't worth the effort to support in a merge algorithm. However, it turns out that case (C) can be triggered without break detection, if there's enough moving pieces. Before I dive into how to trigger case (C) with directory renames plus other renames, it might be helpful to use a simpler example with break detection first. And before we get to that it may help to explain some more basics of handling renames in the merge algorithm. So, let me first backup and provide a quick refresher on each of * handling renames * what break detection would mean, if supported in merging * handling directory renames From there, I'll build up from a basic directory rename detection case to one that triggers a failure currently. == Handling renames == In the merge machinery when we have a rename of a path A -> B, processing that rename needs to remove path A, and make sure that path B has the relevant information. Note that if the content was also modified on both sides, this may mean that we have 3 different stages that need to be stored at path B instead of having some stored at path A. Having all stages stored at path B makes it much easier for users to investigate and resolve the content conflict associated with a renamed path. For example: * "git status" doesn't have to figure out how to list paths A & B and attempt to connect them for users; it can just list path B. * Users can use "git ls-files -u B" (instead of trying to find the previous name of the file so they can list both, i.e. "git ls-files -u A B") * Users can resolve via "git add B" (without needing to "git rm A") == What break detection would mean == If break detection were supported, we might have cases where A -> B *and* C -> A, meaning that both rename pairs might believe they need to update A. In particular, the processing of A -> B would need to be careful to not clear out all stages of A and mark it resolved, while both renames would need to figure out which stages of A belong with A and which belong with B, so that both paths have the right stages associated with them. merge-ort (like merge-recursive before it) makes no attempt to handle break detection; it runs with break detection turned off. It would need to be retrofitted to handle such cases. == Directory rename detection == If one side of history renames directory D/ -> E/, and the other side of history adds new files to D/, then directory rename detection notices and suggests moving those new files to E/. A similar thing is done for paths renamed into D/, causing them to be transitively renamed into E/. The default in the merge machinery is to report a conflict whenever a directory rename might modify the location of a path, so that users can decide whether they wanted the original path or the directory-rename-induced location. However, that means the default codepath still runs through all the directory rename detection logic, it just supplements it with providing conflict notices when it is done. == Building up increasingly complex testcases == I'll start with a really simple directory rename example, and then slowly add twists that explain new pieces until we get to the problematic cases: === Testcase 1 === Let's start with a concrete example, where particular files/directories of interest that exist or are changed on each side are called out: Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ For this case, we'd expect to see the original B/file appear not at C/file but at A/file. (We would also expect a conflict notice that the user will want to choose between C/file and A/file, but I'm going to ignore conflict notices from here on by assuming merge.directoryRenames is set to `true` rather than `conflict`; the only difference that assumption makes is whether that makes the merge be considered to be conflicted and whether it prints a conflict notice; what is written to the index or working directory is unchanged.) === Testcase 2 === Modify testcase 1 by having A/file exist from the start: Original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ In such a case, to avoid user confusion at what looks kind of like an add/add conflict (even though the original path at A/file was not added by either side of the merge), we turn off directory rename detection for this path and print a "in the way" warning to the user: CONFLICT (implicit dir rename): Existing file/dir ... in the way ... The testcases in section 5 of t6423 explore these in more detail. === Testcase 3 === Let's modify testcase 1 in a slightly different way: have A/file be added by their side rather than it already existing. Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ add A/file In this case, the directory rename detection basically transforms our side's original B/file -> C/file into a B/file -> A/file, and so we get a rename/add conflict, with one version of A/file coming from the renamed file, and another coming from the new A/file, each stored as stages 2 and 3 in conflicts. This kind of add/add conflict is perhaps slightly more complex than a regular add/add conflict, but with the printed messages it makes sense where it came from and we have different stages of the file to work with to resolve the conflict. === Testcase 4 === Let's do something similar to testcase 3, but have the opposite side of history add A/file: Original: <nothing of note> our side: rename B/file -> C/file add A/file their side: rename C/ -> A/ Now if we allow directory rename detection to modify C/file to A/file, then we also get a rename/add conflict, but in this case we'd need both higher order stages being recorded on side 2, which makes no sense. The index can't store multiple stage 2 entries, and even if we could, it would probably be confusing for users to work with. So, similar to what we do when there was an A/file in the original version, we simply turn off directory rename detection for cases like this and provide the "in the way" CONFLICT notice to the user. === Testcase 5 === We're slowly getting closer. Let's mix it up by having A/file exist at the beginning but not exist on their side: original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ rename A/file -> D/file For this case, you could say that since A/file -> D/file, it's no longer in the way of C/file being moved by directory rename detection to A/file. But that would give us a case where A/file is both the source and the target of a rename, similar to break detection, which the code isn't currently equipped to handle. This is not yet the case that causes current failures; to the current code, this kind of looks like testcase 4 in that A/file is in the way on our side (since A/file was in the original and was umodified by our side). So, it results in a "in the way" notification with directory rename detection being turned off for A/file so that B/file ends up at C/file. Perhaps the resolution could be improved in the future, but our "in the way" checks prevented such problems by noticing that A/file exists on our side and thus turns off directory rename detection from affecting C/file's location. So, while the merge result could be perhaps improved, the fact that this is currently handled by giving the user an "in the way" message gives the user a chance to resolve and prevents the code from tripping itself up. === Testcase 6 === Let's modify testcase 5 a bit more, to also delete A/file on our side: original: A/file exists our side: rename B/file -> C/file delete A/file their side: rename C/ -> A/ rename A/file -> D/file Now the "in the way" logic doesn't detect that there's an A/file in the way (neither side has an A/file anymore), so it's fine to transitively rename C/file further to A/file...except that we end up with A/file being both the source of one rename, and the target of a different rename. Each rename pair tries to handle the resolution of the source and target paths of its own rename. But when we go to process the second rename pair in process_renames(), we do not expect either the source or the destination to be marked as handled already; so, when we hit the sanity checks that these are not handled: VERIFY_CI(oldinfo); VERIFY_CI(newinfo); then one of these is going to throw an assertion failure since the previous rename pair already marked both of its paths as handled. This will give us an error of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. This is the failure we're currently triggering, and it fundamentally depends on: * a path existing in the original * that original path being removed or renamed on *both* sides * some kind of directory rename moving some *other* path into that original path This was added as testcase 12q in t6423. === Testcase 7 === Bonus bug found while investigating! Let's go back to the comparison between testcases 2 & 3, and set up a file present on their side that we need to consider: Original: A/file exists our side: rename B/file -> C/file rename A/file -> D/file their side: rename C/ -> A/ Here, there is no A/file in the way on our side like testcase 4. There is an A/file present on their side like testcase 3, which was an add/add conflict, but that's associated with the file be renamed to D/file. So, that really shouldn't be an add/add conflict because we instead want all modes of the original A/file to be transported to D/file. Unfortunately, the current code kind of treats it like an add/add conflict instead...but even worse. There is also a valid mode for A/file in the original, which normally goes to stage 1. However, an add/add conflict should be represented in the index with no mode at stage 1 (for the original side), only modes at stages 2 and 3 (for our and their side), so for an add/add we'd expect that mode for A/file in the original version to be cleared out (or be transported to D/file). Unfortunately, the code currently leaves not only the stage 3 entry for A/file intact, it also leaves the stage 1 entry for A/file. This results in `git ls-files -u A/file` output of the form: 100644 d00491f 1 A/file 100644 0cfbf08 2 A/file 100644 d00491f 3 A/file This would likely cause users to believe this isn't an add/add conflict; rather, this would lead them to believe that A/file was only modified on our side and that therefore it should not have been a conflict in the first place. And while resolving the conflict in favor of our side is the correct resolution (because stages 1 and 3 should have been cleared out in the first place), this is certainly likely to cause confusion for anyone attempting to investigate why this path was marked as conflicted. This was added as testcase 12p in t6423. == Attempted solutions that I discarded == 1) For each side of history, create a strset of the sources of each rename on the other side of history. Then when using directory renames to modify existing renames, verify that we aren't renaming to a source of another rename. Unfortunately, the "relevant renames" optimization in merge-ort means we often don't detect renames -- we just see a delete and an add -- which is easy to forget and makes debugging testcases harder, but it also turns out that this solution in insufficient to solve the related problems in the area (more on that below). 2) Modify the code to be aware of the possibility of renaming to the source of another side's rename, and make all the conflict resolution logic for each case (including existing rename/rename(2to1) and rename/rename(1to2) cases) handle the additional complexity. It turns out there was much more code to audit than I wanted, for a really niche case. I didn't like how many changes were needed, and aborted. == Solution == We do not want the stages of unrelated files appearing at the same path in the index except when dealing with an add/add conflict. While we previously handled this for stages 2 & 3, we also need to worry about stage 1. So check for a stage 1 index entry being in the way of a directory rename. However, if we can detect that the stage 1 index entry is actually from a related file due to a directory-rename-causes-rename-to-self situation, then we can allow the stage 1 entry to remain. From this wording, you may note that it's not just rename cases that are a problem; bugs could be triggered with directory renames vs simple adds. That leads us to... == Testcases 8+ == Another bonus bug, found via understanding our final solutions (and the failure of our first attempted solution)! Let's tweak testcase 7 a bit: Original: A/file exists our side: delete A/file add -> C/file their side: delete A/file rename C/ -> A/ Here, there doesn't seem to be a big problem. Sure C/file gets modified via the directory rename of C/ -> A/ so that it becomes A/file, but there's no file in the way, right? Actually, here we have a problem that the stage 1 entry of A/file would be combined with the stage 2 entry of C/file, and make it look like a modify/delete conflict. Perhaps there is some extra checking that could be added to the code to make it attempt to clear out the stage 1 entry of A/file, but the various rename-to-self-via-directory-rename testcases make that a bit more difficult. For now, it's easier to just treat this as a path-in-the-way situation and not allow the directory rename to modify C/file. That sounds all well and good, but it does have an interesting side effect. Due to the "relevant renames" optimizations in merge-ort (i.e. only detect the renames you need), 100% renames whose files weren't modified on the other side often go undetected. This means that if we modify this testcase slightly to: Original: A/file exists our side: A/file -> C/file their side: rename C/ -> A/ Then although this looks like where the directory rename just moves C/file back to A/file and there's no problem, we may not detect the A/file -> C/file rename. Instead it will look like a deletion of A/file and an addition of C/file. The directory rename then appears to be moving C/file to A/file, which is on top of an "unrelated" file (or at least a file it doesn't know is related). So, we will report path-in-the-way conflicts now in cases where we didn't before. That's better than silently and accidentally combining stages of unrelated files and making them look like a modify/delete; users can investigate the reported conflict and simply resolve it. This means we tweak the expected solution for testcases 12i, 12j, and 12k. (Those three tests are basically the same test repeated three times, but I was worried when I added those that subtle differences in parent/child, sibling/sibling, and toplevel directories might mess up how rename-to-self testcases actually get handled.) Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
In git-diff, options like `-w` and `-I<regex>`, two files are considered equivalent under the specified "ignore" rules, even when they are not bit-for-bit identical. For options like `--raw`, `--name-status`, and `--name-only`, git-diff deliberately compares only the SHA values to determine whether two files are equivalent, for performance reasons. As a result, a file shown in `git diff --name-status` may not appear in `git diff --patch`. To quickly determine whether two files are equivalent, add a helper function diff_flush_patch_quietly() in diff.c. Add `.dry_run` field in `struct diff_options`. When `.dry_run` is true, builtin_diff() returns immediately upon finding any change. Call diff_flush_patch_quietly() to determine if we should flush `--raw`, `--name-only` or `--name-status` output. Signed-off-by: Jeff King <[email protected]> Signed-off-by: Lidong Yan <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
This style with the empty lines in test bodies was from when the test suite was being developed. Remove the empty lines to match the modern test style. Signed-off-by: Denton Liu <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When "git push <remote> <src>:<dst>" does not spell out the destination side of the ref fully, and when <src> is not given as a reference but an object name, the code tries to give advice messages based on the type of that object. The type is determined by calling odb_read_object_info() and signalled by its return value. The code however reported a programming error with BUG() when this function said that there is no such object, which happens when the object name is given as a full hexadecimal (if the object name is given as a partial hexadecimal or an non-existing ref, the function would have died without returning, so this BUG() wouldn't have triggered). This is wrong. It is an ordinary end-user mistake to give an object name that does not exist and treated as such. An example of the error message produced is as follows: error: The destination you provided is not a full refname (i.e., starting with "refs/"). We tried to guess what you meant by: - Looking for a ref that matches 'branch' on the remote side. - Checking if the <src> being pushed ('0000000000000000000000000000000000000001') is a ref in "refs/{heads,tags}/". If so we add a corresponding refs/{heads,tags}/ prefix on the remote side. Neither worked, so we gave up. You must fully qualify the ref. BUG: remote.c:1221: '0000000000000000000000000000000000000001' should be commit/tag/tree/blob, is '-1' fatal: the remote end hung up unexpectedly Aborted (core dumped) Helped-by: Junio C Hamano <[email protected]> Signed-off-by: Denton Liu <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
For better readability, convert the if-else ladder into a switch statement. Suggested-by: Patrick Steinhardt <[email protected]> Signed-off-by: Denton Liu <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The act of giving "--no-index" tells Git to pretend that the current directory is not under control of any Git index or repository, so even when you happen to be in a Git controlled working tree, where in that working tree should not matter. But the start-up sequence tries to discover the top of the working tree and chdir(2)'s there, even before Git passes control to the subcommand being run. When diff_no_index() starts running, it starts at a wrong (from the end-user's point of view who thinks "git diff --no-index" is merely a better version of GNU diff) directory, and the original directory the user started the command is at "prefix". Because the paths given from argv[] have already been adjusted to account for this path shuffling by prepending the prefix, and showing the resulting path by stripping the prefix, the effect of these nonsense operations (nonsense in the context of "--no-index", that is) is usually not observable. Except for special cases like "-", where it is not preprocessed by prepending the prefix. Instead of papering over by adding more special cases only to cater to the no-index codepath in the generic code, drive the diff machinery more faithfully to what is going on. If the user started "git diff --no-index" in directory X/Y/Z in a working tree controlled by Git, and the start up sequence of Git chdir(2)'ed up to directory X and left Y/Z in the prefix, revert the effect of the start up sequence by chdir'ing back to Y/Z and emptying the prefix. Reported-by: Gregoire Geis <[email protected]> Helped-by: Ramsay Jones <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When line-level log is invoked with more than one disjoint line range in the same file, and one of the commits happens to change that file such that: - the last line of a line range R(n) immediately preceeds the first line modified or added by a hunk H, and - subtracting the number of lines added by hunk H from the start and end of the subsequent line range R(n+1) would result in a range overlapping with line range R(n), then git aborts with an assertion error, because those overlapping line ranges violate the invariants: $ git log --oneline -p 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file index 572d5d9..00935f1 100644 --- a/file +++ b/file @@ -3,3 +3,8 @@ Line 2 Line 3 Line 4 Line 5 +Line 6 +Line 7 +Line 8 +Line 9 +Line 10 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file new file mode 100644 index 0000000..572d5d9 --- /dev/null +++ b/file @@ -0,0 +1,5 @@ +Line 1 +Line 2 +Line 3 +Line 4 +Line 5 $ git log --oneline -L3,5:file -L7,8:file git: line-log.c:73: range_set_append: Assertion `rs->nr == 0 || rs->ranges[rs->nr-1].end <= a' failed. Aborted (core dumped) The line-log machinery encodes line and diff ranges internally as [start, end) pairs, i.e. include 'start' but exclude 'end', and line numbering starts at 0 (as opposed to the -LX,Y option, where it starts at 1, IOW the parameter -L3,5 is represented internally as { start = 2, end = 5 }). The reason for this assertion error and some related issues is that there are a couple of places where 'end' is mistakenly considered to be part of the range: - When a commit modifies an interesting path, the line-log machinery first checks which diff range (i.e. hunk) modify any line ranges. This is done in diff_ranges_filter_touched(), where the outer loop iterates over the diff ranges, and in each iteration the inner loop advances the line ranges supposedly until the current line range ends at or after the current diff range starts, and then the current diff and line ranges are checked for overlap. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10) starts, so the inner loop should advance, and then the second line range [6, 8) and the diff range should be checked for overlap. Unfortunately, the condition of the inner loop mistakenly considers 'end' as part of the line range, and, seeing the diff range starting at 5 and the line range ending at 5, it doesn't skip the first range. Consequently, the diff range and the first line range are checked for overlap, and after that the outer loop runs out of diff ranges, and then the processing goes on in the false belief that this commit didn't touch any of the interesting line ranges. The line-log machinery later shifts the line ranges to account for any added/removed lines in the diff ranges preceeding each line range. This leaves the first line range intact, but attempts to shift the second line range [6, 8) by 5 lines towards the beginning of the file, resulting in [1, 3), triggering the assertion error, because the two overlapping line ranges violate the invariants. Fix that loop condition in diff_ranges_filter_touched() to not treat 'end' as part of the line range. - With the above fix the assertion error is gone... but, alas, we now get stuck in an endless loop! This happens in range_set_difference(), where a couple of nested loops iterate over the line and diff ranges, and a condition is supposed to break the middle loop when the current line range ends before the current diff range, so processing could continue with the next line range. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10) starts, so this condition should trigger and break the middle loop. Unfortunately, just like in the case of the assertion error, this conditions mistakenly considers 'end' as part of the line range, and, seeing the line range ending at 5 and the diff range starting at 5, it doesn't break the loop, which will then go on and on. Fix this condition in range_set_difference() to not treat 'end' as part of the line range. - With the above fix the endless loop is gone... but, alas, the output is now wrong, as it shows both line ranges for HEAD, even though the first line range is not modified by that commit: $ git log --oneline -L3,5:file -L7,8:file 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file --- a/file +++ b/file @@ -3,3 +3,3 @@ Line 3 Line 4 Line 5 @@ -6,0 +7,2 @@ +Line 7 +Line 8 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +3,3 @@ +Line 3 +Line 4 +Line 5 In dump_diff_hacky_one() a couple of nested loops are responsible for finding and printing the modified line ranges: the big outer loop iterates over all line ranges, and the first inner loop skips over the diff ranges that end before the start of the current line range. This is followed by a condition checking whether the current diff range starts after the end of the current line range, which, when fulfilled, continues and advances the outer loop to the next line range. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10), so this condition should trigger, and the outer loop should advance to the second line range. Unfortunately, just like in the previous cases, this condition mistakenly considers 'end' as part of the line range, and, seeing the first line range ending at 5 and the diff range starting at 5, it doesn't continue to advance the outher loop, but goes on to show the (unmodified) first line range. Fix this condition to not treat 'end' as part of the line range, just like in the previous cases. After all this the command in the above example finally finishes and produces the right output: $ git log --oneline -L3,5:file -L7,8:file 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file --- a/file +++ b/file @@ -6,0 +7,2 @@ +Line 7 +Line 8 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +3,3 @@ +Line 3 +Line 4 +Line 5 Add a canned test similar to the above example, with the line ranges adjusted to the test repository's history. Reported-by: Evgeni Chasnovski <[email protected]> Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When line-level log is invoked with more than one disjoint line range in the same file, and one of the commits happens to change that file such that one diff range modifies more than one line range, then changes to all modified line ranges should be shown, but only the changes in the first modified line range are: $ git log --oneline -p 80ca903 (HEAD -> master) Initial diff --git a/file b/file new file mode 100644 index 0000000..00935f1 --- /dev/null +++ b/file @@ -0,0 +1,10 @@ +Line 1 +Line 2 +Line 3 +Line 4 +Line 5 +Line 6 +Line 7 +Line 8 +Line 9 +Line 10 $ git log --oneline -L1,2:file -L4,5:file -L7,8:file 80ca903 (HEAD -> master) Initial diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +1,2 @@ +Line 1 +Line 2 The line-log-specific diff printer is already clever enough to handle the case when one line range covers multiple diff ranges, but the possibility of one diff range touching multiple disjoint line ranges was apparently overlooked. Add the necessary condition to dump_diff_hacky_one() to handle this case as well, and show all modified line ranges: $ git log --oneline -L1,2:file -L4,5:file -L7,8:file 0f9a5b4 (HEAD -> master) Initial diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +1,2 @@ +Line 1 +Line 2 @@ -0,0 +4,2 @@ +Line 4 +Line 5 @@ -0,0 +7,2 @@ +Line 7 +Line 8 This bug was already present in the initial line-log implementation added in 2da1d1f6f (Implement line-history search (git log -L), 2013-03-28). Interestingly, that commit already contained a canned test case covering a similar scenario: "-L '/long f/',/^}/:a.c -L /main/,/^}/:a.c simple" This test case looks for two line ranges in the same file, and both trace back disjointly to the test repository's inital commit, therefore changes to both line ranges should have been shown for the initial commit, but only changes for the first line range are shown. So this test case should have failed from the very beginning, but it never did, because, unfortunately, the canned expected result is incorrect, as it doesn't include changes for the second line range. A similar test with a similarly incorrect canned expected result was added later in 2096188 (log -L: fix overlapping input ranges, 2013-04-05). Correct these two canned expected results to contain the changes for the second line range for the initial commit as well. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Users reported an issue where objects were missing from their local repositories after a full repack using 'git repack -adf --path-walk'. This was alarming and took a while to create a reproducer. Here, we fix the bug and include a test case that would fail without this fix. The root cause is that certain objects existed in the index and had no second versions. These objects are usually blobs, though trees can be included if a cache-tree exists. The issue is that the revision walk adds these objects to the "pending" list and the path-walk API forgets to mark the lists it creates at this point as "maybe_interesting". If these paths only ever have a single version in the history of the repo (including the current staged version) then the parent directory never tries to add a new object to the list and mark the list as "maybe_interesting". Thus, when walking the list later, the group is skipped as it is expected that no objects are interesting. This happens even when there are actually no UNINTERESTING objects at all! This is based on the optimization enabled by the pack.useSparse=true config option, which is the default. Thus, we create a test case that demonstrates the many cases of this issue for reproducibility: 1. File a/b/c has only one committed version. 2. Files a/i and x/y only exist as staged changes. 3. Tree x/ only exists in the cache-tree. After performing a non-path-walk repack to force all loose objects into packfiles, run a --path-walk repack followed by 'git fsck'. This fsck is what fails with the following errors: error: invalid object 100644 f2e41136... for 'a/b/c' This is the dropped instance of the single-versioned a/b/c file. broken link from tree cfda31d8... to tree 3f725fcd... This is the missing tree for the single-versioned a/b/ directory. missing blob 0ddf2bae... (a/i) missing blob 975fbec8... (x/y) missing blob a60d869d... (file) missing blob f2e41136... (a/b/c) missing tree 3f725fcd... (a/b/) dangling tree 5896d7e... (staged root tree) Note that since the staged root tree is missing, the fsck output cannot even report that the staged x/ tree is missing as well. The core problem here is that the "maybe_interesting" member of 'struct type_and_oid_list' is not initialized to '1'. This member was added in 6333e7a (path-walk: mark trees and blobs as UNINTERESTING, 2024-12-20) in a way to help when creating packfiles for a small commit range using the sparse path algorithm (enabled by pack.useSparse=true). The idea here is that the list is marked as "maybe_interesting" if an object is added that does not have the UNINTERESTING flag on it. Later, this is checked again in case all objects in the list were marked UNINTERESTING after that point in time. In this case, the algorithm skips the list as there is no reason to visit it. This leads to the problem where the "maybe_interesting" member was not appropriately initialized when the list is created from pending objects. Initializing this in the correct places fixes the bug. To reduce risk of similar bugs around initializing this structure, a follow-up change will make initializing lists use a shared method. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The previous change fixed a bug in 'git repack -adf --path-walk' that was due to an update to how path lists are initialized and missing some important cases when processing the pending objects. This change takes the three critical places where path lists are initialized and combines them into a static method. This simplifies the callers somewhat while also helping to avoid a missed update in the future. The other places where a path list (struct type_and_oid_list) is initialized is for the following "fixed" lists: * Tag objects. * Commit objects. * Root trees. * Tagged trees. * Tagged blobs. These lists are created and consumed in different ways, with only the root trees being passed into the logic that cares about the "maybe_interesting" bit. It is appropriate to keep these uses separate. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The fetch code tries to avoid asking the remote side for an object we already have. It does this by traversing recent commits reachable from our refs looking for matches. Commit 5d4cc78 (fetch-pack: die if in commit graph but not obj db, 2024-11-05) introduced an extra check there: if we think we have an object because it's in the commit graph, we double-check that we actually have it in our object database with a call to odb_has_object(). But that call does not pass any flags, and so the function won't call reprepared_packed_git() if it does not find the object. That opens us up to the usual race against some other process repacking the odb: 1. We scan the list of packs in objects/pack but haven't yet opened them. 2. Somebody else packs the object into a new pack (which we don't know about), and deletes the old pack it was in. 3. Our odb_has_object() calls tries to open that old pack, but finds it is gone. We declare that we don't have the object. And this causes us to erroneously complain and abort the fetch, thinking our commit-graph and object database are out of sync. Instead, we should pass HAS_OBJECT_RECHECK_PACKED, which will add a new step: 4. We re-scan the pack directory again, find the new pack, and locate the object. Often the fetch code tries to avoid these kinds of re-scans if it's likely that we won't have the object. If the other side has told us about object X and we want to know if we have it, we'll skip the re-scan (to avoid spending a lot of effort when there are many such objects). We can accept the racy false negative in that case because the worst case is that we ask the other side to send us the object. But this is not one of those cases. These are objects which are accessible from _our_ refs, and which we already found in the commit graph file. We should have them, and if we don't, we'll die() immediately. So the performance impact is negligible, and getting the right answer is important. There's no test here because it's inherently racy. In fact, I had trouble even developing a minimal test. The problem seen in the wild can be produced like this: # Any git.git mirror which supports partial clones; I think this # should work with any repo that contains submodules, but note that # $obj below is specific to this repo url=https://github.com/git/git.git # This is a commit that is not at the tip of any branches (so after # we have it, we'll still have some commits to fetch). obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9 git init git fetch --filter=tree:0 $url $obj:refs/heads/foo git checkout foo git commit-graph write --reachable git fetch $url What happens here is that the initial fetch grabs that older commit (and its ancestors) but no trees or blobs, and the subsequent checkout grabs the necessary trees and blobs just for that commit. The final fetch spawns a long sequence of child fetches due to fetch_submodules(), which wants to check whether there have been any gitlink modifications which should trigger a fetch of the related submodule (we'll leave aside the irony that we did not even check out any submodules yet). That series of fetches causes us to accumulate packs, which eventually triggers background maintenance to run. That repacks all-into-one, and the pack containing $obj goes away in favor of a new pack. And then the fetch eventually fails with: fatal: You are attempting to fetch cf6f63e, which is in the commit graph file but not in the object database. In the scenario above, the race becomes likely because of the long series of quick fetches. But I _think_ the bug is independent of partial clones entirely, and you could run into the same thing with a single fetch, some other process running "git repack" simultaneously, and a bit of bad luck. I haven't been able to reproduce, though. I'm not sure if that's because there's some mis-analysis above, or if the race window is just small enough that it's hard to trigger. At any rate, re-scanning here seems like an obviously correct thing to do with no downside, and it does fix the partial-clone case shown above. Reported-by: Дилян Палаузов <[email protected]> Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
This instance of setting the result to 1 before going to cleanup was accidentally removed in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06). Build upon a test that already deletes a packfile to verify that this error propagates to full command failure. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The incremental mode of writing a multi-pack-index has a few extra conditions that could lead to failure, but these are currently short-ciruiting with 'return -1' instead of setting the method's 'result' variable and going to the cleanup tag. Replace these returns with gotos to avoid memory issues when exiting early due to error conditions. Unfortunately, these error conditions are difficult to reproduce with test cases, which is perhaps one reason why the memory loss was not caught by existing test cases in memory tracking modes. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
"git sparse-checkout" subcommand learned a new "clean" action to prune otherwise unused working-tree files that are outside the areas of interest. * ds/sparse-checkout-clean: t: expand tests around sparse merges and clean sparse-index: point users to new 'clean' action sparse-checkout: add --verbose option to 'clean' dir: add generic "walk all files" helper sparse-checkout: match some 'clean' behavior sparse-checkout: add basics of 'clean' command sparse-checkout: remove use of the_repository
Some among "git add -p" and friends ignored color.diff and/or color.ui configuration variables, which is an old regression, which has been corrected. * jk/add-i-color: contrib/diff-highlight: mention interactive.diffFilter add-interactive: manually fall back color config to color.ui add-interactive: respect color.diff for diff coloring stash: pass --no-color to diff plumbing child processes
"git rebase -i" failed to clean-up the commit log message when the command commits the final one in a chain of "fixup" commands, which has been corrected. * pw/rebase-i-cleanup-fix: sequencer: remove VERBATIM_MSG flag rebase -i: respect commit.cleanup when picking fixups
"git subtree" (in contrib/) did not work correctly when splitting squashed subtrees, which has been improved. * cs/subtree-squash-split-fix: contrib/subtree: fix split with squashed subtrees
If one of the two provided paths for git diff --no-index ends in a '/', a failure similar to the following occurs: $ git diff --no-index -- /tmp/ /tmp/ ':!' fatal: `pos + len' is too far after the end of the buffer This occurs because of an incorrect calculation of the skip lengths in diff_no_index(). The code wants to calculate the length of the string, but add one in case the string doesn't end with a slash. The method it uses is incorrect, as it always checks the trailing NUL character of the string. This will never be a '/', so we always add one. In the event that we *do* have a trailing slash, this will create an off-by-one length error later when using the skip value. The most straightforward fix would be to correct the skip1 and skip2 lengths by using ends_with(). However, Johannes made a good point that the existing logic is wasting a lot of computation. We generate the match string by copying the path in and then skipping almost all of it immediately with a potentially expensive memmove() from the strbuf_remove() call. We also re-initialize the match stringbuf each time we call read_directory_contents. The read_directory_contents really wants a path that is rooted at the start of the directory scan. We're currently building this by taking the full path and stripping out the start portion. Instead, replace this logic by building up the portion of the match as we go. Start by initializing two strbuf in diff_no_index containing the empty string. Pass these into queue_diff, which in turn passes the appropriate left or right side into read_directory_contents. As before, we build up the matches by appending elements to the match path and then clearing them using strbuf_setlen. In the recursive portion of the queue_diff algorithm, we build up new match paths the same way that we build up new buffer paths, by appending the elements and then clearing them with strbuf_setlen after each iteration. This is cheaper as it avoids repeated allocations, and is a bit simpler to track what is going on. Add a couple of test cases that pass in paths already ending in '/', to ensure the tests cover this regression. Reported-by: Johannes Schindelin <[email protected]> Closes: https://lore.kernel.org/git/[email protected]/ Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Junio C Hamano <[email protected]> writes: > Hence, I think the clean-up code of this function this goto ... > >> goto out; > > ... jumps to would need > > strbuf_release(&ps_match1); > strbuf_release(&ps_match2); > > added after that "out:" label? > > If we run this test with leak sanitizer, wouldn't it find leak in > these (I haven't tried it myself---I just am speculating)? Now I did, and my speculations were both correct. The SANITIZE=leak build fails, and with these two releases the test passes. You can squash this in, or I can do so myself if you like, if this is the only change that is required. diff-no-index.c | 2 ++ 1 file changed, 2 insertions(+) Signed-off-by: Johannes Schindelin <[email protected]>
Validate the installer manuallyThe installer was built successfully; |
Here is a snapshot installer/Portable Git/MinGit of this revision: https://github.com/git-for-windows/git-for-windows-automation/actions/runs/18030062242 I plan on releasing this (for practical reasons, it will have to be rebuilt so that the version number matches) as Git for Windows v2.51.0(2) on this coming Monday, September 29th 2025. |
Range-diffgenerated by 'git range-diff --right-only --remerge v2.51.0..upstream/seen v2.51.0.windows.1..prepare-for-v2.51.0(2)'
|
I stumbled over some issues over in |
Okay, here's a fix: git-for-windows/MINGW-packages#163 |
Junio has now mentioned a 2.51.1 |
I really wish that the communication was clearer and more consistent, especially around releases. The Git project makes planning really hard for Git for Windows and other Git users. |
This comment was marked as outdated.
This comment was marked as outdated.
Validate the installer manuallyThe installer was built successfully; |
/release The |
@dscho, please Share on Bluesky and send the announcement email. |
This merges git-for-windows#5840, which I released on Monday, September 29th 2025, as Git for Windows v2.51.0(2). There have been quite a couple of merge conflicts, essentially because we picked up early versions of patches that were sent to the Git mailing list, and which had to undergo the usual process of mutations before being accepted. The only merge conflict that is in another category is that Git for Windows dropped monitoring the GnuPG versions in the `monitor-components` workflow, and `microsoft/git` still wants to delete that workflow lest it runs on a schedule and opens non-actionable work items.
This is a roll-up PR to back-port all kinds of bug fixes that happened in upstream Git since v2.51.0 was released, albeit with no v2.51.1 in sight (not that that means anything, I have been surprised to see .1 releases in my notifications all of a sudden, after no movement on the
maint
branch for weeks and with no clear communication about plans for releasing .1). It includes the following fixes:pw/rebase-i-cleanup-fix
"git rebase -i" failed to clean-up the commit log message when the command commits the final one in a chain of "fixup" commands, which has been corrected.
cs/subtree-squash-split-fix
"git subtree" (in contrib/) did not work correctly when splitting squashed subtrees, which has been improved.
jk/add-i-color
Some among "git add -p" and friends ignored color.diff and/or color.ui configuration variables, which is an old regression, which has been corrected.
ds/sparse-checkout-clean
"git sparse-checkout" subcommand learned a new "clean" action to prune otherwise unused working-tree files that are outside the areas of interest. (this corresponds to [2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk' microsoft/git#796)
kn/refs-files-case-insensitive
Deal more gracefully with directory / file conflicts when the files backend is used for ref storage, by failing only the ones that are involved in the conflict while allowing others. This fixes Git Fetch fails for repos with too many branches with v2.51.0 #5804 (which has independently also been reported to the Git mailing list)
sg/line-log-boundary-fixes
A corner case bug in "git log -L..." has been corrected.
ps/upload-pack-oom-protection
A broken or malicious "git fetch" can say that it has the same object for many many times, and the upload-pack serving it can exhaust memory storing them redundantly, which has been corrected.
ds/midx-write-fixes
Fixes multiple crashes around midx write-out codepaths.
ds/path-walk-repack-fix
"git repack --path-walk" lost objects in some corner cases, which has been corrected.
jk/fetch-check-graph-objects-fix
Under a race against another process that is repacking the repository, especially a partially cloned one, "git fetch" may
mistakenly think some objects we do have are missing, which has been corrected.
ly/diff-name-only-with-diff-from-content
Various options to "git diff" that makes comparison ignore certain aspects of the differences (like "space changes are ignored", "differences in lines that match these regular expressions are ignored") did not work well with "--name-only" and friends.
jc/diff-no-index-in-subdir
"git diff --no-index" run inside a subdirectory under control of a Git repository operated at the top of the working tree and stripped the prefix from the output, and oddballs like "-" (stdin) did not work correctly because of it. Correct the set-up by undoing what the set-up sequence did to cwd and prefix.
en/ort-rename-fixes
Various bugs about rename handling in "ort" merge strategy have been fixed.
dl/push-missing-object-error
"git push" had a code path that led to BUG() but it should have been a die(), as it is a response to a usual but invalid end-user action to attempt pushing an object that does not exist.
ps/reflog-migrate-fixes
"git refs migrate" to migrate the reflog entries from a refs backend to another had a handful of bugs squashed.
js/rebase-i-allow-drop-on-a-merge
During interactive rebase, using 'drop' on a merge commit lead to an error, which was incorrect.
Apart from
cs/subtree-squash-split-fix
andpw/rebase-i-cleanup-fix
, which have been applied on top of commits that are more recent than v2.51.0, and which I therefore rebased on top of v2.51.0 instead, I merged the very (read: SHA-same) commits that were also merged upstream.In addition, it also adds an early version of the bug fix for #5836 on top.