forked from git-for-windows/git
    
        
        - 
                Notifications
    You must be signed in to change notification settings 
- Fork 106
pack-objects: don't reuse deltas with path walk #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
      
            dscho
  merged 1 commit into
  microsoft:vfs-2.47.1
from
derrickstolee:push-no-reuse-config-new
  
      
      
   
  Dec 19, 2024 
      
    
                
     Merged
            
            pack-objects: don't reuse deltas with path walk #707
                    dscho
  merged 1 commit into
  microsoft:vfs-2.47.1
from
derrickstolee:push-no-reuse-config-new
  
      
      
   
  Dec 19, 2024 
              
            Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    The --path-walk option in `git pack-objects` is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within `git push` specifically.
While this config does enable the path-walk feature, it does not lead to
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via `git repack -Ad` helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set `gc.auto=0` before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in `git
push`, this change enforces the spawned `git pack-objects` process to
use `--no-reuse-delta`.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
 1. The code in send-pack.c that executes 'git pack-objects' is ignorant
    of whether the current process is a client pushing to a remote or a
    remote sending a fetch or clone to a client.
 2. For servers, it is critical that they trust the previously computed
    deltas whenever possible, or they could overload their CPU
    resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
This commit also adds a test case that demonstrates that `git -c
pack.usePathWalk=true push` now avoids reusing deltas.
To do this, the test case constructs a pack with a horrendously
inefficient delta object, then verifies that the pack on the receiving
side of the `push` fails to have such an inefficient delta.
The test case would probably be a lot more readable if hex numbers were
used instead of octal numbers, but alas, `printf "\x<hex>"` is not
portable, only `printf "\<octal>"` is. For example, dash's built-in
`printf` function simply prints `\x` verbatim while bash's built-in
happily converts this construct to the corresponding byte.
Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
    a5b3d1d    to
    3b225dd      
    Compare
  
    | @derrickstolee I hope you don't mind that I completely rewrote the patch, added a test case, and adjusted the commit message a bit. | 
              
                    mjcheetham
  
              
              approved these changes
              
                  
                    Dec 10, 2024 
                  
              
              
            
            
              
                    derrickstolee
  
              
              commented
              
                  
                    Dec 10, 2024 
                  
              
              
            
            
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @dscho! I'd give it an approval, but I'm unable to do so.
              
                    dscho
  
              
              approved these changes
              
                  
                    Dec 11, 2024 
                  
              
              
            
            
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Dec 19, 2024 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
      
     Merged
  
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Dec 19, 2024 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jan 1, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jan 1, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jan 1, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jan 1, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jan 1, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Feb 10, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
      
     Merged
  
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Feb 27, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Mar 5, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Mar 5, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  mjcheetham 
      pushed a commit
      that referenced
      this pull request
    
      Mar 12, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  mjcheetham 
      pushed a commit
      that referenced
      this pull request
    
      Mar 17, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 6, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 6, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 11, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 13, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 16, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 16, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jun 16, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Jul 8, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 5, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 5, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 8, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 8, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 13, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
      
     Merged
  
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Aug 19, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Oct 17, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Oct 17, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Oct 17, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
    
  dscho 
      added a commit
      that referenced
      this pull request
    
      Oct 28, 2025 
    
    
      
  
    
      
    
  
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
    
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold:
The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client.
For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided.
Alternative options considered included:
Adding another config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned.
Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.