Fix gradient of OpFromGraph with disconnected/related outputs #723
Conversation
Force-pushed from ad37756 to 869cd5f
Force-pushed from 869cd5f to 1e7c82b
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #723      +/-   ##
==========================================
+ Coverage   80.85%   80.94%   +0.08%
==========================================
  Files         162      162
  Lines       47043    46945      -98
  Branches    11514    11481      -33
==========================================
- Hits        38038    37998      -40
+ Misses       6750     6706      -44
+ Partials     2255     2241      -14
Force-pushed from 1e7c82b to 7a7efa5
Superficial first pass across the PR. I cannot make an informed comment about the actual meat of the changes until I fire up a debugger and try to grok what OpFromGraph is actually doing. I will make an effort to do this in the next 48 hours and give a more meaningful review.
pytensor/compile/builders.py
Outdated
```python
connected_output_grads = [
    out_grad
    for out_grad in output_grads
    if not isinstance(out_grad.type, DisconnectedType)
]
```
Why don't `output_grads` need to check for `NullType`?
Honestly, because I am not sure when `NullType` actually arises.
I prefer the special logic to be as specific as possible; we can reassess if NullTypes also show up in the future?
But the same logic should apply, yeah. Let's put Null here as well.
Actually, I don't want to filter those out. If we know that an output gradient is Null, we shouldn't omit that information from the inner gradient graph generation. If we omit it, `grad` assumes it's simply disconnected.
Omitting disconnected ones makes sense, because they will default again to disconnected_types if needed by `grad`.
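To pin down where this thread lands, here is a minimal sketch; the helper name is hypothetical and not code from this PR, and `DisconnectedType` is assumed importable from `pytensor.gradient`. Disconnected output gradients are dropped, while Null ones are passed through so the inner `grad` call still sees them.

```python
from pytensor.gradient import DisconnectedType


def filter_disconnected_output_grads(output_grads):
    """Drop only DisconnectedType output gradients.

    NullType gradients are deliberately kept: if an output gradient is known
    to be undefined, the inner `grad` call should see that, rather than
    treating the output as merely disconnected.
    """
    return [
        out_grad
        for out_grad in output_grads
        if not isinstance(out_grad.type, DisconnectedType)
    ]
```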
Thanks! It may help to convince yourself that no behavior was changed until commit -2, where the bug fix is done (other than deprecations and removal of special behavior in connection pattern).
Force-pushed from a0272f7 to 71ec299
I found another issue: if the outputs of an OpFromGraph are not independent, the existing logic fails, in that instead of adding the contributions coming from each output, it overrides them. The new test cases in the last commit illustrate this. Any case that depends on `out3`:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.compile.builders import OpFromGraph
from pytensor.gradient import verify_grad
from pytensor.tensor import dscalars

x, y = dscalars("x", "y")
rng = np.random.default_rng(594)
point = list(rng.normal(size=(2,)))

out1 = x + y
out2 = x * y
out3 = out1 + out2  # Create dependency between outputs
op = OpFromGraph([x, y], [out1, out2, out3])

verify_grad(lambda x, y: pt.add(*op(x, y)), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[:-1]), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[1:]), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[::2]), point, rng=rng)
verify_grad(lambda x, y: op(x, y)[0], point, rng=rng)
verify_grad(lambda x, y: op(x, y)[1], point, rng=rng)
verify_grad(lambda x, y: op(x, y)[2], point, rng=rng)
```

If instead we defined `out3` explicitly as …

@aseyboldt any idea how we could handle this? In an outer function I think this would be handled by adding the direct contributions to out1/out2, with the indirect ones coming from out3.

It seems like I want to initialize those variable grads to the output_grad values, but still allow them to be updated, and not set them as known, which doesn't allow any further updates?
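For intuition, a minimal sketch (not the PR's implementation) of the "add, don't override" behaviour one gets in an outer graph, assuming `pytensor.gradient.Lop` accepts lists of outputs and evaluation points:

```python
import pytensor.tensor as pt
from pytensor.gradient import Lop

x, y = pt.dscalars("x", "y")
out1 = x + y
out2 = x * y
out3 = out1 + out2  # out3 depends on the other two outputs

# Symbolic upstream gradients for the three outputs.
g1, g2, g3 = pt.dscalars("g1", "g2", "g3")

# Lop pushes all three upstream gradients through the graph at once, so the
# contribution of g3 flowing through out1/out2 is added to g1/g2 rather than
# replacing them.
gx, gy = Lop([out1, out2, out3], [x, y], [g1, g2, g3])
```

Whether the inner L_op of the OFG can reproduce this accumulation is exactly the question raised above.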
Found a nice(?) hack. Instead of calling Lop internally with …
Force-pushed from 90d9601 to 335d2e0
Changed the title: OpFromGraph with disconnected output gradients → OpFromGraph with disconnected/related outputs
I can't really say that I fully understand the implications of the changes, but it certainly seems like an improvement, so unless someone wants to do a more thorough review, I think we should merge this.
Description
PyTensor uses `DisconnectedType` and `NullType` variables to raise informative errors when users request gradients wrt inputs that can't be computed. This is a problem for OpFromGraph, which may include parallel graphs, some of which are disconnected/null and others not. We don't want to fail when the user only needs the gradient that's supported.
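For context, a minimal sketch (not part of the PR) of how a disconnected gradient surfaces at the user level, using `pytensor.gradient.grad` and its `disconnected_inputs` argument:

```python
import pytensor.tensor as pt
from pytensor.gradient import DisconnectedInputError, grad

x = pt.dscalar("x")
y = pt.dscalar("y")
cost = x ** 2  # y plays no role in the cost

try:
    grad(cost, wrt=y)  # default disconnected_inputs="raise"
except DisconnectedInputError:
    print("y is disconnected from the cost")

# Asking grad to ignore the disconnection yields a zero gradient instead.
g_y = grad(cost, wrt=y, disconnected_inputs="ignore")
```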
There was already some special logic before, to handle cases where `NullType` and `DisconnectedType` arise from the OFG inner graph. Instead of outputting those types (which OFG cannot produce out of thin air, as they are root variables), we were outputting dummy zeros, and then masking those with the original `NullType` or `DisconnectedType` variables created in the internal call to `grad`/`Rop`. This seems reasonable, if a bit tedious. This PR first refactors this code to avoid the dummy outputs altogether (there's no reason for them!).
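A hypothetical sketch of the old "dummy zeros plus masking" step described above; the helper name is invented and this is not the PR's actual code, only an illustration under the assumption that `NullType` and `DisconnectedType` are importable from `pytensor.gradient`:

```python
from pytensor.gradient import DisconnectedType, NullType


def mask_dummy_grads(inner_grads, reference_grads):
    # Wherever the reference gradient (from the internal grad/Rop call) is a
    # Null/Disconnected variable, return it instead of the dummy zero output
    # produced by the inner graph.
    return [
        ref if isinstance(ref.type, (NullType, DisconnectedType)) else inner
        for inner, ref in zip(inner_grads, reference_grads)
    ]
```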
Then it extends this logic to also handle cases where `DisconnectedType` (but not `NullType`) arises before the inner graph of OpFromGraph. This was the case behind one of the issues described in #1. When an OFG has multiple outputs, and the requested gradient only uses a subset, PyTensor will feed `DisconnectedType` variables in place of the `output_gradients` used by the `L_op`. The solution to this problem is to filter out these unused input variables. This should be safe, in that if the inner graph of the OFG needs to use these variables and we don't provide them, it will create new DisconnectedTypes on the fly. The pre-existing filtering will then kick in.

This however means we may need distinct OFGs for different patterns of disconnected gradients. Accordingly, the cache is now done per pattern.
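A hypothetical sketch of what per-pattern caching can look like; the names (`_lop_cache`, `get_cached_lop_op`, `build_lop_op`) are invented for illustration and are not the PR's actual code:

```python
from pytensor.gradient import DisconnectedType

# Cache one inner-gradient graph per pattern of disconnected output
# gradients, since each subset of connected outputs yields a different graph.
_lop_cache: dict[tuple[bool, ...], object] = {}


def get_cached_lop_op(output_grads, build_lop_op):
    # The key records, per output, whether its gradient is disconnected.
    key = tuple(
        isinstance(out_grad.type, DisconnectedType) for out_grad in output_grads
    )
    if key not in _lop_cache:
        connected = [g for g, dropped in zip(output_grads, key) if not dropped]
        _lop_cache[key] = build_lop_op(connected)
    return _lop_cache[key]
```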
I suspect this is the issue behind #652
Question: Do we really need to cache stuff?
This PR also deprecates `grad_overrides` and some options of the lop/rop overrides, as well as the custom logic for invalid `connection_patterns`. Hopefully this helps us make OpFromGraph more maintainable.

Related Issue
Checklist
Type of change