Fix gradient of OpFromGraph with disconnected/related outputs #723
Conversation
Force-pushed from ad37756 to 869cd5f
Force-pushed from 869cd5f to 1e7c82b
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #723      +/-   ##
==========================================
+ Coverage   80.85%   80.94%   +0.08%
==========================================
  Files         162      162
  Lines       47043    46945      -98
  Branches    11514    11481      -33
==========================================
- Hits        38038    37998      -40
+ Misses       6750     6706      -44
+ Partials     2255     2241      -14
Force-pushed from 1e7c82b to 7a7efa5
Superficial first pass across the PR. I cannot make an informed comment about the actual meat of the changes until I fire up a debugger and try to grok what OpFromGraph is actually doing. I will make an effort to do this in the next 48 hours and give a more meaningful review.
pytensor/compile/builders.py
Outdated
```python
connected_output_grads = [
    out_grad
    for out_grad in output_grads
    if not isinstance(out_grad.type, DisconnectedType)
]
```
Why don't `output_grads` need to check for `NullType`?
Honestly, because I am not sure when `NullType` actually arises.
I prefer the special logic to be as specific as possible; we can reassess if NullTypes also show up in the future?
But the same logic should apply, yeah. Let's put Null here as well.
Actually, I don't want to filter those out. If we know that an output gradient is Null, we shouldn't omit that information from the inner gradient graph generation. If we omit it, `grad` assumes it's simply disconnected.
Omitting disconnected ones makes sense, because they will default again to disconnected_types if needed by `grad`.
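To pin down where this thread lands, here is a minimal sketch; the helper name is hypothetical and not code from this PR, and `DisconnectedType` is assumed importable from `pytensor.gradient`. Disconnected output gradients are dropped, while Null ones are passed through so the inner `grad` call still sees them.

```python
from pytensor.gradient import DisconnectedType


def filter_disconnected_output_grads(output_grads):
    """Drop only DisconnectedType output gradients.

    NullType gradients are deliberately kept: if an output gradient is known
    to be undefined, the inner `grad` call should see that, rather than
    treating the output as merely disconnected.
    """
    return [
        out_grad
        for out_grad in output_grads
        if not isinstance(out_grad.type, DisconnectedType)
    ]
```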
Thanks! It may help to convince yourself that no behavior was changed until commit -2, where the bug fix is done (other than deprecations and removal of special behavior in connection pattern).
Force-pushed from a0272f7 to 71ec299
I found another issue: if the outputs of an OpFromGraph are not independent, the existing logic fails, in that instead of adding the contributions coming from each output, it overrides them. The new test cases in the last commit illustrate this. Any case that depends on `out3`:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.compile.builders import OpFromGraph
from pytensor.gradient import verify_grad
from pytensor.tensor import dscalars

x, y = dscalars("x", "y")
rng = np.random.default_rng(594)
point = list(rng.normal(size=(2,)))

out1 = x + y
out2 = x * y
out3 = out1 + out2  # Create dependency between outputs
op = OpFromGraph([x, y], [out1, out2, out3])

verify_grad(lambda x, y: pt.add(*op(x, y)), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[:-1]), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[1:]), point, rng=rng)
verify_grad(lambda x, y: pt.add(*op(x, y)[::2]), point, rng=rng)
verify_grad(lambda x, y: op(x, y)[0], point, rng=rng)
verify_grad(lambda x, y: op(x, y)[1], point, rng=rng)
verify_grad(lambda x, y: op(x, y)[2], point, rng=rng)
```

If instead we defined `out3` explicitly as …

@aseyboldt any idea how we could handle this? In an outer function I think this would be handled by adding the direct contributions to out1/out2, with the indirect ones coming from out3.

It seems like I want to initialize those variable grads to the output_grad values, but still allow them to be updated, and not set them as known, which doesn't allow any further updates?
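For intuition, a minimal sketch (not the PR's implementation) of the "add, don't override" behaviour one gets in an outer graph, assuming `pytensor.gradient.Lop` accepts lists of outputs and evaluation points:

```python
import pytensor.tensor as pt
from pytensor.gradient import Lop

x, y = pt.dscalars("x", "y")
out1 = x + y
out2 = x * y
out3 = out1 + out2  # out3 depends on the other two outputs

# Symbolic upstream gradients for the three outputs.
g1, g2, g3 = pt.dscalars("g1", "g2", "g3")

# Lop pushes all three upstream gradients through the graph at once, so the
# contribution of g3 flowing through out1/out2 is added to g1/g2 rather than
# replacing them.
gx, gy = Lop([out1, out2, out3], [x, y], [g1, g2, g3])
```

Whether the inner L_op of the OFG can reproduce this accumulation is exactly the question raised above.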
Found a nice(?) hack. Instead of calling Lop internally with …
Force-pushed from 90d9601 to 335d2e0
Changed the title: OpFromGraph with disconnected output gradients → OpFromGraph with disconnected/related outputs
I can't really say that I fully understand the implications of the changes, but it certainly seems like an improvement, so unless someone wants to do a more thorough review, I think we should merge this.
Description
PyTensor uses `DisconnectedType` and `NullType` variables to raise informative errors when users request gradients wrt inputs that can't be computed. This is a problem for OpFromGraph, which may include parallel graphs, some of which are disconnected/null and others not. We don't want to fail when the user only needs the gradient that's supported.
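For context, a minimal sketch (not part of the PR) of how a disconnected gradient surfaces at the user level, using `pytensor.gradient.grad` and its `disconnected_inputs` argument:

```python
import pytensor.tensor as pt
from pytensor.gradient import DisconnectedInputError, grad

x = pt.dscalar("x")
y = pt.dscalar("y")
cost = x ** 2  # y plays no role in the cost

try:
    grad(cost, wrt=y)  # default disconnected_inputs="raise"
except DisconnectedInputError:
    print("y is disconnected from the cost")

# Asking grad to ignore the disconnection yields a zero gradient instead.
g_y = grad(cost, wrt=y, disconnected_inputs="ignore")
```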
There was already some special logic before, to handle cases where `NullType` and `DisconnectedType` arise from the OFG inner graph. Instead of outputting those types (which OFG cannot produce out of thin air, as they are root variables), we were outputting dummy zeros, and then masking those with the original `NullType` or `DisconnectedType` variables created in the internal call to `grad`/`Rop`. This seems reasonable, if a bit tedious. This PR first refactors this code to avoid the dummy outputs altogether (there's no reason for them!).
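A hypothetical sketch of the old "dummy zeros plus masking" step described above; the helper name is invented and this is not the PR's actual code, only an illustration under the assumption that `NullType` and `DisconnectedType` are importable from `pytensor.gradient`:

```python
from pytensor.gradient import DisconnectedType, NullType


def mask_dummy_grads(inner_grads, reference_grads):
    # Wherever the reference gradient (from the internal grad/Rop call) is a
    # Null/Disconnected variable, return it instead of the dummy zero output
    # produced by the inner graph.
    return [
        ref if isinstance(ref.type, (NullType, DisconnectedType)) else inner
        for inner, ref in zip(inner_grads, reference_grads)
    ]
```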
Then it extends this logic to also handle cases where `DisconnectedType` (but not `NullType`) arises before the inner graph of OpFromGraph. This was the case behind one of the issues described in #1. When an OFG has multiple outputs, and the requested gradient only uses a subset, PyTensor will feed `DisconnectedType` variables in place of the `output_gradients` used by the `L_op`. The solution to this problem is to filter out these unused input variables. This should be safe, in that if the inner graph of the OFG needs to use these variables and we don't provide them, it will create new DisconnectedTypes on the fly. The pre-existing filtering will then kick in.

This however means we may need distinct OFGs for different patterns of disconnected gradients. Accordingly, the cache is now done per pattern.
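A hypothetical sketch of what per-pattern caching can look like; the names (`_lop_cache`, `get_cached_lop_op`, `build_lop_op`) are invented for illustration and are not the PR's actual code:

```python
from pytensor.gradient import DisconnectedType

# Cache one inner-gradient graph per pattern of disconnected output
# gradients, since each subset of connected outputs yields a different graph.
_lop_cache: dict[tuple[bool, ...], object] = {}


def get_cached_lop_op(output_grads, build_lop_op):
    # The key records, per output, whether its gradient is disconnected.
    key = tuple(
        isinstance(out_grad.type, DisconnectedType) for out_grad in output_grads
    )
    if key not in _lop_cache:
        connected = [g for g, dropped in zip(output_grads, key) if not dropped]
        _lop_cache[key] = build_lop_op(connected)
    return _lop_cache[key]
```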
I suspect this is the issue behind #652
Question: Do we really need to cache stuff?
This PR also deprecates `grad_overrides` and some options of the lop/rop overrides, as well as the custom logic for invalid `connection_patterns`. Hopefully this helps us make OpFromGraph more maintainable.

Related Issue
Checklist
Type of change