DiagonalTensorMap constructor rrule #208
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   82.29%   82.44%   +0.14%
==========================================
  Files          43       43
  Lines        5467     5536      +69
==========================================
+ Hits         4499     4564      +65
- Misses        968      972       +4
```

View full report in Codecov by Sentry.
Adds tests for the new code, and adds a proper generator of random tangents for `DiagonalTensorMap`.
Thank you for taking the time to add this, this is really appreciated!
I left a few comments in the code, but overall looks great!
```julia
D = DiagonalTensorMap(d, args...; kwargs...)
project_D = ProjectTo(D)
function DiagonalTensorMap_pullback(Δt)
    ∂d = project_D(unthunk(Δt)).data
    return NoTangent(), ∂d, ntuple(_ -> NoTangent(), length(args))...
end
```
This implementation slightly surprises me; I would have expected the projection to be based off `d`. In the end, these things probably boil down to the same thing?
Suggested change:

```julia
# current
D = DiagonalTensorMap(d, args...; kwargs...)
project_D = ProjectTo(D)
function DiagonalTensorMap_pullback(Δt)
    ∂d = project_D(unthunk(Δt)).data
    return NoTangent(), ∂d, ntuple(_ -> NoTangent(), length(args))...
end
```

```julia
# suggested
project_d = ProjectTo(d)
function DiagonalTensorMap_pullback(Δt)
    ∂d = project_d(unthunk(Δt).data)
    return NoTangent(), ∂d, ntuple(_ -> NoTangent(), length(args))...
end
D = DiagonalTensorMap(d, args...; kwargs...)
```
It seems to me this is not the same. The issue here is that `Δt` may sometimes be of some non-diagonal type. I expect that in this situation your version will return an incorrect tangent (with a lot of zeros from the off-diagonal parts of `Δt`). Am I missing something?
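The `Diagonal` analogue from ChainRulesCore illustrates the distinction. This is a hedged sketch using plain `LinearAlgebra` types rather than TensorKit's (assuming TensorKit's `ProjectTo` for `DiagonalTensorMap` behaves analogously): projecting to the diagonal structure first keeps only the diagonal entries of a non-diagonal cotangent, whereas grabbing the raw data directly would not.

```julia
using LinearAlgebra, ChainRulesCore

d = [1.0, 2.0, 3.0]
project_D = ProjectTo(Diagonal(d))  # analogue of ProjectTo(D)

# A non-diagonal cotangent, as might leak out of another rrule:
Δt = ones(3, 3)

# Projecting to the diagonal structure first drops the off-diagonal
# parts, yielding the correct tangent for the parameter vector d:
∂d = project_D(Δt).diag   # == [1.0, 1.0, 1.0]
```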
Ah, good point, I think I missed this.
Somehow, I was expecting the input to the pullback to always be a `DiagonalTensorMap`, since this really is a constructor, and if not, there's probably a projector missing in some other `rrule`...
I also found some comments in the `rrule`s for `Diagonal` where a similar discussion is taking place.
Long story short though, it seems like their solution is more similar to yours, so that's definitely okay for me.
I am also undecided between the two options. I agree that in most cases, e.g. tensor contractions, a `ProjectTo(D)` should already have been called on the adjoint variable `Δt` before it enters this `rrule`, and therefore not explicitly having `project_d(Δt)` in this rule might serve as a way to find missing projectors elsewhere.
On the other hand, from a user perspective, I can also see the advantage of just having the `project_D` in here for safety.
While in principle that is independent of still having to call `project_d = ProjectTo(d)` on the output `∂d`, it is probably true that `project_d` will not be doing anything (will act as the identity) if we already have projected `Δt` to a diagonal tensor with the right scalar type and storage type.
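For concreteness, a sketch of what applying both projections would look like (hedged: this is not what the PR merges, and it assumes TensorKit's `ProjectTo(D)` returns a `DiagonalTensorMap`, so that `project_d` acts as the identity whenever `Δt` was already projected upstream):

```julia
# Hypothetical combined version of the rrule discussed above.
function ChainRulesCore.rrule(::Type{DiagonalTensorMap}, d::AbstractVector, args...)
    D = DiagonalTensorMap(d, args...)
    project_D = ProjectTo(D)  # guards against non-diagonal cotangents
    project_d = ProjectTo(d)  # identity once Δt is diagonal with matching eltype
    function DiagonalTensorMap_pullback(Δt)
        ∂d = project_d(project_D(unthunk(Δt)).data)
        return NoTangent(), ∂d, ntuple(_ -> NoTangent(), length(args))...
    end
    return D, DiagonalTensorMap_pullback
end
```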
Co-authored-by: Lukas Devos <[email protected]>
I'll have a look tomorrow to help you out, the interplay with the symmetries is somewhat expected.
Thank you! Yes, I have a use-case: I needed to differentiate through a tensor RG algorithm, and this was the last missing detail. For me, everything already works perfectly with what is provided in this PR. However, I also wanted to contribute something useful. I believe the problem lies somewhere in the tests, not in the implementation: I checked the pullback result by hand, and it is correct. I noticed that … As for the …
I didn't find time today, but it's still on my TODO. This is more or less why I asked in what context you are encountering this: if you really have a diagonal array that you wish to convert into a `TensorMap`, to then work with `TensorMap`s, we should probably define the appropriate converters for that, which include the correct scalings. If not, I would be interested in the stacktrace, or an MWE, such that I might be able to spot where this is coming from. Maybe we should also consider changing/expanding on the definitions of …
Hmmm, in fact, now I cannot reproduce the problem I had. Here is an example of the code where I encountered a problem initially: … This returns: … To fix this, I defined an …
Ah, I see now. The original problem comes from our specializations of matrix functions like …

The biggest annoyance is using ChainRulesTestUtils.jl, since going from tensors to vectors in that context is understood as "desymmetrizing", i.e. weighing by quantum dimensions. This would be resolved if we would test …

As a separate note, the unpacking step might actually also require such a …

Finally, we could also define the rrules for the matrix functions directly, but again ChainRulesTestUtils is being annoying:

```julia
julia> sqrt(rand(3,3))
3×3 Matrix{ComplexF64}:
 0.251394+0.530824im  0.644466-0.46924im   0.245255+0.000886708im
  0.28562+0.198812im  0.729492-0.175747im  0.278139+0.000332103im
 0.652492-1.06654im   0.235591+0.942803im  0.368666-0.00178158im

julia> sqrt(rand(3,3))
3×3 Matrix{Float64}:
  0.976625  0.341331  -0.00378496
 -0.117352  0.820173   0.654396
  0.520547  0.424884   0.495172

julia> sqrt(Diagonal(randn(3)))
3×3 Diagonal{Float64, Vector{Float64}}:
 0.547541   ⋅         ⋅
  ⋅        0.484821   ⋅
  ⋅         ⋅        0.111613

julia> sqrt(Diagonal(randn(3)))
ERROR: DomainError with -0.31158918553265935:
```

And since the finite differences tests will generate jacobians for the number of real parameters, this yields jacobians with weird sizes. I'm honestly not sure what the best approach would be.
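One conceivable workaround on the test side (a sketch, not what this PR does) is to promote the diagonal data to complex before taking the square root, mirroring what `sqrt` does internally for dense matrices with negative eigenvalues:

```julia
using LinearAlgebra

D = Diagonal([1.0, -1.0])
# sqrt(D) throws a DomainError for negative real entries, but promoting
# the diagonal to complex first matches the dense-matrix behaviour:
S = sqrt(Diagonal(complex.(D.diag)))
S^2 ≈ Diagonal(complex.(D.diag))   # true: S is a valid square root
```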
In principle, this should now no longer require the … The comment about accessing the fields of tensors still remains, in the sense that we don't really support AD for that, but at least now the functions that access them have custom rules defined. For …
It seems like the failing nightly tests are due to this commit, where everything has now become even more problematic:

v1:

```julia
julia> d = Diagonal([1., -1.])
2×2 Diagonal{Float64, Vector{Float64}}:
 1.0    ⋅
  ⋅   -1.0

julia> sqrt(d)
ERROR: DomainError with -1.0:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

julia> sqrt(Array(d))
2×2 Matrix{ComplexF64}:
 1.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+1.0im

julia> sqrt(Array(d) + [0. 0.1; 0. 0.]) # array but not strictly diagonal
2×2 Matrix{ComplexF64}:
 1.0+0.0im  0.05-0.05im
 0.0+0.0im   0.0+1.0im
```

nightly:

```julia
julia> using LinearAlgebra

julia> d = Diagonal([1., -1.])
2×2 Diagonal{Float64, Vector{Float64}}:
 1.0    ⋅
  ⋅   -1.0

julia> sqrt(d)
ERROR: DomainError with -1.0:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

julia> sqrt(Array(d))
ERROR: DomainError with -1.0:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

julia> sqrt(Array(d) + [0. 0.1; 0. 0.]) # array but not strictly diagonal
2×2 Matrix{ComplexF64}:
 1.0+0.0im  0.05-0.05im
 0.0+0.0im   0.0+1.0im
```
Force-pushed from dc03d28 to 58904f7.
test/ad.jl (Outdated)

```julia
d = DiagonalTensorMap{T}(undef, V[1])
randn!(d.data)
if T <: Real
    d.data .= abs.(d.data)
```
I assume this is only necessary for `f === sqrt`? Alternatively, you could write

```julia
logdata = randn(T, rdim(V[1]))
d = DiagonalTensorMap(exp.(logdata), V[1])
```

This will create a purely positive `DiagonalTensorMap` in the real case.
Just realized that there literally is a `randexp` function to do that in one go... Thanks for the suggestion, I'll change that.
I don't think `randexp` works with complex numbers.
But for complex numbers I don't need them to be positive, so I can just use `randn!`. I can definitely change it if you want, anything goes I guess?
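The resulting generator could look like this (a hedged sketch; `random_diag_data` and the placeholder `n` for the reduced dimension of the space are made-up names, not part of the PR):

```julia
using Random

# Positive entries only where a real sqrt needs them; complex data can
# be drawn from randn directly since positivity is not required there.
random_diag_data(T::Type{<:Number}, n::Int) = T <: Real ? randexp(T, n) : randn(T, n)

all(>(0), random_diag_data(Float64, 10))  # true: real draws are positive
```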
```julia
include("planar.jl")
if !(Sys.isapple()) # TODO: remove once we know why this is so slow on macOS
    include("ad.jl")
# TODO: remove once we know AD is slow on macOS CI
```
This is a very strange construct. Why not

```julia
if !(Sys.isapple() && get(ENV, "CI", false))
    include("ad.jl")
end
```
It's mostly taken from here.
I think the biggest problem is that the environment variables are strings, so they really become "true" and "false", which makes for an annoying interface.
I don't think the construction above works, because "true" would not evaluate as a bool...
Ok, `parse(Bool, get(ENV, "CI", "false"))`.
Sorry I missed this, I tried `get(ENV, "CI", "false") == "true"`. Does that work?
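Both working spellings boil down to the same check; a sketch of the guarded include with either parsing style (assuming CI sets `ENV["CI"] = "true"`):

```julia
# ENV values are Strings, so the bare get(ENV, "CI", false) would yield
# the String "true" on CI, which does not evaluate as a Bool.
is_ci = get(ENV, "CI", "false") == "true"
# equivalently, stricter (throws on malformed values):
# is_ci = parse(Bool, get(ENV, "CI", "false"))
if !(Sys.isapple() && is_ci)
    include("ad.jl")
end
```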
Seems like this finally worked. Ready to go for me!
Looks good to me as well. I will merge tomorrow morning if the tests have successfully completed. Are we good to tag a patch update?
This PR adds an `rrule` for the `DiagonalTensorMap` constructor and defines `ProjectTo` for `DiagonalTensorMap`.