-
-
Couldn't load subscription status.
- Fork 32
New implementation of _exp! based on generated code
#64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
e6fe92b
generated code exp2
jarlebring 6d70fad
add tests
jarlebring c524d32
first balance
jarlebring 3a77193
fix LAPACK-call
jarlebring e6d5839
_exp2! import in tests
jarlebring ac470b9
update old version update exp_14 etc
jarlebring 0e184dc
Update src/exp2.jl
YingboMa f94346f
ifs instead of eval
jarlebring 0d9373e
remove exp_14,exp_15 since they overflow anyway
jarlebring 08163c2
remove copyto! and unused code
jarlebring 01686f2
Update exp2.jl
ChrisRackauckas 0ae2580
Save result input A
jarlebring 1bafea4
findfirst with .<
jarlebring 5e811e9
nif and Val as req by @chriselrod
jarlebring cf4bdae
better filename
jarlebring 502f060
code gen cleanup
jarlebring c4a0a3b
increase test coverage
jarlebring b999438
move out the LAPACK.gesv call
jarlebring cdd0514
move out the inplace add for UniformScaling
jarlebring 8a80599
use lu! + ldiv! instead of LAPACK
jarlebring ffb0ade
typo in lu! return
jarlebring e47a70d
rename _exp! -> _exp_old! and _exp2! -> _exp!
jarlebring 1f844ac
reduce one allocation in exp_4
jarlebring 01adc36
_exp_old -> _baseexp
jarlebring 0cd4dc0
rename exp2.jl -> exp_noalloc.jl
jarlebring 7b09ad7
comments
jarlebring 16b3a6f
typo in README.md
jarlebring File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| using LinearAlgebra | ||
|
|
||
|
|
||
| @inline function exp_gen!(cache,A,::Val{1}) | ||
| T=promote_type(eltype(A),Float64) # Make it work for many 'bigger' types (matrices and scalars) | ||
| # max_memslots=4 | ||
| n=size(A,1) | ||
| # The first slots are precomputed nodes [:A] | ||
| memslots2 = getmem(cache,2) | ||
| memslots3 = getmem(cache,3) | ||
| memslots4 = getmem(cache,4) | ||
| # Assign precomputed nodes memslots | ||
| memslots1=A # overwrite A | ||
| # Uniform scaling is exploited. | ||
| # No matrix I explicitly allocated. | ||
| # Computation order: A2 Ua U V Z X P | ||
| # Computing A2 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Computing Ua = x*I+x*A2 | ||
| coeff1=60.0 | ||
| coeff2=1.0 | ||
| memslots3 .= coeff2.*memslots2 | ||
| inplace_add!(memslots3,I*coeff1) | ||
| # Computing U with operation: mult | ||
| mul!(memslots4,memslots3,memslots1) | ||
| # Deallocating Ua in slot 3 | ||
| # Deallocating A in slot 1 | ||
| # Computing V = x*I+x*A2 | ||
| coeff1=120.0 | ||
| coeff2=12.0 | ||
| # Smart lincomb recycle A2 | ||
| memslots2 .= coeff2.*memslots2 | ||
| inplace_add!(memslots2,I*coeff1) | ||
| # Computing Z = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=-1.0 | ||
| memslots1 .= coeff1.*memslots2 .+ coeff2.*memslots4 | ||
| # Computing X = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=1.0 | ||
| # Smart lincomb recycle V | ||
| memslots2 .= coeff1.*memslots2 .+ coeff2.*memslots4 | ||
| # Deallocating U in slot 4 | ||
| # Computing P with operation: ldiv | ||
| ldiv_for_generated!(memslots3, memslots1, memslots2) | ||
| # Deallocating Z in slot 1 | ||
| # Deallocating X in slot 2 | ||
| copyto!(A,memslots3) # Returning P | ||
| end | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| using LinearAlgebra | ||
|
|
||
|
|
||
| @inline function exp_gen!(cache,A,::Val{10}) | ||
| T=promote_type(eltype(A),Float64) # Make it work for many 'bigger' types (matrices and scalars) | ||
| # max_memslots=6 | ||
| n=size(A,1) | ||
| # The first slots are precomputed nodes [:A] | ||
| memslots2 = getmem(cache,2) | ||
| memslots3 = getmem(cache,3) | ||
| memslots4 = getmem(cache,4) | ||
| memslots5 = getmem(cache,5) | ||
| memslots6 = getmem(cache,6) | ||
| # Assign precomputed nodes memslots | ||
| memslots1=A # overwrite A | ||
| # Uniform scaling is exploited. | ||
| # No matrix I explicitly allocated. | ||
| # Computation order: C A2 A4 A6 Ub3 Ub Uc U Vb3 Vb V Z X P S1 S2 S3 S4 S5 | ||
| # Computing C = x*A+x*I | ||
| coeff1=0.03125 | ||
| coeff2=0.0 | ||
| # Smart lincomb recycle A | ||
| memslots1 .= coeff1.*memslots1 | ||
| inplace_add!(memslots1,I*coeff2) | ||
| # Computing A2 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Computing A4 with operation: mult | ||
| mul!(memslots3,memslots2,memslots2) | ||
| # Computing A6 with operation: mult | ||
| mul!(memslots4,memslots2,memslots3) | ||
| # Computing Ub3 = x*A2+x*A4+x*A6 | ||
| coeff1=4.08408e7 | ||
| coeff2=16380.0 | ||
| coeff3=1.0 | ||
| memslots5 .= coeff1.*memslots2 .+ coeff2.*memslots3 .+ coeff3.*memslots4 | ||
| # Computing Ub with operation: mult | ||
| mul!(memslots6,memslots5,memslots4) | ||
| # Deallocating Ub3 in slot 5 | ||
| # Computing Uc = x*I+x*A2+x*A4+x*A6+x*Ub | ||
| coeff1=3.238237626624e16 | ||
| coeff2=1.1873537964288e15 | ||
| coeff3=1.05594705216e13 | ||
| coeff4=3.352212864e10 | ||
| coeff5=1.0 | ||
| # Smart lincomb recycle Ub | ||
| memslots6 .= coeff2.*memslots2 .+ coeff3.*memslots3 .+ coeff4.*memslots4 .+ coeff5.*memslots6 | ||
| inplace_add!(memslots6,I*coeff1) | ||
| # Computing U with operation: mult | ||
| mul!(memslots5,memslots1,memslots6) | ||
| # Deallocating C in slot 1 | ||
| # Deallocating Uc in slot 6 | ||
| # Computing Vb3 = x*A2+x*A4+x*A6 | ||
| coeff1=1.32324192e9 | ||
| coeff2=960960.0 | ||
| coeff3=182.0 | ||
| memslots1 .= coeff1.*memslots2 .+ coeff2.*memslots3 .+ coeff3.*memslots4 | ||
| # Computing Vb with operation: mult | ||
| mul!(memslots6,memslots1,memslots4) | ||
| # Deallocating Vb3 in slot 1 | ||
| # Computing V = x*I+x*A2+x*A4+x*A6+x*Vb | ||
| coeff1=6.476475253248e16 | ||
| coeff2=7.7717703038976e15 | ||
| coeff3=1.29060195264e14 | ||
| coeff4=6.704425728e11 | ||
| coeff5=1.0 | ||
| # Smart lincomb recycle A2 | ||
| memslots2 .= coeff2.*memslots2 .+ coeff3.*memslots3 .+ coeff4.*memslots4 .+ coeff5.*memslots6 | ||
| inplace_add!(memslots2,I*coeff1) | ||
| # Deallocating A4 in slot 3 | ||
| # Deallocating A6 in slot 4 | ||
| # Deallocating Vb in slot 6 | ||
| # Computing Z = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=-1.0 | ||
| memslots1 .= coeff1.*memslots2 .+ coeff2.*memslots5 | ||
| # Computing X = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=1.0 | ||
| # Smart lincomb recycle V | ||
| memslots2 .= coeff1.*memslots2 .+ coeff2.*memslots5 | ||
| # Deallocating U in slot 5 | ||
| # Computing P with operation: ldiv | ||
| ldiv_for_generated!(memslots3, memslots1, memslots2) | ||
| # Deallocating Z in slot 1 | ||
| # Deallocating X in slot 2 | ||
| # Computing S1 with operation: mult | ||
| mul!(memslots1,memslots3,memslots3) | ||
| # Deallocating P in slot 3 | ||
| # Computing S2 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Deallocating S1 in slot 1 | ||
| # Computing S3 with operation: mult | ||
| mul!(memslots1,memslots2,memslots2) | ||
| # Deallocating S2 in slot 2 | ||
| # Computing S4 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Deallocating S3 in slot 1 | ||
| # Computing S5 with operation: mult | ||
| mul!(memslots1,memslots2,memslots2) | ||
| # Deallocating S4 in slot 2 | ||
| copyto!(A,memslots1) # Returning S5 | ||
| end | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| using LinearAlgebra | ||
|
|
||
|
|
||
| @inline function exp_gen!(cache,A,::Val{11}) | ||
| T=promote_type(eltype(A),Float64) # Make it work for many 'bigger' types (matrices and scalars) | ||
| # max_memslots=6 | ||
| n=size(A,1) | ||
| # The first slots are precomputed nodes [:A] | ||
| memslots2 = getmem(cache,2) | ||
| memslots3 = getmem(cache,3) | ||
| memslots4 = getmem(cache,4) | ||
| memslots5 = getmem(cache,5) | ||
| memslots6 = getmem(cache,6) | ||
| # Assign precomputed nodes memslots | ||
| memslots1=A # overwrite A | ||
| # Uniform scaling is exploited. | ||
| # No matrix I explicitly allocated. | ||
| # Computation order: C A2 A4 A6 Ub3 Ub Uc U Vb3 Vb V Z X P S1 S2 S3 S4 S5 S6 | ||
| # Computing C = x*A+x*I | ||
| coeff1=0.015625 | ||
| coeff2=0.0 | ||
| # Smart lincomb recycle A | ||
| memslots1 .= coeff1.*memslots1 | ||
| inplace_add!(memslots1,I*coeff2) | ||
| # Computing A2 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Computing A4 with operation: mult | ||
| mul!(memslots3,memslots2,memslots2) | ||
| # Computing A6 with operation: mult | ||
| mul!(memslots4,memslots2,memslots3) | ||
| # Computing Ub3 = x*A2+x*A4+x*A6 | ||
| coeff1=4.08408e7 | ||
| coeff2=16380.0 | ||
| coeff3=1.0 | ||
| memslots5 .= coeff1.*memslots2 .+ coeff2.*memslots3 .+ coeff3.*memslots4 | ||
| # Computing Ub with operation: mult | ||
| mul!(memslots6,memslots5,memslots4) | ||
| # Deallocating Ub3 in slot 5 | ||
| # Computing Uc = x*I+x*A2+x*A4+x*A6+x*Ub | ||
| coeff1=3.238237626624e16 | ||
| coeff2=1.1873537964288e15 | ||
| coeff3=1.05594705216e13 | ||
| coeff4=3.352212864e10 | ||
| coeff5=1.0 | ||
| # Smart lincomb recycle Ub | ||
| memslots6 .= coeff2.*memslots2 .+ coeff3.*memslots3 .+ coeff4.*memslots4 .+ coeff5.*memslots6 | ||
| inplace_add!(memslots6,I*coeff1) | ||
| # Computing U with operation: mult | ||
| mul!(memslots5,memslots1,memslots6) | ||
| # Deallocating C in slot 1 | ||
| # Deallocating Uc in slot 6 | ||
| # Computing Vb3 = x*A2+x*A4+x*A6 | ||
| coeff1=1.32324192e9 | ||
| coeff2=960960.0 | ||
| coeff3=182.0 | ||
| memslots1 .= coeff1.*memslots2 .+ coeff2.*memslots3 .+ coeff3.*memslots4 | ||
| # Computing Vb with operation: mult | ||
| mul!(memslots6,memslots1,memslots4) | ||
| # Deallocating Vb3 in slot 1 | ||
| # Computing V = x*I+x*A2+x*A4+x*A6+x*Vb | ||
| coeff1=6.476475253248e16 | ||
| coeff2=7.7717703038976e15 | ||
| coeff3=1.29060195264e14 | ||
| coeff4=6.704425728e11 | ||
| coeff5=1.0 | ||
| # Smart lincomb recycle A2 | ||
| memslots2 .= coeff2.*memslots2 .+ coeff3.*memslots3 .+ coeff4.*memslots4 .+ coeff5.*memslots6 | ||
| inplace_add!(memslots2,I*coeff1) | ||
| # Deallocating A4 in slot 3 | ||
| # Deallocating A6 in slot 4 | ||
| # Deallocating Vb in slot 6 | ||
| # Computing Z = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=-1.0 | ||
| memslots1 .= coeff1.*memslots2 .+ coeff2.*memslots5 | ||
| # Computing X = x*V+x*U | ||
| coeff1=1.0 | ||
| coeff2=1.0 | ||
| # Smart lincomb recycle V | ||
| memslots2 .= coeff1.*memslots2 .+ coeff2.*memslots5 | ||
| # Deallocating U in slot 5 | ||
| # Computing P with operation: ldiv | ||
| ldiv_for_generated!(memslots3, memslots1, memslots2) | ||
| # Deallocating Z in slot 1 | ||
| # Deallocating X in slot 2 | ||
| # Computing S1 with operation: mult | ||
| mul!(memslots1,memslots3,memslots3) | ||
| # Deallocating P in slot 3 | ||
| # Computing S2 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Deallocating S1 in slot 1 | ||
| # Computing S3 with operation: mult | ||
| mul!(memslots1,memslots2,memslots2) | ||
| # Deallocating S2 in slot 2 | ||
| # Computing S4 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Deallocating S3 in slot 1 | ||
| # Computing S5 with operation: mult | ||
| mul!(memslots1,memslots2,memslots2) | ||
| # Deallocating S4 in slot 2 | ||
| # Computing S6 with operation: mult | ||
| mul!(memslots2,memslots1,memslots1) | ||
| # Deallocating S5 in slot 1 | ||
| copyto!(A,memslots2) # Returning S6 | ||
| end | ||
|
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just noticed, the README should get an update too. Follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean documentation of the kwargs in the README.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that no longer describes the method of
_exp!and_baseexp!is not documented, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is the new one based off of the 2008 paper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are both implementations of the Higham 2008 paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh okay, so all that's needed is a baseexp section then.