Skip to content

Conversation

@vchuravy
Copy link
Member

As JuliaLang/LinearAlgebra.jl#650 and the references mentioned in it point out there is a Fortran ABI issue
with characters. In #32390 (review) we fixed it in one
place, but we have yet to fix it across the code-base.

While investigating JuliaLang/LinearAlgebra.jl#717 I minimzed the code to

import LinearAlgebra.LAPACK: @blasfunc, chkstride1, checksquare, chkfinite, BlasInt, chklapackerror, liblapack

function gebal!(job::AbstractChar, A::AbstractMatrix{Float64})
    chkstride1(A)
    n = checksquare(A)
    chkfinite(A) # balancing routines don't support NaNs and Infs
    ihi = Ref{BlasInt}()
    ilo = Ref{BlasInt}()
    scale = similar(A, Float64, n)
    info = Ref{BlasInt}()
    ccall((@blasfunc(dgebal_), liblapack), Cvoid,
          (Ref{UInt8}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
           Ptr{BlasInt}, Ptr{BlasInt}, Ptr{Float64}, Ptr{BlasInt}),
          job, n, A, max(1,stride(A,2)), ilo, ihi, scale, info)
    chklapackerror(info[])
    ilo[], ihi[], scale
end

ilo, ihi, scale = gebal!('B', Float64[1 2; 3 4])

Which would reliably segmentation fault or return a spurios error code.

Changing this to:

import LinearAlgebra.LAPACK: @blasfunc, chkstride1, checksquare, chkfinite, BlasInt, chklapackerror, liblapack

function gebal!(job::AbstractChar, A::AbstractMatrix{Float64})
    #chkstride1(A)
    n = checksquare(A)
    #chkfinite(A) # balancing routines don't support NaNs and Infs
    ihi = Ref{BlasInt}()
    ilo = Ref{BlasInt}()
    scale = similar(A, Float64, n)
    info = Ref{BlasInt}()
    ccall((@blasfunc(dgebal_), liblapack), Cvoid,
          (Ref{UInt8}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
           Ptr{BlasInt}, Ptr{BlasInt}, Ptr{Float64}, Ptr{BlasInt}, Clong),
          job, n, A, max(1,stride(A,2)), ilo, ihi, scale, info, 1)
    chklapackerror(info[])
    ilo[], ihi[], scale
end

ilo, ihi, scale = gebal!('B', Float64[1 2; 3 4])

seems to fix that issue and other related ones in our test suite.

So question for the folks who know more about LAPACK/GFortran (@ViralBShah, @vtjnash, @staticfloat)
should we generally fix the way we call LAPACK and pass the hidden argument along? I can got through LinearAlgebra.jl
and do so, but I wanted to get feedback on that first.

To note is that I am using the BinaryBuilder OpenBLAS binaries, which should be using GCC 6, so this might be a pecularity that arises due to PPC.

@ViralBShah ViralBShah added the linear algebra Linear algebra label Dec 11, 2020
@ViralBShah
Copy link
Member

I believe this is the right way to do it, and we should do it across the board, but I am no expert on the calling conventions.

@vchuravy vchuravy added the backport 1.6 Change should be backported to release-1.6 label Dec 11, 2020
@vchuravy
Copy link
Member Author

Fixes JuliaLang/LinearAlgebra.jl#650

@ViralBShah
Copy link
Member

ViralBShah commented Dec 11, 2020

Impressive that this was not really an issue so far, and I wonder if this fixes any hiesenbugs in linear algebra.

@vchuravy
Copy link
Member Author

So the only hesitation I have here is that the ABI that we are explicitly using is the gfortran ABI. IFort allegedly follows the same ABI, but YMMV

@vchuravy vchuravy added the system:powerpc PowerPC label Dec 12, 2020
@ViralBShah
Copy link
Member

IIRC, ifort supports both, the f2c and the gfort ABIs.

@vchuravy
Copy link
Member Author

vchuravy commented Dec 12, 2020

So for now we could use this strategy and in the 1.7 time-frame consider adding a calling convention for ccall that handles this automatically and could support other ABI's as we encounter them.

@ViralBShah any suggestions for who should review this?

@ViralBShah
Copy link
Member

I am happy to merge. But perhaps @KristofferC or @andreasnoack for a quick check.

@andreasnoack
Copy link
Member

I have little to contrive when it comes to calling convention conversations but I'm curious why this hasn't caused any significant issues. These functions are used by a lot of people all the time.

@ViralBShah
Copy link
Member

ViralBShah commented Dec 13, 2020

The issue linked to in OP discusses this and also links to other blogs. I am guessing single character arguments work because they only dereference only the one memory location.

@ViralBShah
Copy link
Member

We should also pass the suggested -fno-optimize-sibling-calls flag to Fortran libraries in Base and BB.

@andreasnoack
Copy link
Member

Based on the R blog post I'd guess the answer to my question is that we compile with version of gfortran prior to version 7.

What is the proper way to pass the string length in case of multiple string arguments?

@ViralBShah
Copy link
Member

This PR has some cases with multiple arguments - they just all go at the end.

@vchuravy
Copy link
Member Author

Based on the R blog post I'd guess the answer to my question is that we compile with version of gfortran prior to version 7.

We actually don't, we use 6, 7, or 8 depending on the GFortran ABI we compile for.

@ViralBShah
Copy link
Member

@vchuravy Let's go ahead and merge. While it looks like you have done a pretty thorough scan, It would be good for someone else to scan through all the functions after the merge to see if any were missed.

@ViralBShah ViralBShah self-requested a review December 13, 2020 15:17
@ViralBShah
Copy link
Member

Also pinging @dkarrasch to see this PR.

@andreasnoack
Copy link
Member

We actually don't, we use 6, 7, or 8 depending on the GFortran ABI we compile for.

Isn't it a bit surprising that the R world completely broke when they started using 7 and 8 while we've hardly noticed?

@andreasnoack andreasnoack merged commit 9f4a807 into master Dec 13, 2020
@andreasnoack andreasnoack deleted the vc/gfortran_cconv branch December 13, 2020 19:32
@KristofferC KristofferC mentioned this pull request Dec 14, 2020
53 tasks
KristofferC pushed a commit that referenced this pull request Dec 14, 2020
* Adjust calling convetion of dgebal and co

* Adjust calling convention of BLAS & LAPACK

(cherry picked from commit 9f4a807)
@vchuravy
Copy link
Member Author

Isn't it a bit surprising that the R world completely broke when they started using 7 and 8 while we've hardly noticed?

This PR was one of these moments, where I asked myself: How does anything work. I found it surprising as well.

@KristofferC KristofferC removed the backport 1.6 Change should be backported to release-1.6 label Dec 19, 2020
staticfloat pushed a commit that referenced this pull request Jan 15, 2021
* Adjust calling convetion of dgebal and co

* Adjust calling convention of BLAS & LAPACK

(cherry picked from commit 9f4a807)
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
* Adjust calling convetion of dgebal and co

* Adjust calling convention of BLAS & LAPACK
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
* Adjust calling convetion of dgebal and co

* Adjust calling convention of BLAS & LAPACK

(cherry picked from commit 9f4a807)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

linear algebra Linear algebra system:powerpc PowerPC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants