martin-frbg commented Feb 13, 2017

Quoting patch author amodra from #1078
Lots of issues here.

  • The vsx regs weren't listed as clobbered.
  • Poor choice of vsx regs, which along with the lack of clobbers led to
    trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
    temp vsx regs, but asms currently have a limit of 30 i/o parms.
  • Other regs were clobbered unnecessarily, seemingly in an attempt to
    clobber inputs, with gcc-7 complaining about the clobber of r2.
    (Changed inputs should also be listed as outputs or as an i/o.)
  • "r" constraint used instead of "b" for gprs used in insns where the
    r0 encoding means zero rather than r0.
  • There were unused asm inputs too.
  • All memory was clobbered rather than hooking up memory outputs with
    proper memory constraints, and that and the lack of proper memory
    input constraints meant the asms needed to be volatile and their
    containing function noinline.
  • Some parameters were being passed unnecessarily via memory.
  • When a copy of a pointer input parm was needed, the value passed to
    the asm was incremented in C and decremented in asm, rather than
    using i/o parms, an early clobber constraint, or a temp output reg
    copied in the asm. In most cases a small change to assembly could
    be made that obviated the need for the extra pointer.
  • A number of functions did not compute the final sum or dot-product
    in assembly, instead using scalar code in C.
  • dcbt was bogus.
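
Several of the points above (listing vsx temporaries as clobbers, using "b" where an r0 encoding means zero, and describing memory with an "m" operand instead of clobbering all memory) can be sketched in a small example. This is an illustration only, not code from the patch: the function name pair_sum, its parameters, and the choice of vs32/vs33 as temporaries are assumptions made for the sketch, and the asm path is only compiled on a 64-bit PowerPC target with VSX enabled. Other targets take the plain C fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): returns x[i] + x[i + 1]. */
double pair_sum(const double *x, long i)
{
    assert(x != NULL);
#if defined(__powerpc64__) && defined(__VSX__)
    double sum;
    long off = i * (long) sizeof(double);   /* byte offset of x[i] */

    /* Not volatile: every dependency is visible to gcc through the
       operand and clobber lists, so it may schedule or inline freely. */
    __asm__ (
        "lxvd2x   32, %1, %2   \n\t"  /* vs32 = the two doubles at x + off */
        "xxswapd  33, 32       \n\t"  /* vs33 = vs32 with halves swapped   */
        "xsadddp  %x0, 32, 33  \n\t"  /* sum  = x[i] + x[i + 1]            */
        : "=d" (sum)                  /* 0: result in a floating-point reg */
        : "b" (x),                    /* 1: RA slot, where r0 would mean 0 */
          "r" (off),                  /* 2: RB slot, any gpr is fine       */
          "m" (*(const double (*)[2]) (x + i))  /* 3: memory actually read */
        : "vs32", "vs33");            /* hand-picked temps must be listed  */
    return sum;
#else
    /* Portable fallback so the sketch builds on non-PowerPC targets. */
    return x[i] + x[i + 1];
#endif
}
```

Calling pair_sum(v, 2) on an array v of at least four doubles returns v[2] + v[3]. Because the "m" input names exactly the two doubles that are read, no "memory" clobber is needed, and the containing function does not have to be marked noinline.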

@martin-frbg martin-frbg merged commit 040672e into OpenMathLib:develop Feb 21, 2017
tkelman added a commit to JuliaLang/julia that referenced this pull request Mar 19, 2017
tkelman added a commit to JuliaLang/julia that referenced this pull request May 2, 2017
tkelman added a commit to JuliaLang/julia that referenced this pull request May 3, 2017
from OpenMathLib/OpenBLAS#1098

(cherry picked from commit 26beab3)
ref #21091

fix Makefile.system dependency for release-0.5