giordano
Member

This updates XLA to openxla/xla@821715b to fix an upstream issue, and uses the Enzyme-JAX version from EnzymeAD/Enzyme-JAX#511 for testing.

@giordano
Member Author

This currently segfaults during precompilation, probably because the API changed a lot. Disabling the precompilation workload by adding

[Reactant]
precompile_workload = false

to the LocalPreferences.toml file makes precompilation succeed, and then in a new session we can see

julia> using Reactant, Reactant.XLA

julia> Reactant.initialize_dialect()

julia> if XLA.REACTANT_XLA_RUNTIME == "PJRT"
           client = XLA.PJRT.CPUClient(; checkcount=false)
       elseif XLA.REACTANT_XLA_RUNTIME == "IFRT"
           client = XLA.IFRT.CPUClient(; checkcount=false)
       else
           error("Unsupported runtime: $(XLA.REACTANT_XLA_RUNTIME)")
       end;

julia> x = ConcreteRNumber(2.0; client);

julia> Reactant.compile(sin, (x,); client, optimize=:all)
ERROR: module @reactant_sin attributes {mhlo.num_partitions = 1 : i64, mhlo.num_replicas = 1 : i64} {
  func.func @main(%arg0: tensor<f64> {reactant.donated}) -> tensor<f64> {
    %0 = stablehlo.sine %arg0 : tensor<f64>
    return %0 : tensor<f64>
  }
}
UNIMPLEMENTED: Compile with MLIR Module is not supported.

Stacktrace:
 [1] reactant_err(msg::Cstring)
   @ Reactant.XLA ~/.julia/dev/Reactant/src/xla/Utils.jl:12
 [2] compile(client::Reactant.XLA.PJRT.Client, device::Reactant.XLA.PJRT.Device, mod::Reactant.MLIR.IR.Module; is_sharded::Bool, global_device_ids::Vector{…}, num_outputs::Int64, num_parameters::Int64, num_replicas::Int64, num_partitions::Int64, use_shardy_partitioner::Bool)
   @ Reactant.XLA.PJRT ~/.julia/dev/Reactant/src/xla/PJRT/LoadedExecutable.jl:82
 [3] compile_xla(f::Function, args::Tuple{ConcretePJRTNumber{Float64, 1, Reactant.Sharding.ShardInfo{…}}}; client::Reactant.XLA.PJRT.Client, kwargs::@Kwargs{optimize::Symbol})
   @ Reactant.Compiler ~/.julia/dev/Reactant/src/Compiler.jl:1913
 [4] compile_xla
   @ ~/.julia/dev/Reactant/src/Compiler.jl:1870 [inlined]
 [5] compile(f::Function, args::Tuple{ConcretePJRTNumber{Float64, 1, Reactant.Sharding.ShardInfo{…}}}; sync::Bool, kwargs::@Kwargs{client::Reactant.XLA.PJRT.Client, optimize::Symbol})
   @ Reactant.Compiler ~/.julia/dev/Reactant/src/Compiler.jl:1936
 [6] top-level scope
   @ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.
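
The `[Reactant]` preference shown above can also be written from Julia instead of editing the file by hand. The following is only a minimal sketch using the `TOML` standard library: `disable_precompile_workload` is a hypothetical helper name, not part of Reactant, and it assumes the preferences file lives next to the project's `Project.toml`.

```julia
using TOML

# Hypothetical helper (not part of Reactant): set `precompile_workload = false`
# under the `[Reactant]` table of the LocalPreferences.toml in `project_dir`,
# keeping any preferences that are already stored there.
function disable_precompile_workload(project_dir::AbstractString)
    path = joinpath(project_dir, "LocalPreferences.toml")
    # Start from the existing file if there is one, otherwise from an empty table.
    prefs = isfile(path) ? TOML.parsefile(path) : Dict{String,Any}()
    section = get!(prefs, "Reactant", Dict{String,Any}())
    section["precompile_workload"] = false
    open(path, "w") do io
        TOML.print(io, prefs)
    end
    return path
end
```

Calling `disable_precompile_workload(dirname(Base.active_project()))` would apply it to the active project. The preference is read at precompilation time, so it has to be set before Reactant is loaded.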

@giordano
Member Author

The good news is that IFRT works:

julia> using Reactant, Reactant.XLA
Precompiling Reactant...
  1 dependency successfully precompiled in 8 seconds. 76 already precompiled.

julia> client = XLA.IFRT.CPUClient(; checkcount=false)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1742598544.115851 4185742 pjrt_client.cc:525] PjRt-IFRT device count: total=1, addressable=1
I0000 00:00:1742598544.115892 4185742 pjrt_client.cc:529] Addressable PjRt-IFRT device: CpuDevice(id=0)
Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x0000000001e972b0)

julia> x = ConcreteRNumber(2.0; client);

julia> Reactant.compile(sin, (x,); client, optimize=:all)
Reactant.Compiler.Thunk{typeof(sin), Symbol("##sin_reactant#231"), Tuple{ConcreteIFRTNumber{Float64, Reactant.Sharding.ShardInfo{Reactant.Sharding.NoSharding, Nothing}}}, false, Reactant.XLA.IFRT.LoadedExecutable, Reactant.XLA.IFRT.Device}(sin, Reactant.XLA.IFRT.LoadedExecutable(Ptr{Nothing} @0x000000000aa369b0, 1, 1, false, 1, 1), Reactant.XLA.IFRT.Device(Ptr{Nothing} @0x0000000001e92f60))

@giordano
Member Author

giordano commented Mar 21, 2025

The CUDA integration tests segfault with PJRT:

module flag identifiers must be unique (or of 'require' type)
!"Debug Info Version"

[72588] signal 11 (1): Segmentation fault
in expression starting at /home/giordano/.julia/dev/Reactant/test/integration/cuda.jl:20
ScopedDbgInfoFormatSetter at /proc/self/cwd/external/llvm-project/llvm/include/llvm/IR/DebugProgramInstruction.h:694 [inlined]
run at /proc/self/cwd/external/llvm-project/llvm/include/llvm/IR/PassManagerImpl.h:69
runOnOperation at /proc/self/cwd/external/enzyme_ad/src/enzyme_ad/jax/Passes/SROAWrappers.cpp:138
operator() at /proc/self/cwd/external/llvm-project/mlir/lib/Pass/Pass.cpp:526 [inlined]
callback_fn<(lambda at external/llvm-project/mlir/lib/Pass/Pass.cpp:521:7)> at /proc/self/cwd/external/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46 [inlined]
operator() at /proc/self/cwd/external/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69 [inlined]
executeAction<mlir::PassExecutionAction, mlir::Pass &> at /proc/self/cwd/external/llvm-project/mlir/include/mlir/IR/MLIRContext.h:288 [inlined]
run at /proc/self/cwd/external/llvm-project/mlir/lib/Pass/Pass.cpp:520
runPipeline at /proc/self/cwd/external/llvm-project/mlir/lib/Pass/Pass.cpp:592
runPasses at /proc/self/cwd/external/llvm-project/mlir/lib/Pass/Pass.cpp:905 [inlined]
run at /proc/self/cwd/external/llvm-project/mlir/lib/Pass/Pass.cpp:885
mlirPassManagerRunOnOp at /proc/self/cwd/external/llvm-project/mlir/lib/CAPI/IR/Pass.cpp:44
mlirPassManagerRunOnOp at /home/giordano/.julia/dev/Reactant/src/mlir/libMLIR_h.jl:8439 [inlined]
run! at /home/giordano/.julia/dev/Reactant/src/mlir/IR/Pass.jl:74
#run_pass_pipeline!#2 at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:608
run_pass_pipeline! at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:603 [inlined]
#compile_mlir!#12 at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:897
unknown function (ip: 0x7cf632fbc11b)
compile_mlir! at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:753
compile_mlir! at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:753 [inlined]
#compile_xla#39 at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:1888
compile_xla at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:1870 [inlined]
#compile#40 at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:1936
compile at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:1935

For what it's worth, here is the backtrace in gdb (not much different from the one above):

module flag identifiers must be unique (or of 'require' type)
!"Debug Info Version"

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007ffc8a1e8c2c in llvm::ScopedDbgInfoFormatSetter<llvm::Module>::ScopedDbgInfoFormatSetter (this=0x7fffffff8748, Obj=..., NewState=true)
    at external/llvm-project/llvm/include/llvm/IR/DebugProgramInstruction.h:694
warning: 694    external/llvm-project/llvm/include/llvm/IR/DebugProgramInstruction.h: No such file or directory
(gdb) bt
#0  0x00007ffc8a1e8c2c in llvm::ScopedDbgInfoFormatSetter<llvm::Module>::ScopedDbgInfoFormatSetter (this=0x7fffffff8748, Obj=..., NewState=true)
    at external/llvm-project/llvm/include/llvm/IR/DebugProgramInstruction.h:694
#1  llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=0x7fffffff8a70, IR=..., AM=...)
    at external/llvm-project/llvm/include/llvm/IR/PassManagerImpl.h:69
#2  0x00007ffc8027a466 in (anonymous namespace)::SROAWrappersPass::runOnOperation (this=0x1da3530) at external/enzyme_ad/src/enzyme_ad/jax/Passes/SROAWrappers.cpp:138
#3  0x00007ffc8ad2f8ce in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const (this=0x7fffffffa358)
    at external/llvm-project/mlir/lib/Pass/Pass.cpp:526
#4  llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) (callable=140737488331608)
    at external/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46
#5  llvm::function_ref<void ()>::operator()() const (this=<optimized out>) at external/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69
#6  mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) (this=<optimized out>, actionFn=..., irUnits=...,
    args=...) at external/llvm-project/mlir/include/mlir/IR/MLIRContext.h:288
#7  mlir::detail::OpToOpPassAdaptor::run (pass=0x1da3530, op=0x1e80120, am=..., verifyPasses=true, parentInitGeneration=1) at external/llvm-project/mlir/lib/Pass/Pass.cpp:520
#8  0x00007ffc8ad30107 in mlir::detail::OpToOpPassAdaptor::runPipeline (pm=..., op=op@entry=0x1e80120, am=..., verifyPasses=<optimized out>, parentInitGeneration=1, instrumentor=instrumentor@entry=0x0,
    parentInfo=0x0) at external/llvm-project/mlir/lib/Pass/Pass.cpp:592
#9  0x00007ffc8ad32711 in mlir::PassManager::runPasses (this=0x2591350, op=0x1e80120, am=...) at external/llvm-project/mlir/lib/Pass/Pass.cpp:905
#10 mlir::PassManager::run (this=0x2591350, op=0x1e80120) at external/llvm-project/mlir/lib/Pass/Pass.cpp:885
#11 0x00007ffc89255639 in mlirPassManagerRunOnOp (passManager=..., op=...) at external/llvm-project/mlir/lib/CAPI/IR/Pass.cpp:44
#12 0x00007ffc7348e6da in mlirPassManagerRunOnOp () at /home/giordano/.julia/dev/Reactant/src/mlir/libMLIR_h.jl:8439
#13 julia_run!_23282 () at /home/giordano/.julia/dev/Reactant/src/mlir/IR/Pass.jl:74
#14 0x00007ffc73588ebb in julia_#run_pass_pipeline!#2_25044 () at /home/giordano/.julia/dev/Reactant/src/Compiler.jl:608

Reported upstream: EnzymeAD/Enzyme-JAX#515

@giordano
Member Author

The last two Julia frames we pass through before the segfault are

API.mlirPassManagerRunOnOp(pm, Operation(mod))

function mlirPassManagerRunOnOp(passManager, op)
    @ccall mlir_c.mlirPassManagerRunOnOp(
        passManager::MlirPassManager, op::MlirOperation
    )::MlirLogicalResult
end

and then we're entirely in MLIR/LLVM land.
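
To illustrate why no further Julia frames appear after that wrapper, here is a small self-contained sketch of the same thin `@ccall` pattern, using the C standard library's `strlen` instead of MLIR. `c_strlen` is a hypothetical name chosen for this example; it only shows the shape of such a binding, not how Reactant's bindings are generated.

```julia
# Thin wrapper in the same style as the binding above: the Julia function does
# nothing except forward its argument through `@ccall`, so once the call is made
# all remaining work (and any crash) happens in native code.
function c_strlen(s::String)
    return @ccall strlen(s::Cstring)::Csize_t
end

n = c_strlen("Reactant")
println(n)
```

Because the wrapper has no Julia logic of its own, a fault inside the native callee shows up with the wrapper as the last Julia frame, which matches the backtraces above.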

@giordano
Member Author

The CUDA integration tests pass for me on openxla/xla#24050. Once that PR is merged we can use the newer version of XLA, and we should hopefully be good (assuming nothing else breaks).

@giordano giordano marked this pull request as ready for review March 22, 2025 21:58
@giordano giordano merged commit cb49c07 into EnzymeAD:main Mar 23, 2025
2 of 3 checks passed
@giordano giordano deleted the mg/update-xla branch March 23, 2025 18:54