@xal-0
Member

xal-0 commented Jun 25, 2025

A ccall without a specific library finds one with jl_dlfind, which calls
jl_dlsym on each of the following libraries, in order:

  • libjulia-internal
  • libjulia
  • The current executable
  • (On Windows): kernel32, crtdll, ntdll, ws2_32

The semantics of dlsym with a specific handle are a little weird (this does not
apply to GetProcAddress): the library and all of its dependencies are searched,
so the lookup can succeed through a dependency even if a reference to that
symbol from code inside the library would resolve somewhere else.

On macOS and Linux, this causes all of the calls to libc functions in Base to go
through the handle for libjulia-internal, making it difficult to hook
functions. For example, -fsanitize=thread on clang intercepts malloc and free
by defining them in the executable; calls to malloc from Julia code would go to
the original libc version while calls to free in libjulia-internal would go to
the hooked version.

This change makes jl_dlfind return a handle only if the symbol is found in
that library and not one of its dependencies.
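
A minimal sketch of such a strict lookup, assuming a glibc-style loader where reopening an already loaded object returns the same handle value. `strict_dlsym` is a hypothetical name, not the function added by this PR; it combines dlsym with the dladdr and RTLD_NOLOAD reverse lookup used in the example below.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return the address of `name` only if it is defined by the library behind
 * `handle` itself; return NULL if it is missing or comes from a dependency. */
void *strict_dlsym(void *handle, const char *name)
{
    assert(handle != NULL && name != NULL);

    void *sym = dlsym(handle, name);
    if (sym == NULL)
        return NULL;

    /* Ask the loader which object file actually contains the address. */
    Dl_info info;
    if (dladdr(sym, &info) == 0 || info.dli_fname == NULL)
        return NULL;

    /* Re-open that object without loading anything new and compare handles. */
    void *owner = dlopen(info.dli_fname, RTLD_NOW | RTLD_NOLOAD);
    if (owner == NULL)
        return NULL;
    int same = (owner == handle);
    dlclose(owner); /* drops only the reference taken by the probe above */

    return same ? sym : NULL;
}
```

The plain pointer comparison is the weakest part of the sketch: it is only meaningful where a handle uniquely identifies a loaded object, and it treats any failure of the reverse lookup as "not found".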

Example

a.c:

#include <stdio.h>

void b_func(void);

void c_func(void) {
  puts("c override from a");
}

int main() {
  puts("a");
  b_func();
}

b.c:

#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

void c_func(void);

void b_func(void) {
  puts("b");
  c_func();

  /* How jl_libjulia_internal_handle is set by jl_find_dynamic_library_by_addr */
  Dl_info info;
  dladdr(&b_func, &info);
  void *hdl = dlopen(info.dli_fname, RTLD_NOW | RTLD_NOLOAD | RTLD_LOCAL);
  void (*dl_c_func)(void) = dlsym(hdl, "c_func");
  dl_c_func();
}

c.c:

#include <stdio.h>

void c_func(void) {
  puts("c");
}

$ cc -g -fPIC -shared -o libc.so c.c
$ cc -g -fPIC -shared -o libb.so b.c libc.so
$ cc -Wl,-rpath,. -g -fPIC -o main a.c libb.so libc.so
$ ./main
a
b
c override from a
c

@topolarity
Member

FWIW, I believe that the macOS / Solaris version of this is called RTLD_FIRST (whose naming we should probably ape as a convention)

The design of that feature actually means that handle pointers are not unique on macOS (they contain a bit of information to encode whether they have RTLD_FIRST enabled), so you also need to take that into account when comparing handles.

Member

@vtjnash vtjnash left a comment

On macOS, you can additionally specify RTLD_FIRST when getting these internal handles and (like Windows) avoid the need for the extra dladdr call check
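
As a rough sketch of this suggestion, the flag can be added conditionally so the same code builds on platforms that lack it. `OPEN_FIRST_FLAG`, `internal_open_flags` and `open_internal_handle` are hypothetical names, and the exact flag set Julia passes is not stated in this thread; only RTLD_FIRST itself comes from the discussion.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* RTLD_FIRST is provided by macOS (and Solaris); elsewhere add nothing. */
#ifdef RTLD_FIRST
#define OPEN_FIRST_FLAG RTLD_FIRST
#else
#define OPEN_FIRST_FLAG 0
#endif

/* Flags for handles whose dlsym lookups should not search dependencies
 * on platforms where the loader supports that directly. */
int internal_open_flags(void)
{
    return RTLD_NOW | RTLD_LOCAL | OPEN_FIRST_FLAG;
}

/* Open a library by path with the flags above. */
void *open_internal_handle(const char *path)
{
    assert(path != NULL); /* NULL would mean the main program instead */
    return dlopen(path, internal_open_flags());
}
```

On a platform without RTLD_FIRST this degrades to an ordinary dlopen, so a caller there would still need the reverse lookup with dladdr to get strict behaviour.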

@JeffBezanson
Member

I appreciate the very clear writeup given the unintuitive behavior here.

@JeffBezanson
Member

> you also need to take that into account when comparing handles.

How do you do that? Does the implementation here need to be changed?

@xal-0
Member Author

xal-0 commented Jun 26, 2025

> On macOS, you can additionally specify RTLD_FIRST when getting these internal handles and (like Windows) avoid the need for the extra dladdr call check

I thought it might be too breaking, but you're right that code relying on it wouldn't work on Windows anyhow.

Looking a bit harder for options on Linux, it seems we can get the handle from an address directly with dladdr1 with RTLD_DL_LINKMAP (apparently dlopen directly returns the struct link_map * as the handle?), but there are no nice options to make dlsym do the lookup we want.
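
The dladdr1 route could look roughly like the following on glibc. `link_map_for_addr` is a hypothetical helper, and treating the returned pointer as a dlopen handle relies on the glibc implementation detail that the comment above is itself unsure about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* glibc only: return the link_map of the loaded object that contains
 * `addr`, or NULL if the address does not belong to any loaded object. */
struct link_map *link_map_for_addr(const void *addr)
{
    Dl_info info;
    struct link_map *map = NULL;

    assert(addr != NULL);
    if (dladdr1(addr, &info, (void **)&map, RTLD_DL_LINKMAP) == 0)
        return NULL;
    return map;
}
```

Unlike the dladdr plus dlopen(RTLD_NOLOAD) probe, this takes no extra reference on the object, so there is nothing to dlclose afterwards.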

@topolarity
Member

> How do you do that? Does the implementation here need to be changed?

Yeah, it's required to mask off some of the lower bits; see https://github.com/opensource-apple/dyld/blob/3f928f32597888c5eac6003b9199d972d49857b5/src/dyldAPIs.cpp#L1513-L1518 and julia/src/sys.c, lines 668 to 670 in 4ae3f5e:

// If the handle is the same as what was passed in (modulo mode bits), return this image name
if (((intptr_t)handle & (-4)) == ((intptr_t)probe_lib & (-4)))
    return image_name;

If we use RTLD_FIRST though, the reverse lookup + handle comparison may no longer be necessary
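
The masking can be wrapped in a small helper so that every handle comparison goes through it. This is a sketch with hypothetical names (`HANDLE_MODE_BITS`, `handle_without_mode_bits`, `same_image_handle`); the two-bit mask mirrors the `& (-4)` in the sys.c excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two low bits of a macOS handle carry mode flags, not identity. */
#define HANDLE_MODE_BITS ((uintptr_t)3)

/* Clear the mode bits so that only the image identity remains. */
void *handle_without_mode_bits(void *handle)
{
    return (void *)((uintptr_t)handle & ~HANDLE_MODE_BITS);
}

/* Two handles refer to the same image if they match modulo mode bits. */
bool same_image_handle(void *a, void *b)
{
    return handle_without_mode_bits(a) == handle_without_mode_bits(b);
}
```

For example, handles with the values 0x1000 and 0x1001 compare equal under `same_image_handle`, while 0x1000 and 0x1004 do not.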

@vtjnash
Member

vtjnash commented Jun 27, 2025

> code relying on it wouldn't work on Windows anyhow.

This is the only code that relies on it, and mostly just for the purpose of making these platforms act similarly when ccall is used with an un-scoped symbol, so that should be okay and I think this PR is aligned with what we expected and intended this code to do.

All calls to jl_dlsym from within Julia are now strict about which library the
symbol is from.  Now only Libdl.dlsym uses the default behaviour, to keep it the
same as libc dlsym.

On macOS, we can skip the jl_find_dynamic_library_by_addr check if we detect
that the handle was opened with RTLD_FIRST.  Do this by default for libraries
loaded by Julia ccalls.
@xal-0
Member Author

xal-0 commented Aug 1, 2025

Reviving this! I've made both the internal jl_dlsym calls as well as ccalls use no_deps=1, which should result in less confusing differences between platforms. I also used @topolarity's suggestion that we can avoid the jl_find_dynamic_library_by_addr check on macOS if we open our handles with RTLD_FIRST by default.

@ararslan
Member

ararslan commented Aug 2, 2025

The FreeBSD failures look related:

Error in testset ccall:
Test Failed at /usr/home/julia/.buildkite-agent/builds/freebsd13-amdci6-2/julialang/julia-master/julia-8bd76994c8/share/julia/test/ccall.jl:1988
  Expression: malloc_hdl != ccall(:jl_dlfind, Ptr{Nothing}, (Cstring,), "jl_gc_safepoint")
   Evaluated: Ptr{Nothing}(0x0000000000000001) != Ptr{Nothing}(0x0000000000000001)
Error in testset ccall:
Test Failed at /usr/home/julia/.buildkite-agent/builds/freebsd13-amdci6-2/julialang/julia-master/julia-8bd76994c8/share/julia/test/ccall.jl:1989
  Expression: malloc_hdl != ccall(:jl_dlfind, Ptr{Nothing}, (Cstring,), "jl_array_ptr")
   Evaluated: Ptr{Nothing}(0x0000000000000001) != Ptr{Nothing}(0x0000000000000001)

EDIT: I'm reminded of #50162, which caused #50846, which was fixed in #51114. @topolarity's description in that last PR seems potentially relevant here.

@xal-0
Member Author

xal-0 commented Aug 5, 2025

Investigating this has revealed all sorts of weird stuff:

  • jl_exe_handle seems to be set to something sensible only on Windows, where GetModuleHandleA(NULL) returns a handle to the current executable. On Unix, we set jl_exe_handle = jl_dlopen(NULL, JL_RTLD_NOW);, which returns a handle that behaves like RTLD_DEFAULT.
  • However, the Windows-only jl_RTLD_DEFAULT_handle = jl_libjulia_internal_handle; makes no sense to me. As far as I can tell, we have no way of getting RTLD_DEFAULT-like behaviour on Windows with the current jl_dlsym. Stuff like ccall(:memcmp, ...) seems to have been working on Unix because of the RTLD_DEFAULT fallback in jl_dlfind, while relying on the msvcrt.dll check on Windows.

In fact, we don't really have RTLD_DEFAULT at all on Windows:

julia> using Libdl

julia> dlopen(".\\usr\\bin\\libccalltest.dll", RTLD_GLOBAL)
Ptr{Nothing}(0x00007ffbf7090000)

julia> @ccall get_c_int()::Cint
ERROR: could not load symbol "get_c_int":
The specified procedure could not be found.

julia> versioninfo()
Julia Version 1.13.0-DEV.959
Commit b35c4f471f (2025-08-04 21:07 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × AMD Ryzen Threadripper 2950X 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-20.1.2 (ORCJIT, znver1)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 8 virtual cores)

macOS:

julia> using Libdl

julia> dlopen("./usr/lib/libccalltest.dylib", RTLD_GLOBAL)
Ptr{Nothing}(0x000000006e64eb40)

julia> @ccall get_c_int()::Cint
0

julia> versioninfo()
Julia Version 1.13.0-DEV.959
Commit b35c4f471f0 (2025-08-04 21:07 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 8 × Apple M3
  WORD_SIZE: 64
  LLVM: libLLVM-20.1.2 (ORCJIT, apple-m3)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)
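
The Unix behaviour described in this comment, where a handle from dlopen(NULL, ...) acts like a global lookup, can be seen in a few lines of C. `lookup_via_program_handle` is a hypothetical helper for illustration, and the sketch assumes a dynamically linked program on Linux or macOS.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Look a symbol up through the handle returned by dlopen(NULL, ...).
 * On Unix this searches the program and the globally visible libraries,
 * not just the executable's own symbols. */
void *lookup_via_program_handle(const char *name)
{
    assert(name != NULL);

    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return NULL;
    return dlsym(self, name); /* handle intentionally left open */
}
```

Under those assumptions, asking for "malloc" succeeds even though the executable does not define it, which is why such a handle says little about whether a symbol really lives in the executable.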

@ararslan
Member

ararslan commented Aug 7, 2025

Different failure now on FreeBSD with the most recent commit:

Error in testset ccall:
Test Failed at /usr/home/julia/.buildkite-agent/builds/freebsd13-amdci6-3/julialang/julia-master/julia-11c20e9172/share/julia/test/ccall.jl:1991
  Expression: ccall(:jl_dlfind, Int, (Cstring,), "jl_gc_safepoint") == 2
   Evaluated: 0 == 2

@xal-0
Member Author

xal-0 commented Aug 12, 2025

> Different failure now on FreeBSD with the most recent commit:

Turns out this was yet another Linuxism we were relying on: you can't dlclose a handle but continue to use it for dlsym lookups on FreeBSD.

@xal-0
Member Author

xal-0 commented Aug 13, 2025

As much as it's tempting to grab the dlopen reference count from the handle and check if we need to avoid dlclose()ing it, the comment for the struct already seems a little angry at the JVM for doing this: https://github.com/freebsd/freebsd-src/blob/a9f5b68837094699a3d5204d79c1cbe59d93ae00/libexec/rtld-elf/rtld.h#L128-L130

xal-0 and others added 2 commits August 13, 2025 10:00
FreeBSD maintains two separate reference counts inside dlopen handles: one for
total references, including transitive dependencies loaded by rtld, and another
only for dlopen handles.  It checks to make sure that there are unclosed dlopen
handles when doing dlsym, so we've got to ensure that at least one handle
exists for all the libraries we want to use with
jl_find_dynamic_library_by_addr (specifically, libjulia and libjulia-internal).
@ararslan
Member

Nice, FreeBSD is green now! However, Windows is not...

Error in testset ccall:
Test Failed at C:\buildkite-agent\builds\win2k22-amdci6-2\julialang\julia-master\julia-92411e7eef\share\julia\test\ccall.jl:1989
  Expression: ccall(:jl_dlfind, Int, (Cstring,), "main") == 1
   Evaluated: 0 == 1

xal-0 added 2 commits August 15, 2025 13:16
FreeBSD returns NULL and no error when using dlopen() with RTLD_NOLOAD on the
main executable, so we can check and return jl_exe_handle directly.
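
Taken together, the two FreeBSD fixes suggest a reverse lookup that keeps its probe handle open and falls back to the executable handle when the probe returns NULL without an error. The following is a sketch under those assumptions, not the code in jl_find_dynamic_library_by_addr: `resolve_probe` and `handle_for_addr` are hypothetical names, and "no error" is assumed to mean that dlerror() returns NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Decision step, kept separate from the loader calls: `probe` is the result
 * of dlopen(path, RTLD_NOLOAD) and `err` is dlerror() right after it. */
void *resolve_probe(void *probe, const char *err, void *exe_handle)
{
    if (probe != NULL)
        return probe;      /* ordinary shared library */
    if (err == NULL)
        return exe_handle; /* NULL without an error: the main executable */
    return NULL;           /* genuine failure */
}

/* Return a handle for the object containing `addr`. The handle is kept
 * open on purpose so that later dlsym calls on it remain valid. */
void *handle_for_addr(const void *addr, void *exe_handle)
{
    Dl_info info;

    assert(addr != NULL);
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;

    (void)dlerror(); /* clear any stale error before probing */
    void *probe = dlopen(info.dli_fname, RTLD_NOW | RTLD_NOLOAD);
    return resolve_probe(probe, dlerror(), exe_handle);
}
```

Splitting out `resolve_probe` keeps the platform quirk in one place and makes it checkable without depending on how a particular loader treats the main executable.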
@xal-0
Member Author

xal-0 commented Aug 19, 2025

@xal-0 requested a review from ararslan August 25, 2025 18:05
@vtjnash added the "merge me" label Sep 11, 2025
@vtjnash merged commit 9427f33 into JuliaLang:master Sep 12, 2025
3 of 7 checks passed
@vchuravy
Member

Just as a note, this has caused an observable behavior change in MPI.jl

JuliaParallel/MPI.jl#915 (comment)

@adienes removed the "merge me" label Sep 15, 2025
@giordano
Member

This requires knowing internal details of libraries, which isn't easy to do in general: JuliaParallel/MPI.jl#915 (comment)

xal-0 added a commit to xal-0/julia that referenced this pull request Sep 15, 2025
Reverts the behaviour change for ccalls introduced by JuliaLang#58815.  This is motivated
by libraries like MPI, where the dynamic library that actually defines a symbol
is difficult to predict:
JuliaParallel/MPI.jl#915 (comment)
@vtjnash
Member

vtjnash commented Sep 15, 2025

This in particular is for consistency with Windows, where this behavior is already mandatory

xal-0 added a commit that referenced this pull request Sep 16, 2025
Reverts the behaviour change for ccalls with an explicit library
introduced by #58815. This is motivated by libraries like MPI, where the
dynamic library that actually defines a symbol is difficult to predict:

JuliaParallel/MPI.jl#915 (comment)
@yuyichao
Contributor

It seems that the behavior isn't consistent. The compiler can still find the symbol if a constant library name is used.

xal-0 added a commit to xal-0/julia that referenced this pull request Sep 30, 2025
Reverts the behaviour change for ccalls with an explicit library
introduced by JuliaLang#58815. This is motivated by libraries like MPI, where the
dynamic library that actually defines a symbol is difficult to predict:

JuliaParallel/MPI.jl#915 (comment)