-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Description
For some reason I was reading our ENV code, and I noticed we don't have a lock around our calls to getenv and setenv on linux, or for iterating the environment via the environ variable.
This makes ENV unsafe for use with multithreaded julia, as getenv and setenv are not mutually threadsafe in glibc. (See https://github.com/bminor/glibc/blob/master/stdlib/getenv.c and https://github.com/bminor/glibc/blob/master/stdlib/setenv.c. Unsurprisingly getenv is safe by itself, but mysteriously setenv is protected by a lock which getenv ignores! So you can mutually setenv safely, but nobody can presume to getenv elsewhere in the same multithreaded code.)
We can easily add some more locks on our side as mitigation, but unfortunately we can't really fix setenv without fixing glibc; a random C library we link against may decide to call getenv at any point. This definitely happens in practice and I've experienced it personally: OpenMathLib/OpenBLAS#716 (comment). There's a nice discussion on the rust issue tracker including this gem: rust-lang/rust#24741 (comment).
Some options:
- Mitigation: Put our own lock around
getenv/setenv/environaccess, and just hope that no C library we link against calls these functions itself. Add a big warning to the docs. This is the current rust approach though it's fragile. - Avoidance: Ban using libc
setenventirely; create a shadow environment which is a copy of the system environment. This is the C# approach but with Julia's tradition of calling into C libraries I doubt that will work out (it creates surprises even in C#; see https://yizhang82.dev/set-environment-variable) - Fix glibc??: Every portable language runtime has this exact problem, so people have tried fixing this in the past. In older times they were met with surprising hostility. The seemingly-current bug is marked suspended https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c2 whatever that means.
For now the only easy / possible thing to do is to clearly option (1): add some locking and a big warning. If glibc could be fixed it could morph into a long term plan. More recent bugs suggest the glibc maintainers may possibly accept a patch.
As a side note, I feel we should consider removing withenv in the future because the shape of that API misrepresents reality. On the surface withenv appears to offer dynamic scoping, but ENV is global state. So withenv can never work reliably for concurrent code. That is, unless we took option (2) and avoid the C environment completely.