-
Notifications
You must be signed in to change notification settings - Fork 899
Description
I'm very excited about this library. The effort going into documentation and ergonomics is fantastic. I've been waiting to use it on stable Rust for months, and have finally started trying to integrate it into my CPU-bound Python codebase to write extension modules.
Right now, pyo3 seems to have a serious issue to me. The use of the object storage (even the lighter #887 implementation) means that this library is not zero-cost, and worse, can easily leak memory. The additional management layer that's built on top of Python's own reference counting, while ergonomic, is a big problem for my use-case.
The current situation, as I understand it:
-
GilGuardacquire overhead (minimal, and rare):- Acquire a
Mutexon thePyObjectalloc/dealloc array. - Allocate a
Vecto copy thePyObjectalloc/dealloc array out of TLS. - Write to 2x thread-local storage.
- Allocate two
Vecfor storing pointers to owned references.
- Acquire a
-
&PyAnyowned reference creation (mostly method returns, cheap, but very frequent):- Write to thread-local storage.
- Occasionally re-alloate owned reference
Vecsas they grow. - Memory is only freed at GilGuard drop
-
PyObjectdrop (cheap, but frequent):- Read from thread-local storage.
- Possibly acquire a
Mutexon the dealloc array. - Possibly only free memory at next
GitGuardacquire
The regular access to TLS and global mutexes are regrettable, but the memory-freeing behavior is really worrying to me. I'm not writing Python extensions because I want pretty-damn fast and unbounded memory growth. I'm going to the effort of writing a Python extension because I need as fast as possible and predictable memory usage.
The solution outlined in #885 seems perfect. A PyAny<'a> has no need for object storage, and Clone and Drop work as one would expect. I'd even go one stage further and have PyObject and Py<T> always acquire the GIL for Clone and Drop. Sure, it's slower than using the current TLS and deferred drop, but it's simpler, more predictable, and people can always use ManuallyDrop turn them back into PyAny<'a> to batch clone/drop if performance is suffering.
You could even remove the mirrored object API on PyObject and require that they're converted into PyAny<'a> to call or getattr, making PyObject's only purpose to be storing a long-term reference to a Python object between GIL acquisitions.
At that point, there's no additional global state beyond what's already provided by the Python interpreter. Things are deallocated predictably. There's a smaller API surface. The library is truly zero-cost and adds no additional complexity.
I would urge you to be bold with making breaking changes while the library is young and the userbase relatively small. It would be a trememdous shame if the most popular Rust/Python bindings were significantly slower than the equivalent C, and introduced an unexpected additional memory model.
In any case, thanks for all the work you've put in so far. :)