Unexpected memory management behavior

I'm very excited about this library. The effort going into documentation and ergonomics is fantastic. I've been waiting to use it on stable Rust for months, and have finally started trying to integrate it into my CPU-bound Python codebase to write extension modules.

Right now, `pyo3` seems to me to have a serious issue. The use of the object storage (even the lighter #887 implementation) means that this library is not zero-cost and, worse, can easily leak memory. The additional management layer built on top of Python's own reference counting, while ergonomic, is a big problem for my use case.

The current situation, as I understand it:

- `GILGuard` acquire overhead (minimal, and rare):
    - Acquire a `Mutex` on the `PyObject` alloc/dealloc array.
    - Allocate a `Vec` to copy the `PyObject` alloc/dealloc array out of TLS.
    - Write to 2x thread-local storage.
    - Allocate two `Vec` for storing pointers to owned references.

- `&PyAny` owned reference creation (mostly method returns, cheap, but very frequent):
    - Write to thread-local storage.
    - Occasionally reallocate the owned-reference `Vec`s as they grow.
    - **Memory is only freed at `GILGuard` drop**

- `PyObject` drop (cheap, but frequent):
    - Read from thread-local storage.
    - **Possibly** acquire a `Mutex` on the dealloc array.
    - **Possibly only free memory at next `GILGuard` acquire**
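
To make the memory concern concrete, here is a toy model in plain Rust, not `pyo3` itself. `Obj`, `peak_live_pooled`, and `peak_live_immediate` are hypothetical names invented for this sketch; a `Vec` stands in for the owned-reference storage and a shared counter stands in for interpreter memory. It only illustrates the general pattern of parking temporaries until a guard drops, as I understand it from the list above.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy stand-in for a Python object; `live` counts objects not yet released.
struct Obj {
    live: Rc<Cell<usize>>,
}

impl Obj {
    fn new(live: &Rc<Cell<usize>>) -> Self {
        live.set(live.get() + 1);
        Obj { live: Rc::clone(live) }
    }
}

impl Drop for Obj {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Pool model (assumption): temporaries are parked in a `Vec` and released
/// only when the pool is dropped, i.e. at the end of the guard's scope.
fn peak_live_pooled(iterations: usize) -> usize {
    let live = Rc::new(Cell::new(0usize));
    let mut pool: Vec<Obj> = Vec::new();
    let mut peak: usize = 0;
    for _ in 0..iterations {
        pool.push(Obj::new(&live));
        peak = peak.max(live.get());
    }
    drop(pool); // everything is released here, all at once
    peak
}

/// Immediate model: each temporary is released at the end of its iteration.
fn peak_live_immediate(iterations: usize) -> usize {
    let live = Rc::new(Cell::new(0usize));
    let mut peak: usize = 0;
    for _ in 0..iterations {
        let _tmp = Obj::new(&live);
        peak = peak.max(live.get());
    }
    peak
}
```

In this model, a loop of `n` iterations inside one guard scope keeps `n` temporaries alive at its peak under the pooled scheme, but only one under immediate release. That growth with loop length is what I mean by unbounded memory growth.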

The regular access to TLS and global mutexes is regrettable, but the memory-freeing behavior is really worrying to me. I'm not writing Python extensions because I want *pretty-damn fast* and *unbounded memory growth*. I'm going to the effort of writing a Python extension because I need *as fast as possible* and *predictable memory usage*.

The solution outlined in #885 seems perfect. A `PyAny<'a>` has no need for object storage, and `Clone` and `Drop` work as one would expect. I'd even go one stage further and have `PyObject` and `Py<T>` *always* acquire the GIL for `Clone` and `Drop`. Sure, it's slower than using the current TLS and deferred drop, but it's simpler and more predictable, and people can always use `ManuallyDrop`, or turn them back into `PyAny<'a>`, to batch clone/drop if performance is suffering.
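
A rough sketch of the semantics I have in mind, again as a plain-Rust model rather than real `pyo3` or CPython code. `Gil`, `RawObject`, `AnyRef`, and `refcount_trace` are hypothetical names for illustration only; the `Cell<usize>` stands in for the object's reference count, and the lifetime `'a` stands in for "the GIL is held".

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Hypothetical token proving the GIL is held (not pyo3's real `Python<'py>`).
struct Gil {
    _private: (),
}

/// Toy object header with a manual reference count.
struct RawObject {
    refcnt: Cell<usize>,
}

/// Hypothetical owned reference, valid only while the GIL borrow `'a` lives.
struct AnyRef<'a> {
    raw: &'a RawObject,
    _gil: PhantomData<&'a Gil>,
}

impl<'a> AnyRef<'a> {
    /// Taking an owned reference increments the count right away.
    fn new(_gil: &'a Gil, raw: &'a RawObject) -> Self {
        raw.refcnt.set(raw.refcnt.get() + 1);
        AnyRef { raw, _gil: PhantomData }
    }
}

impl<'a> Clone for AnyRef<'a> {
    /// Clone is a plain increment; no pool or thread-local storage involved.
    fn clone(&self) -> Self {
        self.raw.refcnt.set(self.raw.refcnt.get() + 1);
        AnyRef { raw: self.raw, _gil: PhantomData }
    }
}

impl<'a> Drop for AnyRef<'a> {
    /// Drop is a plain decrement, performed immediately.
    fn drop(&mut self) {
        self.raw.refcnt.set(self.raw.refcnt.get() - 1);
    }
}

/// Records the reference count after each step of a small scenario.
fn refcount_trace() -> Vec<usize> {
    let gil = Gil { _private: () };
    let raw = RawObject { refcnt: Cell::new(0) };
    let mut trace = Vec::new();

    let a = AnyRef::new(&gil, &raw);
    trace.push(raw.refcnt.get());
    let b = a.clone();
    trace.push(raw.refcnt.get());
    drop(b);
    trace.push(raw.refcnt.get());
    drop(a);
    trace.push(raw.refcnt.get());
    trace
}
```

Because the handle cannot outlive the GIL borrow, the compiler enforces that every increment and decrement happens with the GIL held, so no deferred-release bookkeeping is needed in this model.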

You could even remove the mirrored object API on `PyObject` and require that they're converted into `PyAny<'a>` to `call` or `getattr`, making `PyObject`'s only purpose to be storing a long-term reference to a Python object between GIL acquisitions.

At that point, there's no additional global state beyond what's already provided by the Python interpreter. Things are deallocated predictably. There's a smaller API surface. The library is truly zero-cost and adds no additional complexity.

I would urge you to be bold with making breaking changes while the library is young and the userbase relatively small. It would be a tremendous shame if the most popular Rust/Python bindings were significantly slower than the equivalent C, and introduced an unexpected additional memory model.

In any case, thanks for all the work you've put in so far. :)

Unexpected memory management behavior #1056
