Skip to content

Conversation

@markshannon
Copy link
Member

@markshannon markshannon commented Dec 9, 2024

Instead of allocating a tuple, then running arbitrary code with an incomplete tuple, this PR:

  • Allocates a small fixed sized array, fills it then creates the tuple atomically.
  • If the buffer is too small, then a list is created. The tuple is created atomically from that list.
  • Adds the internal _PyList_AsTupleAndClear function which creates a tuple from a list while clearing the list. This avoids unnecessary reference count manipulation.

Since the vast majority of tuples are small, PySequence_Tuple now only does one allocation and no resizing in the common case.

@markshannon markshannon changed the title 127058: Make PySequence_Tuple safer and probably faster. GH-127058: Make PySequence_Tuple safer and probably faster. Dec 9, 2024
Fail:
Py_XDECREF(result);
PyObject *res = _PyList_AsTupleAndClear(list);
Py_DECREF(list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name _PyList_AsTupleAndClear might be confusing, because if you called Py_CLEAR on the list you wouldn't need to decref it.

Copy link
Member Author

@markshannon markshannon Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you called PyList_Clear on it, you would need to Py_DECREF it.
It is the list being cleared, not the variable holding a reference to it.

@iritkatriel
Copy link
Member

Did you run refleak tests?

Co-authored-by: Irit Katriel <[email protected]>
@markshannon markshannon added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Dec 11, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8badf7b 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Dec 11, 2024
self->ob_item = NULL;
Py_SET_SIZE(self, 0);
ret = _PyTuple_FromArraySteal(items, size);
free_list_items(items, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If _PyTuple_FromArraySteal, do we still want to free the list items?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. _PyTuple_FromArraySteal steals the references, but doesn't do anything with the memory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically it is passed a stack-allocated array, so it can't free it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant if _PyTuple_FromArraySteal fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. _PyTuple_FromArraySteal consumes the all references regardless of whether it succeeds or fails.
It would be a pain to use otherwise.

@markshannon markshannon merged commit 5a23994 into python:main Dec 11, 2024
55 checks passed
@markshannon markshannon deleted the safer-sequence-tuple branch December 11, 2024 17:54
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
…ython#127758)

* Use a small buffer, then list when constructing a tuple from an arbitrary sequence.
@user202729
Copy link

note that this pull request modifies the behavior of the following code

import time
class A:
	def __len__(self):
		raise NotImplementedError("infinite sequence")
	def __iter__(self):
		while True:
			time.sleep(1)
			yield 0

tuple(A())

before this change, the above raises an error. Now it enters an infinite loop.

by itself it isn't a problem because the fact that tuple() calls __len__ is an internal implementation detail and can be changed any time, but there isn't any other way for a class to declare itself as not convertible to list/tuple because it is infinite. There is no magic method __list__/__tuple__/__sequence__.

the same thing can be said about tuple(range(10**17))—before the change, immediate MemoryError; after it, consumes an increasing amount of memory before the process gets killed (don't try to run this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants