Closed
Labels
type-bug: An unexpected behavior, bug, or error
Description
Bug report
Most static strings are interned during Python initialization in _PyUnicode_InitStaticStrings. The _Py_LATIN1_CHR characters (code points 0-255), however, are static but not interned at that point. They may be interned later, while Python is running, for various reasons, including calls to sys.intern.
This isn't thread-safe: it modifies the hashtable _PyRuntime.cached_objects.interned_strings, which is shared across threads and interpreters, without any synchronization.
It can also break the interning identity invariant: a non-static, interned 1-character string can later be shadowed by the global interning of the static 1-character string.
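The shadowing scenario can be illustrated with a toy model. This is only a sketch, not CPython's implementation: `Str`, `toy_intern`, `global_table`, and `local_table` are hypothetical names, and the model assumes that a lookup in the shared table takes precedence over the per-interpreter table.

```python
class Str:
    """Stand-in for a string object. Identity matters, so __eq__ is not overridden."""

    def __init__(self, value, static=False):
        self.value = value
        self.static = static


global_table = {}  # models the shared table of interned static strings
local_table = {}   # models one interpreter's table of interned strings


def toy_intern(s):
    """Return the canonical object for s.value under the assumed lookup order."""
    # Assumption: entries in the shared table win over per-interpreter entries.
    hit = global_table.get(s.value)
    if hit is not None:
        return hit
    if s.static:
        # A static string interned late mutates the shared table.
        global_table[s.value] = s
        return s
    # A non-static string is recorded in the per-interpreter table.
    return local_table.setdefault(s.value, s)


heap_a = Str("a")                 # non-static 1-character string
static_a = Str("a", static=True)  # static 1-character string, not yet interned

first = toy_intern(heap_a)      # heap_a becomes the interned "a"
toy_intern(static_a)            # later, the static "a" is interned globally
second = toy_intern(Str("a"))   # now resolves to static_a instead

# Two distinct objects have both been handed out as "the" interned "a".
print(first is second)
```

In this model, `first` and `second` are different objects even though both came from interning an equal string, which is the broken invariant described above.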
Suggestions
- The _PyRuntime.cached_objects.interned_strings table should be immutable. We should not modify it after Py_Initialize() until shutdown (i.e., _PyUnicode_ClearInterned called from finalize_interp_types()).
- The 1-character latin1 strings should be interned. This can be done either by explicitly interning them during startup, or by handling 1-character strings specially in intern_common.
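As a rough illustration of the two suggestions combined, the toy model below populates the shared table with all 256 latin1 characters at startup and then exposes it read-only. It is a sketch under stated assumptions, not the actual fix: `Str`, `build_global_table`, `GLOBAL_TABLE`, and `toy_intern` are hypothetical names, and a Python `MappingProxyType` merely stands in for "not modified after initialization".

```python
from types import MappingProxyType


class Str:
    """Stand-in for a string object. Identity matters, so __eq__ is not overridden."""

    def __init__(self, value, static=False):
        self.value = value
        self.static = static


def build_global_table():
    """Intern every 1-character latin1 string up front, then freeze the table."""
    table = {chr(cp): Str(chr(cp), static=True) for cp in range(256)}
    return MappingProxyType(table)  # read-only view: later writes raise TypeError


GLOBAL_TABLE = build_global_table()  # filled once, during "startup"
local_table = {}                     # models one interpreter's table


def toy_intern(s):
    """Return the canonical object for s.value without touching the shared table."""
    hit = GLOBAL_TABLE.get(s.value)
    if hit is not None:
        return hit
    return local_table.setdefault(s.value, s)


first = toy_intern(Str("a"))
second = toy_intern(Str("a"))
print(first is second)
```

Because the shared table is complete before any interning request arrives, every 1-character latin1 string resolves to the same static object, and no code path needs to write to shared state at runtime.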
cc @encukou @ericsnowcurrently
Linked PRs
encukou