Fix concurrency consistency for `internals_pp_manager` under multiple-interpreters #5947

XuehaiPan · 2025-12-25T05:30:44Z

Description

This is a follow-up for:

Add per-interpreter storage for gil_safe_call_once_and_store #5933

See discussion:

Suggested changelog entry:

Fix concurrency consistency for internals_pp_manager under multiple-interpreters


On Windows with MSVC (multi-configuration generators), CMake uses config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines 517-528), so they output directly to tests/. However, the mod_per_interpreter_gil* modules only copied the base LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and related tests to fail with ModuleNotFoundError on Windows Python 3.14, because the test code sets sys.path based on pybind11_tests.__file__ (which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton (which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on these tests. Now that the underlying "Duplicate C++ type registration" issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.

rwgk · 2025-12-26T16:55:08Z

include/pybind11/subinterpreter.h

            // upon success, the new interpreter is activated in this thread
            result.istate_ = result.creation_tstate_->interp;
-            detail::get_num_interpreters_seen() += 1; // there are now many interpreters
+            detail::has_seen_non_main_interpreter() = true; // there are now many interpreters


I don't want to push this right now, to not restart the CI (half done), but to let you know about this asap:

$ git show
commit 801d5f98396753bdf25e9d87e747a7cf4edf5f3c (HEAD -> XuehaiPan→fix-multiple-interpreters-concurrency)
Author: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Date:   Fri Dec 26 08:52:16 2025 -0800

    Remove outdated comment after function rename

    The comment "there are now many interpreters" was a holdover from when the function was named get_num_interpreters_seen(). With the rename to has_seen_non_main_interpreter(), the function name is now self-documenting.

diff --git a/include/pybind11/subinterpreter.h b/include/pybind11/subinterpreter.h
index e918b90c..c47787b6 100644
--- a/include/pybind11/subinterpreter.h
+++ b/include/pybind11/subinterpreter.h
@@ -109,7 +109,7 @@ public:
             // upon success, the new interpreter is activated in this thread
             result.istate_ = result.creation_tstate_->interp;
-            detail::has_seen_non_main_interpreter() = true; // there are now many interpreters
+            detail::has_seen_non_main_interpreter() = true;
             detail::get_internals(); // initialize internals.tstate, amongst other things...
             // In 3.13+ this state should be deleted right away, and the memory will be reused for

rwgk · 2025-12-26T17:01:12Z

Review of PR #5947: Fix concurrency consistency for `internals_pp_manager` under multiple-interpreters

PR: #5947
Reviewed by: rwgk (with Cursor assistance)
Date: 2025-12-26

Summary

This PR fixes issues with subinterpreter support, particularly the "subinterpreter_before_main" failure that occurred when a pybind11 module was first loaded in a subinterpreter before being loaded in the main interpreter. The fix enables removal of @pytest.mark.xfail decorators from several multiple-interpreter tests.

Core Problem

In Python 3.13+, the module loading process changed to have two distinct phases:

Static init phase (PyInit_*) - runs once per process, always in the main interpreter
Exec phase (Py_mod_exec slot) - runs per interpreter

The original pybind11 code detected subinterpreters during the static init phase:

#define PYBIND11_MODULE(name, variable, ...)                                      \
    PYBIND11_MODULE_PYINIT(                                                       \
        name, (pybind11::detail::get_num_interpreters_seen() += 1), ##__VA_ARGS__) \
    ...

This was incorrect for Python 3.13+ because the static init phase always runs in the main interpreter, even when loading a module from a subinterpreter. This caused the "subinterpreter_before_main" test to fail.
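To make the timing issue concrete, here is a minimal sketch that models the two phases in plain C++, without touching the CPython API. The names `InterpId`, `kMain`, `ImportTrace`, and `simulate_import_313` are hypothetical and exist only for this illustration; the model simply encodes the behavior described above (static init in the main interpreter, exec in the importing interpreter) and is not pybind11 or CPython code.

```cpp
#include <cassert>

// Hypothetical model: interpreter ids, with 0 standing in for the main interpreter.
using InterpId = int;
constexpr InterpId kMain = 0;

struct ImportTrace {
    bool init_saw_non_main = false; // what a check placed in PyInit_* would observe
    bool exec_saw_non_main = false; // what a check placed in the Py_mod_exec slot would observe
};

// Models an import requested by `importer` under the 3.13+ behavior described
// above: the static init phase runs in the main interpreter, the exec phase
// runs in the interpreter that actually requested the import.
inline ImportTrace simulate_import_313(InterpId importer) {
    assert(importer >= 0 && "interpreter ids are non-negative in this model");
    ImportTrace trace;
    const InterpId init_phase_interp = kMain;    // assumption taken from the review text
    const InterpId exec_phase_interp = importer; // the real importing interpreter
    trace.init_saw_non_main = (init_phase_interp != kMain);
    trace.exec_saw_non_main = (exec_phase_interp != kMain);
    return trace;
}
```

In this model, a detection placed in the init phase can never observe a subinterpreter, while the same detection placed in the exec phase does, which is the motivation for moving the check.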

Key Changes

1. Move Interpreter Detection to Exec Phase (commit `f7a0e04`)

The crux of the fix. The pre_init code is removed from the static init phase, and ensure_internals() is now called in the exec phase where the correct interpreter context is available:

int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) {
    try {
        pybind11::detail::ensure_internals();  // Now checks in the correct interpreter
        ...

The new ensure_internals() function properly detects non-main interpreters:

inline void ensure_internals() {
    pybind11::detail::get_internals_pp_manager().unref();
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
    if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
        has_seen_non_main_interpreter() = true;
    }
#endif
    pybind11::detail::get_internals();
}

2. Semantic Change: Counter → Boolean Flag (commit `a567962`)

Changed from get_num_interpreters_seen() > 1 to has_seen_non_main_interpreter().

This is not just a cleanup - it's a semantic fix:

| Scenario | Old: `counter > 1` | New: `has_seen_non_main_interpreter()` |
| --- | --- | --- |
| Module first loaded in subinterpreter | counter = 1, check fails ❌ | Flag set to true ✓ |
| Module first loaded in main, then subinterpreter | counter = 2, check passes ✓ | Flag set to true ✓ |

The old counter-based approach only worked correctly when the main interpreter loaded the module first.
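The difference between the two checks can be sketched with a small stand-alone model. This is only an illustration under simplifying assumptions: `old_counter_check` and `new_flag_check` are hypothetical names, interpreter ids are plain integers with 0 as the main interpreter, and each entry in `load_order` represents one module load event as summarized in the table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: interpreter ids, with 0 standing in for the main interpreter.
using InterpId = int;
constexpr InterpId kMain = 0;

// Old approach, as summarized in the table: count load events and treat
// "more than one" as evidence of multiple interpreters.
inline bool old_counter_check(const std::vector<InterpId> &load_order) {
    int num_interpreters_seen = 0;
    for (InterpId interp : load_order) {
        assert(interp >= 0);
        num_interpreters_seen += 1; // the interpreter identity is never inspected
    }
    return num_interpreters_seen > 1;
}

// New approach: flip a flag as soon as any load happens outside the main interpreter.
inline bool new_flag_check(const std::vector<InterpId> &load_order) {
    bool has_seen_non_main = false;
    for (InterpId interp : load_order) {
        assert(interp >= 0);
        if (interp != kMain) {
            has_seen_non_main = true;
        }
    }
    return has_seen_non_main;
}
```

With a single load from a subinterpreter (`{1}`), the counter model reports false while the flag model reports true; with main followed by a subinterpreter (`{0, 1}`), both report true.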

3. Re-enable Subinterpreter Support on Ubuntu 3.14 (commit `755839c`)

Removed the workaround -DPYBIND11_HAS_SUBINTERPRETER_SUPPORT=0 from ci.yml that was added in commit e4873e8 as a temporary fix. Now that the core issue is resolved, subinterpreter support can be enabled.

4. Namespace Fix for Test Module (commit `8f29f8e`)

Added proper namespace to mod_per_interpreter_gil_with_singleton.cpp to avoid libc++ issues with anonymous namespaces (as noted by @rwgk in PR comments referencing issue #4319).

5. Remove xfail Decorators from Tests

The following tests no longer need @pytest.mark.xfail:

test_import_module_with_singleton_per_interpreter
test_import_in_subinterpreter_after_main
test_import_in_subinterpreter_before_main
test_import_in_subinterpreter_concurrently

Windows CI Failure Analysis and Fix

The Problem

After pushing the PR changes, Windows Python 3.14 CI jobs failed with:

ModuleNotFoundError: No module named 'mod_per_interpreter_gil_with_singleton'

This was puzzling because the module was successfully built.

Root Cause

On Windows with MSVC (multi-configuration generators), CMake uses config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG. The main test modules correctly set both LIBRARY_OUTPUT_DIRECTORY and the config-specific variants:

# Lines 517-528 in tests/CMakeLists.txt - for main test modules
if(NOT CMAKE_LIBRARY_OUTPUT_DIRECTORY)
  set_target_properties(${target} PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                             "${CMAKE_CURRENT_BINARY_DIR}")
  if(DEFINED CMAKE_CONFIGURATION_TYPES)
    foreach(config ${CMAKE_CONFIGURATION_TYPES})
      string(TOUPPER ${config} config)
      set_target_properties(${target} PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
                                                 "${CMAKE_CURRENT_BINARY_DIR}")
    endforeach()
  endif()
endif()

However, the mod_per_interpreter_gil* modules only copied the base property:

# Lines 590-594 - BEFORE fix
get_target_property(pybind11_tests_output_directory pybind11_tests LIBRARY_OUTPUT_DIRECTORY)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
  set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                            "${pybind11_tests_output_directory}")
endforeach()

Result:

pybind11_tests → tests/ (has LIBRARY_OUTPUT_DIRECTORY_DEBUG set)
mod_per_interpreter_gil* → tests/Debug/ (missing config-specific property)

The test code sets sys.path based on pybind11_tests.__file__ location (tests/), but the module was in tests/Debug/.
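The directory mismatch follows from the lookup rule described above. As a rough sketch (not CMake code), the rule can be modeled in a few lines of C++; `TargetProperties` and `resolve_output_dir` are hypothetical names, and the directory strings are shortened to `tests` for readability.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a CMake target's properties (property name -> value).
using TargetProperties = std::map<std::string, std::string>;

// Models the rule described above for multi-configuration generators:
// prefer LIBRARY_OUTPUT_DIRECTORY_<CONFIG>; otherwise use
// LIBRARY_OUTPUT_DIRECTORY with a per-configuration subdirectory appended.
inline std::string resolve_output_dir(const TargetProperties &props,
                                      const std::string &config_upper,
                                      const std::string &config_dirname) {
    assert(!config_upper.empty() && !config_dirname.empty());
    const auto specific = props.find("LIBRARY_OUTPUT_DIRECTORY_" + config_upper);
    if (specific != props.end()) {
        return specific->second;
    }
    const auto base = props.find("LIBRARY_OUTPUT_DIRECTORY");
    if (base != props.end()) {
        return base->second + "/" + config_dirname;
    }
    return config_dirname; // neither property set: outside the scope of this sketch
}
```

A target with only the base property resolves to `tests/Debug`, while a target that also sets `LIBRARY_OUTPUT_DIRECTORY_DEBUG` resolves to `tests`, matching the two results listed above.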

Why This Was Previously Hidden

The @pytest.mark.xfail decorators masked this bug. The tests were already expected to fail (for the "Duplicate C++ type registration" reason), so the Windows path issue was never noticed.

The Fix (commit `3977e2d`)

get_target_property(pybind11_tests_output_directory pybind11_tests LIBRARY_OUTPUT_DIRECTORY)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
  set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                            "${pybind11_tests_output_directory}")
  # Also set config-specific output directories for multi-configuration generators (MSVC)
  if(DEFINED CMAKE_CONFIGURATION_TYPES)
    foreach(config ${CMAKE_CONFIGURATION_TYPES})
      string(TOUPPER ${config} config)
      set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
                                                "${pybind11_tests_output_directory}")
    endforeach()
  endif()
endforeach()

Additional Cleanup (commit 801d5f98)

Removed an outdated comment in subinterpreter.h:

// Before:
detail::has_seen_non_main_interpreter() = true; // there are now many interpreters

// After:
detail::has_seen_non_main_interpreter() = true;

The comment was a holdover from when the function was named get_num_interpreters_seen(). The new function name is self-documenting.

Design Notes

Double `ensure_internals()` Calls

The code calls ensure_internals() in both:

PYBIND11_PLUGIN_IMPL (static init phase)
pybind11_exec_* (exec phase)

This is intentional (see commit 3b54dcf):

For Python < 3.13: Static init can run in subinterpreters, so both calls are useful
For Python >= 3.13: Static init always runs in main interpreter, so only exec phase call matters
Calling it twice is harmless (idempotent operations)
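The "harmless when called twice" argument can be illustrated with a deliberately simplified model. `DemoState`, `demo_ensure_internals`, and `demo_load` are hypothetical names; the model only tracks the flag and a one-time creation of internals, and it omits the cached-pointer handling that the real `ensure_internals()` performs.

```cpp
#include <cassert>

// Hypothetical, heavily simplified state touched by the ensure step.
struct DemoState {
    bool internals_ready = false;
    bool seen_non_main = false;
    int internals_creations = 0;
};

// Simplified ensure step: record a non-main interpreter, create internals once.
inline void demo_ensure_internals(DemoState &state, bool in_main_interpreter) {
    if (!in_main_interpreter) {
        state.seen_non_main = true; // setting an already-set flag changes nothing
    }
    if (!state.internals_ready) {
        state.internals_ready = true;
        ++state.internals_creations;
    }
}

// Models one module load: the ensure step runs in the init phase, then again in
// the exec phase, each possibly in a different interpreter.
inline DemoState demo_load(bool init_in_main, bool exec_in_main) {
    DemoState state;
    demo_ensure_internals(state, init_in_main);
    demo_ensure_internals(state, exec_in_main);
    assert(state.internals_creations == 1); // the second call creates nothing new
    return state;
}
```

Here `demo_load(false, false)` corresponds to the Python < 3.13 case where both phases run in a subinterpreter, and `demo_load(true, false)` to the Python >= 3.13 case; both end with the flag set and internals created exactly once.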

Thread Safety

has_seen_non_main_interpreter() uses std::atomic_bool, ensuring thread-safe access when multiple interpreters may be running concurrently.
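A minimal sketch of this pattern, assuming the accessor returns a reference to a function-local static `std::atomic_bool` (the storage strategy is an assumption; only the use of `std::atomic_bool` is stated above). The names `demo_has_seen_non_main_interpreter` and `set_flag_from_many_threads` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the flag accessor: a function-local static
// std::atomic_bool returned by reference. The real definition lives in pybind11.
inline std::atomic_bool &demo_has_seen_non_main_interpreter() {
    static std::atomic_bool flag{false};
    return flag;
}

// Spawns `num_threads` writers that all set the flag, then reports its value.
inline bool set_flag_from_many_threads(int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        // Concurrent atomic stores of the same value are well-defined.
        workers.emplace_back([] { demo_has_seen_non_main_interpreter() = true; });
    }
    for (auto &worker : workers) {
        worker.join();
    }
    return demo_has_seen_non_main_interpreter().load();
}
```

Because the flag only ever transitions from false to true, concurrent writers cannot disagree about the final value; the atomic type is what makes the concurrent stores and loads free of data races.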

Commits Added During Review

3977e2d - Fix mod_per_interpreter_gil* output directory on Windows/MSVC
801d5f98 - Remove outdated comment after function rename

Conclusion

The PR correctly fixes the subinterpreter timing issue in Python 3.13+ and properly handles the "subinterpreter_before_main" scenario. The Windows CI fix we added ensures the tests can find the built modules. The changes are well-structured and the xfails are appropriately removed.

Recommendation: Ready to merge once CI is green.

rwgk

Please let me know any corrections to the review comment I posted a minute ago.

It's way past bedtime where I am at the moment: assuming the CI is green now, could you please remove the outdated comment, and then go ahead and merge this change? (b-pass can merge)

XuehaiPan · 2025-12-26T18:13:17Z

The current state looks good to me. All CI checks are green now.

FYI, my downstream PR using this patch now works without segfaults or internals consistency issues:

The only unexpected output is here: https://github.com/metaopt/optree/actions/runs/20518681667/job/58950377906?pr=245

class PyBindCppIter {
public:
    explicit PyBindCppIter(const py::object &obj) : m_obj{obj} {}
    PyBindCppIter() = delete;
    ~PyBindCppIter() = default;
    PyBindCppIter(const PyBindCppIter &) = delete;
    PyBindCppIter &operator=(const PyBindCppIter &) = delete;
    PyBindCppIter(PyBindCppIter &&) = delete;
    PyBindCppIter &operator=(PyBindCppIter &&) = delete;

    PyBindCppIter &iter() noexcept { return *this; }
    py::object next() {
#if defined(Py_GIL_DISABLED)
        std::scoped_lock lock(m_mutex);
#endif
        return m_obj.attr("__next__")();
    }

private:
    py::object m_obj;
#if defined(Py_GIL_DISABLED)
    mutable std::mutex m_mutex{};
#endif
};

import itertools
from concurrent.futures import ThreadPoolExecutor

# Initialize the iterator
it = iter(PyBindCppIter(range(N)))  # expected to be a sorted thread-safe iterator

# Each thread consumes the same iterator and collects results
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    # Note: map(list, [it] * num_workers) will cause multiple threads to call
    # next(it) on the same pybind object's `__next__` concurrently.
    sequences = list(executor.map(list, [it] * num_workers))

# Verification 1: Integrity
# The combined sequences should cover the entire range and be mutually exclusive.
# PASS:
#   - Python 3.14t (only seen the main interpreter)
#   - Python 3.14t (seen the main interpreter and sub-interpreters)
#   - Python 3.14 (only seen the main interpreter)
#   - Python 3.14 (seen the main interpreter and sub-interpreters)
assert sorted(itertools.chain.from_iterable(sequences)) == list(range(N))

# Verification 2: Monotonicity (Order)
# The subsequences should be also sorted individually.
for seq in sequences:
    # PASS:
    #   - Python 3.14t (only seen the main interpreter)
    #   - Python 3.14t (seen the main interpreter and sub-interpreters)
    #   - Python 3.14 (only seen the main interpreter)
    # FAIL:
    #   - Python 3.14 (seen the main interpreter and sub-interpreters)
    assert seq == sorted(seq), f'Subsequence is not sorted: {seq}'

I'm not entirely sure if this stems from the pybind11 call policy mechanism or a bug within CPython itself.

Test Results:

Python 3.14t tests: ALL PASS
Python 3.14 (without changing detail::has_seen_non_main_interpreter() flag): PASS (using pytest -k 'not subinterpreter')
Python 3.14 (running both threading (in the main interpreter only) and subinterpreter tests): Fails with function output consistency issues.

I will investigate this further on my end; it should not be a blocker for merging this PR.
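For reference, the two invariants checked above (integrity and per-thread monotonicity) can be restated in a pure C++ sketch with no Python involved. This says nothing about the cause of the failure on Python 3.14; it only shows what the assertions expect when every `next()` call is serialized by a lock. `LockedCounterIter`, `consume_concurrently`, `covers_range_exactly`, and `each_is_sorted` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical pure-C++ analogue of PyBindCppIter: yields 0..n-1, one value per call.
class LockedCounterIter {
public:
    explicit LockedCounterIter(int n) : m_end{n} { assert(n >= 0); }

    std::optional<int> next() {
        std::scoped_lock lock(m_mutex); // serialize every call, like the Py_GIL_DISABLED branch
        if (m_next >= m_end) {
            return std::nullopt;
        }
        return m_next++;
    }

private:
    std::mutex m_mutex;
    int m_next = 0;
    int m_end;
};

// Each worker drains the same iterator and records the values it received, in order.
inline std::vector<std::vector<int>> consume_concurrently(int n, int num_workers) {
    assert(num_workers > 0);
    LockedCounterIter it(n);
    std::vector<std::vector<int>> sequences(static_cast<std::size_t>(num_workers));
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < sequences.size(); ++w) {
        workers.emplace_back([&it, &sequences, w] {
            while (auto value = it.next()) {
                sequences[w].push_back(*value);
            }
        });
    }
    for (auto &worker : workers) {
        worker.join();
    }
    return sequences;
}

// Verification 1: the union of all subsequences is exactly 0..n-1.
inline bool covers_range_exactly(const std::vector<std::vector<int>> &sequences, int n) {
    std::vector<int> merged;
    for (const auto &seq : sequences) {
        merged.insert(merged.end(), seq.begin(), seq.end());
    }
    std::sort(merged.begin(), merged.end());
    if (merged.size() != static_cast<std::size_t>(n)) {
        return false;
    }
    for (int i = 0; i < n; ++i) {
        if (merged[static_cast<std::size_t>(i)] != i) {
            return false;
        }
    }
    return true;
}

// Verification 2: every subsequence is sorted on its own.
inline bool each_is_sorted(const std::vector<std::vector<int>> &sequences) {
    return std::all_of(sequences.begin(), sequences.end(), [](const std::vector<int> &seq) {
        return std::is_sorted(seq.begin(), seq.end());
    });
}
```

When the lock serializes `next()`, each worker receives values in increasing order, so both checks hold. A failure of only the second check, as reported above, suggests that values are handed out exactly once but not delivered to a given thread in call order.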

XuehaiPan added 2 commits December 25, 2025 13:18

Add per-interpreter storage for gil_safe_call_once_and_store

5c7e1d7

Disable thread local cache for internals_pp_manager

4ef8b0c

XuehaiPan mentioned this pull request Dec 25, 2025

Add per-interpreter storage for gil_safe_call_once_and_store #5933

Merged

Disable thread local cache for internals_pp_manager for multi-interpreter only

aa1c3aa

XuehaiPan marked this pull request as ready for review December 25, 2025 07:22

XuehaiPan requested a review from henryiii as a code owner December 25, 2025 07:22

XuehaiPan changed the title ~~Fix concurrency consistency for internals_pp_manager under multiple-interpreters~~ [WIP] Fix concurrency consistency for internals_pp_manager under multiple-interpreters Dec 25, 2025

XuehaiPan and others added 3 commits December 25, 2025 16:14

Merge remote-tracking branch 'upstream/master' into fix-multiple-interpreters-concurrency

cdd7d10

Use anonymous namespace to separate these type_ids from other tests with the same class names.

ec908c6

style: pre-commit fixes

aeeb340

b-pass self-requested a review December 25, 2025 21:41

b-pass and others added 9 commits December 25, 2025 17:10

Revert internals_pp_manager changes

49952a8

This is the crux of fix for the subinterpreter_before_main failure.

f7a0e04

The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter. So we need to check this later, during the exec phase.

Continue to do the ensure in both places, there might be a reason it was where it was... Should not hurt anything to do it extra times here.

3b54dcf

Change get_num_interpreters_seen to a boolean flag instead.

a567962

The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.

Spelling typo

857e4a5

Work around older python versions, only need this check for newish versions

e1204b2

Add more comments for test case

0ad3ec2

Add more comments for test case

b2d82d6

Stop traceback propagation

0a142b1

rwgk reviewed Dec 26, 2025

tests/mod_per_interpreter_gil_with_singleton.cpp (outdated, resolved)

b-pass and others added 5 commits December 25, 2025 23:13

Re-enable subinterpreter support on ubuntu 3.14 builds

755839c

Was disabled in e4873e8

As suggested, don't use an anonymous namespace.

8f29f8e

Typo in test assert format string

3838ff1

Use a more appropriate function name

ed20cfc

rwgk reviewed Dec 26, 2025

rwgk approved these changes Dec 26, 2025

b-pass approved these changes Dec 26, 2025

XuehaiPan and others added 2 commits December 27, 2025 02:24

Remove unneeded pytest.importorskip

8a5fdd3

Remove comment

70f9245

b-pass changed the title ~~[WIP] Fix concurrency consistency for internals_pp_manager under multiple-interpreters~~ Fix concurrency consistency for internals_pp_manager under multiple-interpreters Dec 26, 2025

b-pass merged commit fee2527 into pybind:master Dec 26, 2025
84 of 87 checks passed

github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 26, 2025
