Skip to content

Conversation

@XuehaiPan
Copy link
Contributor

@XuehaiPan XuehaiPan marked this pull request as ready for review December 25, 2025 07:22
@XuehaiPan XuehaiPan requested a review from henryiii as a code owner December 25, 2025 07:22
@XuehaiPan XuehaiPan changed the title Fix concurrency consistency for internals_pp_manager under multiple-interpreters [WIP] Fix concurrency consistency for internals_pp_manager under multiple-interpreters Dec 25, 2025
@b-pass b-pass self-requested a review December 25, 2025 21:41
b-pass and others added 9 commits December 25, 2025 17:10
The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter.  So we need to check this later, during the exec phase.
…was where it was...

Should not hurt anything to do it extra times here.
The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.
b-pass and others added 5 commits December 25, 2025 23:13
On Windows with MSVC (multi-configuration generators), CMake uses
config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when
set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both
LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines
517-528), so they output directly to tests/.

However, the mod_per_interpreter_gil* modules only copied the base
LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in
tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and
related tests to fail with ModuleNotFoundError on Windows Python 3.14,
because the test code sets sys.path based on pybind11_tests.__file__
(which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton
(which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on
these tests. Now that the underlying "Duplicate C++ type registration"
issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set
LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.
// upon success, the new interpreter is activated in this thread
result.istate_ = result.creation_tstate_->interp;
detail::get_num_interpreters_seen() += 1; // there are now many interpreters
detail::has_seen_non_main_interpreter() = true; // there are now many interpreters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to push this right now, to not restart the CI (half done), but to let you know about this asap:

$ git show
commit 801d5f98396753bdf25e9d87e747a7cf4edf5f3c (HEAD -> XuehaiPan→fix-multiple-interpreters-concurrency)
Author: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Date:   Fri Dec 26 08:52:16 2025 -0800

    Remove outdated comment after function rename

    The comment "there are now many interpreters" was a holdover from when
    the function was named get_num_interpreters_seen(). With the rename to
    has_seen_non_main_interpreter(), the function name is now self-documenting.

diff --git a/include/pybind11/subinterpreter.h b/include/pybind11/subinterpreter.h
index e918b90c..c47787b6 100644
--- a/include/pybind11/subinterpreter.h
+++ b/include/pybind11/subinterpreter.h
@@ -109,7 +109,7 @@ public:

             // upon success, the new interpreter is activated in this thread
             result.istate_ = result.creation_tstate_->interp;
-            detail::has_seen_non_main_interpreter() = true; // there are now many interpreters
+            detail::has_seen_non_main_interpreter() = true;
             detail::get_internals(); // initialize internals.tstate, amongst other things...

             // In 3.13+ this state should be deleted right away, and the memory will be reused for

@rwgk
Copy link
Collaborator

rwgk commented Dec 26, 2025

Review of PR #5947: Fix concurrency consistency for internals_pp_manager under multiple-interpreters

PR: #5947
Reviewed by: rwgk (with Cursor assistance)
Date: 2025-12-26


Summary

This PR fixes issues with subinterpreter support, particularly the "subinterpreter_before_main" failure that occurred when a pybind11 module was first loaded in a subinterpreter before being loaded in the main interpreter. The fix enables removal of @pytest.mark.xfail decorators from several multiple-interpreter tests.


Core Problem

In Python 3.13+, the module loading process changed to have two distinct phases:

  1. Static init phase (PyInit_*) - runs once per process, always in the main interpreter
  2. Exec phase (Py_mod_exec slot) - runs per interpreter

The original pybind11 code detected subinterpreters during the static init phase:

#define PYBIND11_MODULE(name, variable, ...)                                      \
    PYBIND11_MODULE_PYINIT(                                                       \
        name, (pybind11::detail::get_num_interpreters_seen() += 1), ##__VA_ARGS__) \
    ...

This was incorrect for Python 3.13+ because the static init phase always runs in the main interpreter, even when loading a module from a subinterpreter. This caused the "subinterpreter_before_main" test to fail.


Key Changes

1. Move Interpreter Detection to Exec Phase (commit f7a0e04)

The crux of the fix. The pre_init code is removed from the static init phase, and ensure_internals() is now called in the exec phase where the correct interpreter context is available:

int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) {
    try {
        pybind11::detail::ensure_internals();  // Now checks in the correct interpreter
        ...

The new ensure_internals() function properly detects non-main interpreters:

inline void ensure_internals() {
    pybind11::detail::get_internals_pp_manager().unref();
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
    if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
        has_seen_non_main_interpreter() = true;
    }
#endif
    pybind11::detail::get_internals();
}

2. Semantic Change: Counter → Boolean Flag (commit a567962)

Changed from get_num_interpreters_seen() > 1 to has_seen_non_main_interpreter().

This is not just a cleanup - it's a semantic fix:

Scenario Old: counter > 1 New: has_seen_non_main_interpreter()
Module first loaded in subinterpreter counter = 1, check fails ❌ Flag set to true ✓
Module first loaded in main, then subinterpreter counter = 2, check passes ✓ Flag set to true ✓

The old counter-based approach only worked correctly when the main interpreter loaded the module first.

3. Re-enable Subinterpreter Support on Ubuntu 3.14 (commit 755839c)

Removed the workaround -DPYBIND11_HAS_SUBINTERPRETER_SUPPORT=0 from ci.yml that was added in commit e4873e8 as a temporary fix. Now that the core issue is resolved, subinterpreter support can be enabled.

4. Namespace Fix for Test Module (commit 8f29f8e)

Added proper namespace to mod_per_interpreter_gil_with_singleton.cpp to avoid libc++ issues with anonymous namespaces (as noted by @rwgk in PR comments referencing issue #4319).

5. Remove xfail Decorators from Tests

The following tests no longer need @pytest.mark.xfail:

  • test_import_module_with_singleton_per_interpreter
  • test_import_in_subinterpreter_after_main
  • test_import_in_subinterpreter_before_main
  • test_import_in_subinterpreter_concurrently

Windows CI Failure Analysis and Fix

The Problem

After pushing the PR changes, Windows Python 3.14 CI jobs failed with:

ModuleNotFoundError: No module named 'mod_per_interpreter_gil_with_singleton'

This was puzzling because the module was successfully built.

Root Cause

On Windows with MSVC (multi-configuration generators), CMake uses config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG. The main test modules correctly set both LIBRARY_OUTPUT_DIRECTORY and the config-specific variants:

# Lines 517-528 in tests/CMakeLists.txt - for main test modules
if(NOT CMAKE_LIBRARY_OUTPUT_DIRECTORY)
  set_target_properties(${target} PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                             "${CMAKE_CURRENT_BINARY_DIR}")
  if(DEFINED CMAKE_CONFIGURATION_TYPES)
    foreach(config ${CMAKE_CONFIGURATION_TYPES})
      string(TOUPPER ${config} config)
      set_target_properties(${target} PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
                                                 "${CMAKE_CURRENT_BINARY_DIR}")
    endforeach()
  endif()
endif()

However, the mod_per_interpreter_gil* modules only copied the base property:

# Lines 590-594 - BEFORE fix
get_target_property(pybind11_tests_output_directory pybind11_tests LIBRARY_OUTPUT_DIRECTORY)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
  set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                            "${pybind11_tests_output_directory}")
endforeach()

Result:

  • pybind11_teststests/ (has LIBRARY_OUTPUT_DIRECTORY_DEBUG set)
  • mod_per_interpreter_gil*tests/Debug/ (missing config-specific property)

The test code sets sys.path based on pybind11_tests.__file__ location (tests/), but the module was in tests/Debug/.

Why This Was Previously Hidden

The @pytest.mark.xfail decorators masked this bug. The tests were already expected to fail (for the "Duplicate C++ type registration" reason), so the Windows path issue was never noticed.

The Fix (commit 3977e2d)

get_target_property(pybind11_tests_output_directory pybind11_tests LIBRARY_OUTPUT_DIRECTORY)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
  set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
                                            "${pybind11_tests_output_directory}")
  # Also set config-specific output directories for multi-configuration generators (MSVC)
  if(DEFINED CMAKE_CONFIGURATION_TYPES)
    foreach(config ${CMAKE_CONFIGURATION_TYPES})
      string(TOUPPER ${config} config)
      set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
                                                "${pybind11_tests_output_directory}")
    endforeach()
  endif()
endforeach()

Additional Cleanup (commit 801d5f98)

Removed an outdated comment in subinterpreter.h:

// Before:
detail::has_seen_non_main_interpreter() = true; // there are now many interpreters

// After:
detail::has_seen_non_main_interpreter() = true;

The comment was a holdover from when the function was named get_num_interpreters_seen(). The new function name is self-documenting.


Design Notes

Double ensure_internals() Calls

The code calls ensure_internals() in both:

  1. PYBIND11_PLUGIN_IMPL (static init phase)
  2. pybind11_exec_* (exec phase)

This is intentional (see commit 3b54dcf):

  • For Python < 3.13: Static init can run in subinterpreters, so both calls are useful
  • For Python >= 3.13: Static init always runs in main interpreter, so only exec phase call matters
  • Calling it twice is harmless (idempotent operations)

Thread Safety

has_seen_non_main_interpreter() uses std::atomic_bool, ensuring thread-safe access when multiple interpreters may be running concurrently.


Commits Added During Review

  1. 3977e2d - Fix mod_per_interpreter_gil* output directory on Windows/MSVC
  2. 801d5f98 - Remove outdated comment after function rename

Conclusion

The PR correctly fixes the subinterpreter timing issue in Python 3.13+ and properly handles the "subinterpreter_before_main" scenario. The Windows CI fix we added ensures the tests can find the built modules. The changes are well-structured and the xfails are appropriately removed.

Recommendation: Ready to merge once CI is green.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know any corrections to the review comment I posted a minute ago.

It's way past bedtime where I am at the moment: assuming the CI is green now, could you please remove the outdated comment, and then go ahead merge this change? (b-pass can merge)

@XuehaiPan
Copy link
Contributor Author

The current state looks good to me. All CI checks are green now.

FYI, my downstream PR using this patch now works without segfaults or internals consistency issues:

The only unexpected output is here: https://github.com/metaopt/optree/actions/runs/20518681667/job/58950377906?pr=245

class PyBindCppIter {
public:
    explicit PyBindCppIter(const py::object &obj) : m_obj{obj} {}
    PyBindCppIter() = delete;
    ~PyBindCppIter() = default;
    PyBindCppIter(const PyBindCppIter &) = delete;
    PyBindCppIter &operator=(const PyBindCppIter &) = delete;
    PyBindCppIter(PyBindCppIter &&) = delete;
    PyBindCppIter &operator=(PyBindCppIter &&) = delete;

    PyBindCppIter &iter() noexcept { return *this; }
    py::object next() {
#if defined(Py_GIL_DISABLED)
        std::scoped_lock lock(m_mutex);
#endif
        return m_obj.attr("__next__")();
    }

private:
    py::object m_obj;
#if defined(Py_GIL_DISABLED)
    mutable std::mutex m_mutex{};
#endif
};
import itertools
from concurrent.futures import ThreadPoolExecutor

# Initialize the iterator
it = iter(PyBindCppIter(range(N)))  # expected to be a sorted thread-safe iterator

# Each thread consumes the same iterator and collects results
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    # Note: map(list, [it] * num_workers) will cause multiple threads to call
    # next(it) on the same pybind object's `__next__` concurrently.
    sequences = list(executor.map(list, [it] * num_workers))

# Verification 1: Integrity
# The combined sequences should cover the entire range and be mutually exclusive.
# PASS:
#   - Python 3.14t (only seen the main interpreter)
#   - Python 3.14t (seen the main interpreter and sub-interpreters
#   - Python 3.14 (only seen the main interpreter)
#   - Python 3.14 (seen the main interpreter and sub-interpreters)
assert sorted(itertools.chain.from_iterable(sequences)) == list(range(N))

# Verification 2: Monotonicity (Order)
# The subsequences should be also sorted individually.
for seq in sequences:
    # PASS:
    #   - Python 3.14t (only seen the main interpreter)
    #   - Python 3.14t (seen the main interpreter and sub-interpreters)
    #   - Python 3.14 (only seen the main interpreter)
    # FAIL:
    #   - Python 3.14 (seen the main interpreter and sub-interpreters)
    assert seq == sorted(seq), f'Subsequence is not sorted: {seq}'

I'm not entirely sure if this stems from the pybind11 call policy mechanism or a bug within CPython itself.

Test Results:

  • Python 3.14t tests: ALL PASS
  • Python 3.14 (without changing detail::has_seen_non_main_interpreter() flag): PASS (using pytest -k 'not subinterpreter')
  • Python 3.14 (running both threading (in the main interpreter only) and subinterpreter tests): Fails with function output consistency issues.

I will investigate this further on my end; it should not be a blocker for merging this PR.

@b-pass b-pass changed the title [WIP] Fix concurrency consistency for internals_pp_manager under multiple-interpreters Fix concurrency consistency for internals_pp_manager under multiple-interpreters Dec 26, 2025
@b-pass b-pass merged commit fee2527 into pybind:master Dec 26, 2025
84 of 87 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs changelog Possibly needs a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants