Fix integration test worker crashes in Azure Functions on Py3.13 #4260
Three changes to prevent pytest-xdist workers from crashing during Azure Functions integration tests:

1. Add `start_new_session=True` to the subprocess on Linux so signals (e.g. from pytest-timeout) cannot propagate between the func host and the xdist worker process.
2. Add an overall 100-second budget to the fixture setup loop so the retry logic never exceeds the 120-second test timeout. When pytest-timeout's thread method fires during fixture setup and the thread doesn't respond, it calls `os._exit()`, which kills the xdist worker. This is the root cause of the "Not properly terminated" crashes.
3. Remove the `UV_PYTHON: "3.10"` workaround from both workflow files so integration tests actually run on Python 3.13.

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
Pull request overview
Fixes pytest-xdist worker crashes in Azure Functions integration tests on Python 3.13 by preventing fixture startup from exceeding the pytest-timeout budget and isolating the func start subprocess from the worker’s process group. Also removes a stale workflow-level Python 3.10 override so CI actually runs these tests on 3.13.
Changes:
- Isolate the Azure Functions host subprocess on POSIX using `start_new_session=True` to prevent signal cross-contamination.
- Add an overall startup time budget in the `function_app_for_test` fixture to ensure clean failure (`pytest.fail`) before pytest-timeout can hard-exit the worker.
- Remove `UV_PYTHON: "3.10"` overrides from the Functions integration test jobs so they inherit the workflow's Python version (3.13).
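The process isolation described in the first bullet can be sketched as follows. This is a minimal illustration, not the code from `conftest.py`: the helper name `start_isolated` and the sleeping child command are assumptions made for the example. Only `start_new_session=True` comes from the PR.

```python
import os
import subprocess
import sys


def start_isolated(cmd: list[str]) -> subprocess.Popen:
    """Hypothetical helper: launch cmd in its own session on POSIX."""
    # With start_new_session=True the child calls setsid() before exec,
    # so it leaves the parent's session and process group. Signals sent
    # to the parent's group then no longer reach the child, and vice versa.
    isolate = os.name == "posix"
    return subprocess.Popen(cmd, start_new_session=isolate)


if __name__ == "__main__":
    # Stand-in for `func start`: a child that just sleeps briefly.
    child = start_isolated([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        if os.name == "posix":
            print("child pgid:", os.getpgid(child.pid))
            print("parent pgid:", os.getpgid(0))
    finally:
        child.terminate()
        child.wait()
```

On POSIX the two printed process group IDs should differ, which is the property the fix relies on. The flag is only requested on POSIX here because `setsid()` has no Windows equivalent.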
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/packages/azurefunctions/tests/integration_tests/conftest.py` | Adds POSIX process-group isolation for `func start` and enforces an overall fixture startup budget to avoid pytest-timeout killing xdist workers. |
| `.github/workflows/python-merge-tests.yml` | Removes the job-level `UV_PYTHON` pin so merge tests run Functions integration on the workflow's Python version. |
| `.github/workflows/python-integration-tests.yml` | Removes the job-level `UV_PYTHON` pin so integration tests run Functions integration on the workflow's Python version. |
@copilot code review[agent] fix the comments and failing jobs
Addresses review feedback: monotonic clock is immune to NTP/clock adjustments that could skew the budget enforcement. Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
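A budget-capped retry loop of the kind this commit refers to might look like the sketch below. The function name `wait_with_budget`, its parameters, and the injectable `clock` are assumptions for illustration; the 100-second budget, the three 60-second attempts, and the use of `time.monotonic()` are taken from the PR text.

```python
import time
from typing import Callable


def wait_with_budget(
    try_start: Callable[[float], bool],
    *,
    total_budget: float = 100.0,      # overall cap, below the 120s test timeout
    max_attempts: int = 3,
    per_attempt_wait: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Hypothetical helper: retry try_start without exceeding total_budget.

    try_start receives the maximum time it may wait and returns True on
    success. The monotonic clock never jumps backwards or forwards when the
    wall clock is adjusted, so the deadline cannot be skewed by NTP.
    """
    deadline = clock() + total_budget
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # Cap this attempt's wait to whatever is left of the budget.
        if try_start(min(per_attempt_wait, remaining)):
            return True
    return False
```

In a fixture, a `False` result would be turned into `pytest.fail(...)` so the test fails cleanly before pytest-timeout has a chance to hard-exit the worker.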
@copilot I ran the integration tests CI to validate the fix, and it looks like it's still failing: https://github.com/microsoft/agent-framework/actions/runs/22430115474/job/64947032165
…on 3.12 The Azure Functions Python worker crashes with SIGSEGV (exit code 139) on Python 3.13 due to protobuf C extension (google._upb) compatibility issues. When the test runner uses Python >=3.13, the conftest now automatically finds a compatible Python 3.10-3.12 and sets languageWorkers__python__defaultExecutablePath so the func host uses it for the worker process. The CI setup action also ensures Python 3.12 is available on the runner, falling back to uv python install if the system doesn't have it. Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
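The lookup this commit describes could be sketched roughly as below. The function names, the injectable arguments, and the exact search order are assumptions for illustration. The `FUNC_WORKER_PYTHON` variable and the `languageWorkers__python__defaultExecutablePath` key are the names used in this PR, and the final key format in the merged code may differ.

```python
import os
import shutil
import sys

# Interpreter names to probe, newest compatible version first.
CANDIDATES = ("python3.12", "python3.11", "python3.10")


def find_worker_python(version_info=sys.version_info, environ=os.environ,
                       which=shutil.which):
    """Hypothetical helper: pick a Python 3.10-3.12 for the func worker."""
    if version_info < (3, 13):
        return None  # the runner's own Python is already compatible
    override = environ.get("FUNC_WORKER_PYTHON")
    if override:
        return override  # explicit path, e.g. exported by the CI setup step
    for name in CANDIDATES:
        path = which(name)
        if path:
            return path
    return None


def build_func_env(base_env, worker_python):
    """Copy base_env and point the func host at worker_python, if any."""
    env = dict(base_env)
    if worker_python:
        env["languageWorkers__python__defaultExecutablePath"] = worker_python
    return env
```

The resulting dictionary would be passed as `env=` when launching the func host, so pytest itself keeps running on 3.13 while the worker uses the older interpreter.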
…onfig key format Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
Investigated the failing job. The root cause is different from the original xdist crash — the Azure Functions Python worker segfaults (exit code 139) on Python 3.13 due to the protobuf C extension (
Motivation and Context
All 20 pytest-xdist workers crash with `[gwN] node down: Not properly terminated` during Azure Functions integration tests on Python 3.13. The `python-tests-functions` job had a `UV_PYTHON: "3.10"` override masking this, but the parent workflow sets `UV_PYTHON: "3.13"`, so the override was silently ignored.

Two root causes were identified:

1. xdist worker crash: The `function_app_for_test` fixture retry loop can spend up to ~184s (3 × 60s wait + cleanup), exceeding the 120s `--timeout`. When `pytest-timeout`'s thread method fires mid-fixture and the thread is blocked, it calls `os._exit()`, killing the xdist worker outright. Compounding this, the `func` subprocess shares the worker's process group, so signals propagate bidirectionally.
2. Azure Functions worker segfault: The Azure Functions Python worker crashes with SIGSEGV (exit code 139) on Python 3.13 due to the protobuf C extension (`google._upb`) failing during the worker's module isolation cleanup. This prevents the function app from starting at all on Python 3.13.

Description

`conftest.py`: subprocess isolation

Added `start_new_session=True` on Linux so the `func start` process runs in its own process group. Prevents signal cross-contamination between pytest-timeout and the function host.

`conftest.py`: fixture timeout budget

Added a 100s overall budget (under the 120s test timeout) that caps each retry's `max_wait` to the remaining time. The fixture now always exits cleanly via `pytest.fail()` instead of being killed by `os._exit()`. Uses `time.monotonic()` for budget tracking so NTP/clock adjustments cannot skew the enforcement.

`conftest.py`: worker Python redirect for Python ≥3.13

Added `_find_func_worker_python()` that auto-detects when the test runner is on Python ≥3.13 and finds a compatible Python 3.10–3.12 executable (checking the `FUNC_WORKER_PYTHON` env var, then `python3.12`/`python3.11`/`python3.10` via `shutil.which`). Sets `languageWorkers__python__defaultExecutablePath` in the func process env so the Azure Functions host uses the compatible Python for its worker while pytest continues running on 3.13.

`azure-functions-integration-setup` action: ensure Python 3.12 availability

Added a step to the CI setup action that ensures Python 3.12 is available on the runner. If the system doesn't have `python3.12`, it installs it via `uv python install 3.12` and exports the path as `FUNC_WORKER_PYTHON` for the conftest to use.

Workflow files: remove stale Python 3.10 pin

Removed `UV_PYTHON: "3.10"` from `python-tests-functions` in both `python-merge-tests.yml` and `python-integration-tests.yml`. The job now inherits the workflow-level `UV_PYTHON: "3.13"`.

Contribution Checklist