fix: Clean shutdown for Sink threaded server using threading.Event#325
Signed-off-by: Sreekanth <prsreekanth920@gmail.com>
Codecov Report
❌ Patch coverage is

@@            Coverage Diff             @@
##             main     #325      +/-   ##
==========================================
- Coverage   94.46%   94.24%   -0.23%
==========================================
  Files          66       66
  Lines        3071     3092      +21
  Branches      158      162       +4
==========================================
+ Hits         2901     2914      +13
- Misses        141      148       +7
- Partials       29       30       +1
kohlisid left a comment
lgtm~
In the tests, just to confirm: what is the behavior of in-flight messages on server.stop? Ideally we are sending EOF, which should allow the other messages to be processed before stopping.
Also, @BulkBeing, could you check the coverage here?
During a normal shutdown (no UDF exception, e.g. pod deletion), the server waits for in-flight requests to complete. In the case of a shutdown due to a UDF exception, we have already sent an internal server error in
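The normal-shutdown behavior discussed here (the stop path sends EOF, and in-flight requests finish before the server exits) can be modeled with a FIFO queue and a sentinel. This is a minimal standard-library sketch under that assumption only; `StreamWorker` and the `EOF` object are hypothetical names, not pynumaflow's actual classes.

```python
import queue
import threading

EOF = object()  # hypothetical end-of-stream marker


class StreamWorker:
    """Hypothetical sketch: a worker that drains every request queued before EOF."""

    def __init__(self, handler):
        self.handler = handler
        self.requests = queue.Queue()
        self.processed = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            item = self.requests.get()
            if item is EOF:
                return  # everything enqueued before EOF has already been handled
            self.processed.append(self.handler(item))

    def submit(self, item):
        self.requests.put(item)

    def stop(self):
        # Normal shutdown: enqueue EOF behind the in-flight items, then wait for the drain.
        self.requests.put(EOF)
        self._thread.join()


worker = StreamWorker(str.upper)
for msg in ["a", "b", "c"]:
    worker.submit(msg)
worker.stop()
print(worker.processed)  # ['A', 'B', 'C']
```

Because the marker is enqueued behind the pending items and there is a single consumer, `stop()` cannot return before every earlier message has been handled.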



Similar to #323, but for the threaded server.
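The idea in the PR title, using a `threading.Event` to trigger a clean shutdown of the threaded server, can be sketched with the standard library alone. This is a hypothetical model, not the code in pynumaflow's `sync_servicer.py`: `ThreadedServerSketch`, `request_stop`, and `failing_handler` are illustrative names, and `stop()` only sets a flag where a real server would call something like gRPC's `server.stop(grace)`.

```python
import threading


class ThreadedServerSketch:
    """Hypothetical stand-in for a threaded gRPC server (not pynumaflow's API).

    A handler thread that hits a UDF exception sets a shared threading.Event.
    The main thread blocks on that event instead of waiting for termination
    indefinitely, then performs an orderly stop.
    """

    def __init__(self, handler):
        self.handler = handler
        self.shutdown_event = threading.Event()  # set on UDF failure or external stop
        self.errors = []
        self.stopped = False
        self._lock = threading.Lock()

    def _handle(self, request):
        try:
            self.handler(request)
        except Exception as exc:
            with self._lock:
                self.errors.append(str(exc))
            # Signal the main thread rather than killing the process from a worker.
            self.shutdown_event.set()

    def request_stop(self):
        # E.g. called from a SIGTERM handler on pod deletion.
        self.shutdown_event.set()

    def serve(self, requests):
        # One thread per request, loosely mimicking a thread-pool server.
        workers = [threading.Thread(target=self._handle, args=(r,)) for r in requests]
        for w in workers:
            w.start()
        self.shutdown_event.wait()  # block until a failure or a stop request
        for w in workers:
            w.join()  # let in-flight handlers finish before stopping
        self.stop()

    def stop(self):
        self.stopped = True


def failing_handler(request):
    # Hypothetical UDF that fails on one specific request.
    if request == "bad":
        raise Exception("30 seconds elapsed")


server = ThreadedServerSketch(failing_handler)
server.serve(["ok-1", "bad", "ok-2"])
print(server.stopped, server.errors)  # True ['30 seconds elapsed']
```

If no handler fails and `request_stop()` is never called, `serve()` blocks indefinitely, which is the intended behavior for a long-running server; the event only decides *when* the main thread proceeds to the orderly stop.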
Also
For this UDF:
UDF logs:
Numa:
{"timestamp":"2026-03-03T15:04:10.356463Z","level":"ERROR","message":"Error while writing messages","e":"Grpc(Status { code: Internal, message: \"UDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\", details: b\"\\x08\\r\\x12IUDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\\x1a\\xc7\\x08\\n(type.googleapis.com/google.rpc.DebugInfo\\x12\\x9a\\x08\\x12\\x97\\x08Traceback (most recent call last):\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 72, in SinkFn\\n ret = cur_task.join()\\n ^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 63, in join\\n raise self._exception\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 40, in run\\n self._return = self._target(*self._args, **self._kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 109, in _invoke_sink\\n rspns = self.handler(request_queue.read_iterator())\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/example.py\\\", line 17, in udsink_handler\\n raise Exception(\\\"30 seconds elapsed\\\")\\nException: 30 seconds elapsed\", source: None })","target":"numaflow_core::pipeline::forwarder::sink_forwarder"}
{"timestamp":"2026-03-03T15:04:10.356555Z","level":"INFO","message":"Forwarder task completed","result":"Err(Grpc(Status { code: Internal, message: \"UDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\", details: b\"\\x08\\r\\x12IUDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\\x1a\\xc7\\x08\\n(type.googleapis.com/google.rpc.DebugInfo\\x12\\x9a\\x08\\x12\\x97\\x08Traceback (most recent call last):\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 72, in SinkFn\\n ret = cur_task.join()\\n ^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 63, in join\\n raise self._exception\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 40, in run\\n self._return = self._target(*self._args, **self._kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 109, in _invoke_sink\\n rspns = self.handler(request_queue.read_iterator())\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/example.py\\\", line 17, in udsink_handler\\n raise Exception(\\\"30 seconds elapsed\\\")\\nException: 30 seconds elapsed\", source: None }))","target":"numaflow_core::pipeline::forwarder::sink_forwarder"}
{"timestamp":"2026-03-03T15:04:10.356616Z","level":"INFO","message":"Stopped the Lag-Reader Expose tasks","target":"numaflow_core::metrics"}
{"timestamp":"2026-03-03T15:04:10.356642Z","level":"ERROR","message":"Pipeline failed because of UDF failure","error":"Status { code: Internal, message: \"UDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\", details: b\"\\x08\\r\\x12IUDSinkError, UDF_EXECUTION_ERROR(udsink): Exception('30 seconds elapsed')\\x1a\\xc7\\x08\\n(type.googleapis.com/google.rpc.DebugInfo\\x12\\x9a\\x08\\x12\\x97\\x08Traceback (most recent call last):\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 72, in SinkFn\\n ret = cur_task.join()\\n ^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 63, in join\\n raise self._exception\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/shared/thread_with_return.py\\\", line 40, in run\\n self._return = self._target(*self._args, **self._kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/.venv/lib/python3.11/site-packages/pynumaflow/sinker/servicer/sync_servicer.py\\\", line 109, in _invoke_sink\\n rspns = self.handler(request_queue.read_iterator())\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/pysetup/examples/sink/log/example.py\\\", line 17, in udsink_handler\\n raise Exception(\\\"30 seconds elapsed\\\")\\nException: 30 seconds elapsed\", source: None }","target":"numaflow_core"}
{"timestamp":"2026-03-03T15:04:10.357110Z","level":"INFO","message":"Gracefully Exiting...","target":"numaflow_core"}
{"timestamp":"2026-03-03T15:04:10.357184Z","level":"INFO","message":"Exited.","target":"numaflow"}

On regular pod delete (stream close), before the changes in this PR:
With the changes: