feat[next]: GPU profiling #2508
Conversation
There was a problem hiding this comment.
Pull request overview
Adds GPU profiling utilities to GT4Py Next so program and compiled-program calls can be annotated via cupyx.profiler (e.g., NVTX/ROCTX ranges) and profiling sessions can be started/stopped around a scope.
Changes:
- Introduces
gt4py.next.instrumentation.gpu_profilerwithprofile_calls()plus hook-based time-range wrappers for program/compiled-program dispatch. - Adds a
CompiledProgramsPool.definitionconvenience property to support per-program customization (e.g.,color_id). - Adds unit/integration tests and updates dependency groups (
profiling,dev) for profiler-related tooling.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/gt4py/next/instrumentation/gpu_profiler.py |
New GPU profiling API + hook registration for time ranges and profiling session lifecycle. |
src/gt4py/next/otf/compiled_program.py |
Adds definition property to compiled program pools for profiler metadata/customization. |
src/gt4py/next/instrumentation/hook_machinery.py |
Small rename of __exit__ parameter for clarity (type_ → exc_type). |
tests/next_tests/unit_tests/instrumentation_tests/test_gpu_profiler.py |
Unit tests for fallback path and profiler hook classes. |
tests/next_tests/integration_tests/feature_tests/instrumentation_tests/test_gpu_profiler.py |
Integration tests for profiling call lifecycle (hook registration + ctx manager). |
tests/next_tests/integration_tests/feature_tests/instrumentation_tests/test_hooks.py |
Runs hook integration test under gpu_profiler.profile_calls(). |
tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_ffront_fvm_nabla.py |
Adds an atlas-based integration test demonstrating profiling usage with cupyx.profiler. |
tests/next_tests/unit_tests/instrumentation_tests/test_metrics.py |
Adds from __future__ import annotations. |
pyproject.toml |
Adds profiling dependency group and includes it in dev. |
uv.lock |
Updates lockfile with new dependencies/groups. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| hooks.program_call_context.register(ProgramCallProfiler, index=0) | ||
| hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0) | ||
| _profile_ctx_manager = profile() | ||
| _profile_ctx_manager.__enter__() |
There was a problem hiding this comment.
start_profiling_calls() sets _profile_ctx_manager and registers hooks before calling __enter__(). If profile().__enter__() raises, the module can be left in a broken state (hooks registered, _profile_ctx_manager non-None) and subsequent calls won’t retry. Consider only assigning _profile_ctx_manager after a successful __enter__(), and unregistering hooks/resetting state on failure.
| hooks.program_call_context.register(ProgramCallProfiler, index=0) | |
| hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0) | |
| _profile_ctx_manager = profile() | |
| _profile_ctx_manager.__enter__() | |
| ctx_manager = profile() | |
| program_call_hook_registered = False | |
| compiled_program_call_hook_registered = False | |
| try: | |
| hooks.program_call_context.register(ProgramCallProfiler, index=0) | |
| program_call_hook_registered = True | |
| hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0) | |
| compiled_program_call_hook_registered = True | |
| ctx_manager.__enter__() | |
| except Exception: | |
| if compiled_program_call_hook_registered: | |
| hooks.compiled_program_call_context.remove(CompiledProgramCallProfiler) | |
| if program_call_hook_registered: | |
| hooks.program_call_context.remove(ProgramCallProfiler) | |
| raise | |
| _profile_ctx_manager = ctx_manager |
There was a problem hiding this comment.
Your conclusion was that we anyway can't recover from this situation, right? Because we don't capture any failure and it will lead to program exit?
There was a problem hiding this comment.
Yes, exactly, that's what I had thought. Looking again at Copilot's suggestion, I think it makes sense to at least clean up the callback states in case the user catches the exception and continues with the execution. I've added a bit of cleanup code to handle this.
havogt
left a comment
There was a problem hiding this comment.
lgtm, a question and a potential problem.
| with mock.patch.object(gpu_profiler, "profile", return_value=fake_profile): | ||
| gpu_profiler.start_profiling_calls() | ||
| gpu_profiler.start_profiling_calls() | ||
|
|
There was a problem hiding this comment.
What about proper nesting of contexts? Wouldn't leaving the inner context already stop everything? From this test I understand that basically inner contexts should be ignored, right?
There was a problem hiding this comment.
The profile_calls() context is just a convenience util for calling start_profiling_calls() and stop_profiling_calls() automatically. This is a simple on/off switch, where the first call to either start_profiling_calls() or stop_profiling_calls() would start/stop the profiler. I don't think it's worth to add anything else more complex, at least for now. If the user needs something more complicated, he can still implement it's own mechanism, calling start_profiling_calls() and stop_profiling_calls() under the hood.
There was a problem hiding this comment.
Thinking about this again, the nesting behavior of the profile_calls() context handler is probably confusing. I guess we should either document it, make it raise at run-time when nested or just delete it and keep only the start_/stop_ calls. What do you think?
| hooks.program_call_context.register(ProgramCallProfiler, index=0) | ||
| hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0) | ||
| _profile_ctx_manager = profile() | ||
| _profile_ctx_manager.__enter__() |
There was a problem hiding this comment.
Your conclusion was that we anyway can't recover from this situation, right? Because we don't capture any failure and it will lead to program exit?
|
LGTM |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
havogt
left a comment
There was a problem hiding this comment.
lgtm, in case you support the nesting, it's also approved.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import gt4py.next as gtx | ||
| from gt4py.next import common, Dims, gtfn_cpu, typing as gtx_typing | ||
| from gt4py.next.instrumentation import hooks | ||
| from gt4py.next.instrumentation import gpu_profiler, hooks |
There was a problem hiding this comment.
gpu_profiler is imported but never used in this test module. This will fail unused-import linting if enabled for tests; please remove the import or use it in an assertion.
| from gt4py.next.instrumentation import gpu_profiler, hooks | |
| from gt4py.next.instrumentation import hooks |
| from gt4py._core import definitions as core_definitions | ||
| from gt4py.next.instrumentation import gpu_profiler, hooks | ||
|
|
||
|
|
||
| HAS_CUPY = core_definitions.CUPY_DEVICE_TYPE is not None | ||
|
|
||
|
|
There was a problem hiding this comment.
core_definitions/HAS_CUPY are declared but never used in this test module. If this is not intended for future skip logic, please remove them to avoid unused-import/unused-variable lint failures.
| from gt4py._core import definitions as core_definitions | |
| from gt4py.next.instrumentation import gpu_profiler, hooks | |
| HAS_CUPY = core_definitions.CUPY_DEVICE_TYPE is not None | |
| from gt4py.next.instrumentation import gpu_profiler, hooks |
| """Context manager that enables GPU profiling of GT4Py program calls within its scope.""" | ||
| if not start_profiling_calls(): | ||
| warnings.warn( | ||
| "GPU profiling of GT4Py program calls is already active." |
There was a problem hiding this comment.
The warning message concatenates two string literals without whitespace, producing "...already active.Nested...". Please add an explicit space/newline so the emitted UserWarning is readable (and keep the message text stable for tests).
| "GPU profiling of GT4Py program calls is already active." | |
| "GPU profiling of GT4Py program calls is already active. " |
Add GPU profiling utilities to
gt4py.nextso program and compiled-program calls can be annotated viacupyx.profiler(e.g., NVTX/ROCTX ranges) and profiling sessions can be started/stopped around a scope.