Skip to content

feat[next]: GPU profiling #2508

Merged
egparedes merged 26 commits intoGridTools:mainfrom
egparedes:feature/gpu-traces
Apr 17, 2026
Merged

feat[next]: GPU profiling #2508
egparedes merged 26 commits intoGridTools:mainfrom
egparedes:feature/gpu-traces

Conversation

@egparedes
Copy link
Copy Markdown
Contributor

@egparedes egparedes commented Mar 3, 2026

Add GPU profiling utilities to gt4py.next so program and compiled-program calls can be annotated via cupyx.profiler (e.g., NVTX/ROCTX ranges) and profiling sessions can be started/stopped around a scope.

@egparedes egparedes marked this pull request as ready for review April 10, 2026 13:20
Copilot AI review requested due to automatic review settings April 10, 2026 13:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds GPU profiling utilities to GT4Py Next so program and compiled-program calls can be annotated via cupyx.profiler (e.g., NVTX/ROCTX ranges) and profiling sessions can be started/stopped around a scope.

Changes:

  • Introduces gt4py.next.instrumentation.gpu_profiler with profile_calls() plus hook-based time-range wrappers for program/compiled-program dispatch.
  • Adds a CompiledProgramsPool.definition convenience property to support per-program customization (e.g., color_id).
  • Adds unit/integration tests and updates dependency groups (profiling, dev) for profiler-related tooling.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/gt4py/next/instrumentation/gpu_profiler.py New GPU profiling API + hook registration for time ranges and profiling session lifecycle.
src/gt4py/next/otf/compiled_program.py Adds definition property to compiled program pools for profiler metadata/customization.
src/gt4py/next/instrumentation/hook_machinery.py Small rename of __exit__ parameter for clarity (type_exc_type).
tests/next_tests/unit_tests/instrumentation_tests/test_gpu_profiler.py Unit tests for fallback path and profiler hook classes.
tests/next_tests/integration_tests/feature_tests/instrumentation_tests/test_gpu_profiler.py Integration tests for profiling call lifecycle (hook registration + ctx manager).
tests/next_tests/integration_tests/feature_tests/instrumentation_tests/test_hooks.py Runs hook integration test under gpu_profiler.profile_calls().
tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_ffront_fvm_nabla.py Adds an atlas-based integration test demonstrating profiling usage with cupyx.profiler.
tests/next_tests/unit_tests/instrumentation_tests/test_metrics.py Adds from __future__ import annotations.
pyproject.toml Adds profiling dependency group and includes it in dev.
uv.lock Updates lockfile with new dependencies/groups.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
@egparedes egparedes requested review from Copilot and havogt April 10, 2026 14:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gt4py/next/instrumentation/gpu_profiler.py
Comment on lines +120 to +123
hooks.program_call_context.register(ProgramCallProfiler, index=0)
hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0)
_profile_ctx_manager = profile()
_profile_ctx_manager.__enter__()
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_profiling_calls() sets _profile_ctx_manager and registers hooks before calling __enter__(). If profile().__enter__() raises, the module can be left in a broken state (hooks registered, _profile_ctx_manager non-None) and subsequent calls won’t retry. Consider only assigning _profile_ctx_manager after a successful __enter__(), and unregistering hooks/resetting state on failure.

Suggested change
hooks.program_call_context.register(ProgramCallProfiler, index=0)
hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0)
_profile_ctx_manager = profile()
_profile_ctx_manager.__enter__()
ctx_manager = profile()
program_call_hook_registered = False
compiled_program_call_hook_registered = False
try:
hooks.program_call_context.register(ProgramCallProfiler, index=0)
program_call_hook_registered = True
hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0)
compiled_program_call_hook_registered = True
ctx_manager.__enter__()
except Exception:
if compiled_program_call_hook_registered:
hooks.compiled_program_call_context.remove(CompiledProgramCallProfiler)
if program_call_hook_registered:
hooks.program_call_context.remove(ProgramCallProfiler)
raise
_profile_ctx_manager = ctx_manager

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your conclusion was that we anyway can't recover from this situation, right? Because we don't capture any failure and it will lead to program exit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, that's what I had thought. Looking again at Copilot's suggestion, I think it makes sense to at least clean up the callback states in case the user catches the exception and continues with the execution. I've added a bit of cleanup code to handle this.

Comment thread src/gt4py/next/instrumentation/gpu_profiler.py
@egparedes egparedes requested a review from edopao April 13, 2026 15:19
Copy link
Copy Markdown
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py
Copy link
Copy Markdown
Contributor

@havogt havogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, a question and a potential problem.

Comment on lines +89 to +92
with mock.patch.object(gpu_profiler, "profile", return_value=fake_profile):
gpu_profiler.start_profiling_calls()
gpu_profiler.start_profiling_calls()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about proper nesting of contexts? Wouldn't leaving the inner context already stop everything? From this test I understand that basically inner contexts should be ignored, right?

Copy link
Copy Markdown
Contributor Author

@egparedes egparedes Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profile_calls() context is just a convenience util for calling start_profiling_calls() and stop_profiling_calls() automatically. This is a simple on/off switch, where the first call to either start_profiling_calls() or stop_profiling_calls() would start/stop the profiler. I don't think it's worth to add anything else more complex, at least for now. If the user needs something more complicated, he can still implement it's own mechanism, calling start_profiling_calls() and stop_profiling_calls() under the hood.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this again, the nesting behavior of the profile_calls() context handler is probably confusing. I guess we should either document it, make it raise at run-time when nested or just delete it and keep only the start_/stop_ calls. What do you think?

Comment on lines +120 to +123
hooks.program_call_context.register(ProgramCallProfiler, index=0)
hooks.compiled_program_call_context.register(CompiledProgramCallProfiler, index=0)
_profile_ctx_manager = profile()
_profile_ctx_manager.__enter__()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your conclusion was that we anyway can't recover from this situation, right? Because we don't capture any failure and it will lead to program exit?

@egparedes egparedes requested review from edopao and havogt April 15, 2026 09:57
@edopao
Copy link
Copy Markdown
Contributor

edopao commented Apr 15, 2026

LGTM

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
Copy link
Copy Markdown
Contributor

@havogt havogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, in case you support the nesting, it's also approved.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

import gt4py.next as gtx
from gt4py.next import common, Dims, gtfn_cpu, typing as gtx_typing
from gt4py.next.instrumentation import hooks
from gt4py.next.instrumentation import gpu_profiler, hooks
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu_profiler is imported but never used in this test module. This will fail unused-import linting if enabled for tests; please remove the import or use it in an assertion.

Suggested change
from gt4py.next.instrumentation import gpu_profiler, hooks
from gt4py.next.instrumentation import hooks

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +23
from gt4py._core import definitions as core_definitions
from gt4py.next.instrumentation import gpu_profiler, hooks


HAS_CUPY = core_definitions.CUPY_DEVICE_TYPE is not None


Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core_definitions/HAS_CUPY are declared but never used in this test module. If this is not intended for future skip logic, please remove them to avoid unused-import/unused-variable lint failures.

Suggested change
from gt4py._core import definitions as core_definitions
from gt4py.next.instrumentation import gpu_profiler, hooks
HAS_CUPY = core_definitions.CUPY_DEVICE_TYPE is not None
from gt4py.next.instrumentation import gpu_profiler, hooks

Copilot uses AI. Check for mistakes.
"""Context manager that enables GPU profiling of GT4Py program calls within its scope."""
if not start_profiling_calls():
warnings.warn(
"GPU profiling of GT4Py program calls is already active."
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message concatenates two string literals without whitespace, producing "...already active.Nested...". Please add an explicit space/newline so the emitted UserWarning is readable (and keep the message text stable for tests).

Suggested change
"GPU profiling of GT4Py program calls is already active."
"GPU profiling of GT4Py program calls is already active. "

Copilot uses AI. Check for mistakes.
Comment thread src/gt4py/next/instrumentation/gpu_profiler.py Outdated
@egparedes egparedes merged commit 75cdbdf into GridTools:main Apr 17, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants