FIX: Update hash tables of affected profiler instances when rewriting code objects #351
FIX: Update hash tables of affected profiler instances when rewriting code objects #351Erotemic merged 11 commits intopyutils:mainfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #351 +/- ##
=======================================
Coverage 87.36% 87.36%
=======================================
Files 18 18
Lines 1630 1630
Branches 347 347
=======================================
Hits 1424 1424
Misses 151 151
Partials 55 55 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
line_profiler/_line_profiler.pyx::LineProfiler.get_stats()
Updated/simplified implementation to aggregate profiling data
between code objects with duplicate hashes
line_profiler/_line_profiler.pyx::LineProfiler
_all_instances_by_bcodes
New private class attribute for keeping track of:
- Instances profiling the same bytecode
- Count for how the the bytecode should be padded
dupes_map
Now containing the up-to-date (instead of pre-padding) code objs
add_function()
Refactored implementation to:
- Pad non-Cython bytecodes according to
`._all_instances_by_bcodes` instead of the instance's
`.dupes_map`
- Use `.dupes_map` to identify profiler instances profiling the
same functions and update their hash tables
- Curate `.dupes_map` on said instances to stay up-to-date
line_profiler/_line_profiler.pyx::LineProfiler
add_function()
- Now identifying update targets based on function-object
identity instead of code-object identity (because different
function objects can share the same code object)
- Simplified implementation
_all_instances_by_bcodes
Superseded by other private class attributes:
- `._all_paddings`:
mapping from bytecodes to padding length
- `._all_instances_by_funcs`:
mapping from function identity to `WeakSet`s of instances
tests/test_line_profiler.py
test_multiple_profilers_identical_bytecode()
New test for applying multiple profilers to identical functions
in arbitrary order (one subtest currently failing)
line_profiler/_line_profiler.pyx::LineProfiler.add_function()
Now stripping padded code objects and re-padding when appropriate,
so that no two profiled code objects can end up with the same
bytecode
tests/test_line_profiler.py
test_multiple_profilers_identical_bytecode()
- Updated comments and docstring
- Added check against duplicate bytecodes
tests/test_line_profiler.py
get_prof_stats()
New helper function for formatting and retrieving
`LineProfiler.print_stats()` output
test_*_decorator()
test_profile_generated_code()
test_multiple_profilers_identical_bytecode()
Refactored to use `get_prof_stats()`
test_aggregate_profiling_data_between_code_versions()
New test checking that profiling data gathered before and after
a function's code object is rewritten are correctly aggregated
781c01e to
af098a2
Compare
Erotemic
left a comment
There was a problem hiding this comment.
For add_function we are assuming that the function actually hasn't been added before, so if we see the same bytes it must be a new instance of that function rather than the actual same function added more than once?
line_profiler/_line_profiler.pyx
Outdated
| base_co_code: bytes = co_code | ||
| # Figure out how much padding we need and strip the bytecode | ||
| # TODO: is there a way to do this faster? `.rstrip()` doesn't | ||
| # work (reliably) since `NOP_BYTES` should be 2-byte wide |
There was a problem hiding this comment.
The only alternative I can think of is a regex, but this logic is likely much faster (especially if it gets optimized into C). You could write it as a separate cdef function because it is fairly self-contained behavior. Maybe multibyte_rstrip?
There was a problem hiding this comment.
Looking at the genned .cpp code:
- We grab
NOP_BYTESfrom the global scope, and do aPyObject_Length()to getnop_len. This may be optimized a bit by placinglen(NOP_BYTES)itself in the global namespace I guess; or maybe we can just hard-code to 2 seeing that we doNOP_BYTES: bytes = NOP_VALUE.to_bytes(2, byteorder=byteorder)
- In the loop, we again grab
NOP_BYTESfrom the global scope, then call__Pyx_ByBytesTailmatch()withbase_co_codeand that. Supposing that local variables are relatively cheap, we should maybe at the very least pre-fetchNOP_BYTESto prevent calling__Pyx_GetModuleGlobalName()in the loop. base_co_codeis truncated withPySequence_GetSlice(); this is a bit surprising to me since I kinda expected a specializedPyBytesC function.npad_codeis incremented with__Pyx_PyLong_AddObjC().
Since the code is rather simple and yet we want both base_co_code and npad_code, multibyte_rstrip() (which I supposed should be inline) will have to return a tuple which is to be unpacked in add_function(). Said unpacking may have its own performance implications, so I'm not really sure whether it should be refactored out, or stay inlined-inlined as-is.
But realistically speaking this is probably not a bottleneck either way, and if it was, refactoring it out as you suggested will give us more freedom to optimize it. Will play around with this a bit more and see what works best.
line_profiler/_line_profiler.pyx
Outdated
| npad = 0 | ||
| self._all_paddings[base_co_code] = max(npad, npad_code) + 1 | ||
| try: | ||
| neighbors = self._all_instances_by_funcs[func_id] |
There was a problem hiding this comment.
neighbors implies some sort of geometric proximity to me. I guess you are building a graph of some sort. But this is a mapping from each function id to the set of profiler objects that contain it. Maybe profilers is a more descriptive name? Or maybe call it profilers_to_update, as that looks like what these will be used for?
There was a problem hiding this comment.
I think I actually had profilers_to_update (or was it instances~?) at some point. Will roll back for clarity.
Yes, because |
|
I think this looks good in general. I'll let you make any cleanups, and tell me when you are ready to merge. |
|
Thanks for the review! I've implemented the changes we discussed above, and the code is otherwise unchanged. Should be ready for the merge once CI passes. |
Closes #350. Includes #349, which by extension closes #348. New changes are from 7338799 onwards.
Code changes
line_profiler/_line_profiler.pyx::LineProfiler.add_function().dupes_mapitems for padded bytecodes._all_paddings,._all_instances_by_funcsNew private class attributes for use by
.add_function()Test changes
tests/test_line_profiler.pytest_{{class,static,bound,partial}method,partial,[cached_]property}_decorator(),test_profile_generated_code()Minor refactoring with a utility function for more readable output
test_multiple_profilers_identical_bytecode()New test checking that the order in which profilers are used to decorate functions shouldn't affect the output (up to overhead)
test_aggregate_profiling_data_between_code_versions()New test checking that existing profiling data should persist and be aggregated with new data after another profiler replaced the code object of a profiled function
Performance impact1
Test suite run 5 times with the command
pytest 'not (test_duplicate or test_aggregate or identical_bytecode)', i.e. excluding the tests added by this PR (and #349); impact on performance should be minimal.mainmulti-desync-fix(this PR)Footnotes
(Update 8 Jul) Outdated data from before ENH: more intuitive profiling-target selection #337, FIX: restore Cython compatibility + pivot to
sys.monitoring#352, etc. were merged. Since the changes are largely independent they are probably still indicative of the (non-)impact this has on performance. ↩