fix: osw export bug and memory view bug#200
Conversation
…ing methods - Introduced `_to_writable_c_array` to create writable, C-contiguous numpy arrays for Cython memoryviews. - Updated `Experiment` class methods to use the new utility function for handling `tg_num_id` and score arrays. - Enhanced tests to verify the functionality of `_to_writable_c_array` with read-only input.
There was a problem hiding this comment.
Pull request overview
This PR improves robustness around Cython-backed ranking operations by ensuring arrays passed into Cython memoryviews are writable and C-contiguous, and it adds additional logging/trace output for OSW (SQLite) export reads to aid debugging.
Changes:
- Added
_to_writable_c_array()helper and applied it to ranking paths that call Cython routines (find_top_ranked,rank). - Added a unit test validating
_to_writable_c_array()behavior for read-only arrays. - Refined OSW reader log messages and added extra trace/debug statements for row/column visibility; regenerated the Cython-generated
_optimized.c.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pyprophet/scoring/data_handling.py |
Adds _to_writable_c_array() and uses it to precondition arrays for Cython ranking calls. |
tests/test_data_handling.py |
Adds coverage for _to_writable_c_array() with read-only inputs. |
pyprophet/io/export/osw.py |
Updates OSW reader log wording and adds row/column trace logs to aid debugging. |
pyprophet/scoring/_optimized.c |
Large regenerated Cython output (header indicates Cython 3.2.4). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _to_writable_c_array(values, dtype): | ||
| """Return a writable, C-contiguous numpy array for Cython memoryviews.""" | ||
| arr = np.asarray(values, dtype=dtype) | ||
| return np.require(arr, dtype=dtype, requirements=["C", "W"]) |
There was a problem hiding this comment.
Consider adding a companion test for non-contiguous inputs (e.g., a sliced array with a stride or a Fortran-ordered array) to confirm _to_writable_c_array() reliably returns a C-contiguous writable array in addition to the read-only case. This function is now a key precondition for Cython memoryview calls, and non-contiguity was one of the reported failure modes.
| /* Generated by Cython 3.2.4 */ | ||
|
|
||
| /* BEGIN: Cython Metadata | ||
| { |
There was a problem hiding this comment.
_optimized.c appears to have been regenerated with a newer Cython (header now says 3.2.4). The repo’s pinned dependencies include cython==3.1.1 (requirements.txt), so regenerating with a different version can cause persistent diffs and confusion about the intended toolchain. Suggest either regenerating this file with the pinned Cython version or updating the pin/tooling so the generated C matches the expected build environment.
This pull request introduces improvements to logging and data handling in both the
pyprophet/io/export/osw.pyandpyprophet/scoring/data_handling.pymodules. The main changes include more descriptive log messages, enhanced debug output for data inspection, and a new utility function to ensure numpy arrays are writable and C-contiguous for compatibility with Cython routines. This utility function is now used in ranking operations to prevent issues with read-only or non-contiguous arrays, and it is covered by new tests.Logging and Debugging Improvements:
_read_sqliteand_read_standard_data. Additional trace logs were added after merging alignment features to help with debugging data processing steps. (pyprophet/io/export/osw.py) [1] [2] [3]Data Handling Enhancements:
_to_writable_c_array, a utility function to ensure numpy arrays passed to Cython routines are writable and C-contiguous, preventing potential issues with read-only or misaligned arrays. (pyprophet/scoring/data_handling.py)rank_by,add_peak_group_rank) to use_to_writable_c_arrayfor preparing input arrays, improving robustness and compatibility with Cython. (pyprophet/scoring/data_handling.py) [1] [2]Testing:
_to_writable_c_arraycorrectly handles read-only input arrays, ensuring the output is writable and C-contiguous. (tests/test_data_handling.py) [1] [2]