-
Notifications
You must be signed in to change notification settings - Fork 57
feat: add allow_resize for 1:N and N:1 generation patterns #286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3c9fa49 to
8ba264c
Compare
bd474e3 to
0cedd56
Compare
Adds support for generators that produce a different number of records than the input (expansion or retraction). This addresses GitHub issue #265. Changes: - Add `allow_resize` parameter to `update_records()` in DatasetBatchManager - Add `allow_resize` field to CustomColumnConfig - Add validation requiring FULL_COLUMN strategy when allow_resize=True - Track and report actual_num_records in metadata (may differ from target) - Add logging when batch size changes - Add example_allow_resize.py demonstrating the feature - Add comprehensive tests
0cedd56 to
8c9a33e
Compare
- Fix example_allow_resize.py: remove non-existent CustomColumnContext, simplify to 1-arg signatures - Add allow_resize logging in CustomColumnGenerator.log_pre_generation
| default=None, | ||
| description="Optional typed configuration object passed as second argument to generator function", | ||
| ) | ||
| allow_resize: bool = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should elevate this to a property on the base column config (default is False), which you can override in custom columns and plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was the initial solution, then I ended up doing a mixin instead.
Thought it was a bit opaque for plugins specifically, that they developer to find out about a specific attribute/property 🤔 But it makes things simpler I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pattern we already use for custom emojis. Also the required_columns and side_effect_columns (these ones have to be set, though).
Summary
Adds
allow_resizeparameter toCustomColumnConfigenabling custom column generators to produce a different number of records than the input. This supports 1:N expansion (e.g., generating multiple variations per input) and N:1 retraction (e.g., filtering or aggregating records) patterns. Addresses #265.Changes
Added
allow_resizefield onCustomColumnConfigwith validation requiringfull_columnstrategyallow_resizeparameter toupdate_records()inDatasetBatchManageractual_num_recordstracking in dataset metadata (may differ fromtarget_num_recordswhen resizing)example_allow_resize.pydemonstrating expansion (1:N) and retraction (N:1) patternsChanged
column_wise_builder.py- logs resize operations, passesallow_resizeto batch managerCustomColumnGenerator.log_pre_generation()- logsallow_resizewhen enabledAttention Areas
dataset_batch_manager.py- Core change to buffer handling with newallow_resizeparametercolumn_configs.py- New field and validation logic onCustomColumnConfigcolumn_wise_builder.py- Engine-level handling for resize operationsDescription updated with AI