feat: add extract_reasoning_content option to LLM columns #285
Merged
eric-tramel merged 5 commits into main on Feb 3, 2026
Greptile Overview
Greptile Summary
This PR adds an opt-in `extract_reasoning_content` option to LLM columns.
The implementation is well-designed, correctly handles edge cases (whitespace-only, missing reasoning, multi-assistant traces), and maintains independence between trace and reasoning content features.
Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant User
participant ConfigBuilder
participant LLMTextColumnConfig
participant Generator as LLMTextCellGenerator
participant Model as ModelFacade
participant Output as DataFrame
User->>ConfigBuilder: add_column(LLMTextColumnConfig(..., extract_reasoning_content=True))
ConfigBuilder->>LLMTextColumnConfig: Create config with extract_reasoning_content=True
LLMTextColumnConfig->>LLMTextColumnConfig: side_effect_columns property returns ["col__reasoning_content"]
Note over User,Output: Dataset Generation Phase
User->>Generator: generate(data)
Generator->>Model: generate(prompt, ...)
Model-->>Generator: (output, trace)
Note over Generator: trace = [<br/> ChatMessage(role="user", ...),<br/> ChatMessage(role="assistant", content="...", reasoning_content="..."),<br/>]
Generator->>Generator: _extract_reasoning_content(trace)
Generator->>Generator: Find last assistant message in reversed trace
Generator->>Generator: Extract reasoning_content field
Generator->>Generator: Strip whitespace, normalize empty to None
Generator->>Output: data[col] = output
Generator->>Output: data[col__reasoning_content] = extracted_reasoning
Output-->>User: DataFrame with main column and reasoning column
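The extraction steps in the diagram can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `ChatMessage` dataclass is a hypothetical stand-in that carries only the fields shown in the diagram, and `extract_reasoning_content` is an assumed name for the helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatMessage:
    """Hypothetical stand-in for the trace message type (assumption)."""

    role: str
    content: str = ""
    reasoning_content: Optional[str] = None


def extract_reasoning_content(trace: list[ChatMessage]) -> Optional[str]:
    """Return the trimmed reasoning of the last assistant message, or None."""
    # Walk the trace backwards so the final assistant message is found first.
    for message in reversed(trace):
        if message.role == "assistant":
            reasoning = message.reasoning_content
            if reasoning is None:
                # The final assistant message carries no reasoning.
                return None
            # Whitespace-only reasoning is normalized to None.
            return reasoning.strip() or None
    # The trace contains no assistant message at all.
    return None


# A multi-assistant trace: only the final assistant message is considered.
trace = [
    ChatMessage(role="user", content="What is 2 + 2?"),
    ChatMessage(role="assistant", content="Let me check.", reasoning_content="first pass"),
    ChatMessage(role="user", content="And the answer?"),
    ChatMessage(role="assistant", content="4", reasoning_content="  2 + 2 = 4  "),
]
result = extract_reasoning_content(trace)
```

Stopping at the last assistant message, rather than falling back to earlier ones, matches the stated behavior that only the final assistant response contributes to the column.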
4d55df0 to 18700b7
Add an opt-in way for LLM generation columns to persist only the model's
reasoning_content (when the provider exposes it) into a dedicated
side-effect column.
When `extract_reasoning_content=True` is set on an LLM column config:
- A new `{name}__reasoning_content` column is created
- Only the reasoning_content from the final assistant message is stored
- Whitespace is trimmed, empty values become None
This is independent of the existing `with_trace` option which stores
the full conversation history.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
18700b7 to 04130e9
johnnygreco reviewed on Feb 3, 2026: packages/data-designer-config/src/data_designer/config/column_configs.py (resolved)
johnnygreco reviewed on Feb 3, 2026: packages/data-designer-config/src/data_designer/config/column_configs.py (outdated, resolved)
johnnygreco approved these changes on Feb 3, 2026
Summary
- Adds an `extract_reasoning_content: bool = False` field to `LLMTextColumnConfig` (and all derived LLM configs)
- When enabled, creates a `{name}__reasoning_content` side-effect column containing only the reasoning content from the final assistant response
- Extracts the `reasoning_content` field from the last assistant message in the trace, normalizing whitespace-only values to `None`
Usage
This is useful for models that expose chain-of-thought reasoning separately from the main response (e.g., models with extended thinking capabilities).
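As a rough illustration of how the option surfaces in a column's side-effect columns, the sketch below uses a hypothetical `ColumnConfigSketch` dataclass rather than the real `LLMTextColumnConfig`. Only the `extract_reasoning_content` flag, its `False` default, and the `{name}__reasoning_content` naming come from this PR; everything else is assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class ColumnConfigSketch:
    """Hypothetical stand-in for an LLM column config (assumption)."""

    name: str
    extract_reasoning_content: bool = False  # opt-in, off by default

    @property
    def side_effect_columns(self) -> list[str]:
        # The reasoning column is declared only when the option is enabled.
        if self.extract_reasoning_content:
            return [f"{self.name}__reasoning_content"]
        return []


default_config = ColumnConfigSketch(name="answer")
reasoning_config = ColumnConfigSketch(name="answer", extract_reasoning_content=True)
```

With the flag left at its default, no extra column is declared, so existing configurations are unaffected.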
Test plan
- `side_effect_columns` behavior with/without `extract_reasoning_content`

🤖 Generated with Claude Code