feat: add extract_reasoning_content option to LLM columns by eric-tramel · Pull Request #285 · NVIDIA-NeMo/DataDesigner

eric-tramel · 2026-02-02T23:51:45Z

Summary

Adds an opt-in extract_reasoning_content: bool = False field to LLMTextColumnConfig (and all derived LLM configs)
When enabled, creates a {name}__reasoning_content side-effect column containing only the reasoning content from the final assistant response
Extracts and strips the reasoning_content field from the last assistant message in the trace, normalizing whitespace-only values to None
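The extraction rule described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `ChatMessage` is a hypothetical stand-in dataclass for the library's trace message type, and `extract_reasoning_content` is written as a free function rather than the generator method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChatMessage:
    """Hypothetical stand-in for the library's trace message type."""

    role: str
    content: str = ""
    reasoning_content: Optional[str] = None


def extract_reasoning_content(trace: Sequence[ChatMessage]) -> Optional[str]:
    """Return the stripped reasoning of the last assistant message, or None."""
    # Walk the trace backwards so the first assistant message found is the final one.
    for message in reversed(trace):
        if message.role == "assistant":
            reasoning = (message.reasoning_content or "").strip()
            # Missing, empty, and whitespace-only values all normalize to None.
            return reasoning or None
    # No assistant message in the trace at all.
    return None


# A tool-use style trace with two assistant messages; only the last one counts.
trace = [
    ChatMessage(role="user", content="Solve this problem: 2 + 2"),
    ChatMessage(role="assistant", content="", reasoning_content="Need a tool call."),
    ChatMessage(role="tool", content="4"),
    ChatMessage(role="assistant", content="4", reasoning_content="  The tool says 4.  "),
]
print(extract_reasoning_content(trace))  # prints: The tool says 4.
```

Note that in this sketch an earlier assistant message's reasoning is never used as a fallback: if the final assistant message has no reasoning, the result is None.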

Usage

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="response",
        prompt="Solve this problem: {{ problem }}",
        model_alias="reasoning-model",
        extract_reasoning_content=True,  # Creates response__reasoning_content column
    )
)

This is useful for models that expose chain-of-thought reasoning separately from the main response (e.g., models with extended thinking capabilities).

Test plan

Config tests verify side_effect_columns behavior with/without extract_reasoning_content
Generator tests verify reasoning content extraction from various trace types
All existing tests continue to pass

🤖 Generated with Claude Code

greptile-apps · 2026-02-02T23:55:17Z

Greptile Overview

Greptile Summary

This PR adds an opt-in extract_reasoning_content boolean field to all LLM column configurations, enabling separate capture of chain-of-thought reasoning from models that expose it via the reasoning_content field. When enabled, a {column_name}__reasoning_content side-effect column is created containing the stripped reasoning content from the final assistant message in the trace.

Key changes:

Added extract_reasoning_content: bool = False field to LLMTextColumnConfig (inherited by LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig)
Updated side_effect_columns property to conditionally include the reasoning content column based on feature flags (with_trace and extract_reasoning_content)
Implemented _extract_reasoning_content() method in ColumnGeneratorWithModelChatCompletion that extracts reasoning from the last assistant message, strips whitespace, and normalizes empty values to None
Added comprehensive tests covering various scenarios: extraction with content, missing content, tool-use traces with multiple assistant messages, disabled feature, and whitespace-only content
Updated documentation in columns.md and traces.md with usage examples and a comparison table
Demonstrated the feature in the pdf_qa.py recipe
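The conditional side_effect_columns behavior listed above might look something like the sketch below. Several names here are assumptions for illustration: `SketchLLMColumnConfig` is a hypothetical simplified class, the `TraceType` members other than `LAST_MESSAGE` are guessed, and the `{name}__trace` column name is assumed rather than taken from this page.

```python
from dataclasses import dataclass
from enum import Enum


class TraceType(str, Enum):
    """Hypothetical enum; only LAST_MESSAGE is named in this review."""

    NONE = "none"
    LAST_MESSAGE = "last_message"
    ALL_MESSAGES = "all_messages"


@dataclass
class SketchLLMColumnConfig:
    """Simplified stand-in for LLMTextColumnConfig (illustration only)."""

    name: str
    with_trace: TraceType = TraceType.NONE
    extract_reasoning_content: bool = False

    @property
    def side_effect_columns(self) -> list[str]:
        columns: list[str] = []
        # The trace column is only a side effect when tracing is enabled.
        if self.with_trace != TraceType.NONE:
            columns.append(f"{self.name}__trace")  # assumed column name
        # The reasoning column is controlled independently of the trace flag.
        if self.extract_reasoning_content:
            columns.append(f"{self.name}__reasoning_content")
        return columns


config = SketchLLMColumnConfig(name="response", extract_reasoning_content=True)
print(config.side_effect_columns)  # prints: ['response__reasoning_content']
```

Keeping the two flags independent means a user can store only the reasoning, only the trace, both, or neither.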

Issues found:

Minor: Test at line 143 in test_columns.py is missing an assertion to verify side_effect_columns includes the trace column when with_trace=TraceType.LAST_MESSAGE

The implementation is well-designed, correctly handles edge cases (whitespace-only, missing reasoning, multi-assistant traces), and maintains independence between trace and reasoning content features.

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The implementation is clean, well-tested with comprehensive edge case coverage, properly documented, and follows the project's architecture patterns. The only issue is a minor missing test assertion that doesn't affect functionality.
No files require special attention

Important Files Changed

Filename	Overview
packages/data-designer-config/src/data_designer/config/column_configs.py	Added `extract_reasoning_content` boolean field to `LLMTextColumnConfig` and updated `side_effect_columns` property to conditionally include reasoning content column based on this field
packages/data-designer-config/tests/config/test_columns.py	Updated test expectations to correctly handle side effect columns based on feature flags; added comprehensive test for `extract_reasoning_content` feature, but missing assertion for `config_last.side_effect_columns`
packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py	Added `_extract_reasoning_content` method that extracts and strips reasoning content from final assistant message; integrated into `generate` method to populate reasoning column when enabled

Sequence Diagram

sequenceDiagram
    participant User
    participant ConfigBuilder
    participant LLMTextColumnConfig
    participant Generator as LLMTextCellGenerator
    participant Model as ModelFacade
    participant Output as DataFrame

    User->>ConfigBuilder: add_column(LLMTextColumnConfig(..., extract_reasoning_content=True))
    ConfigBuilder->>LLMTextColumnConfig: Create config with extract_reasoning_content=True
    LLMTextColumnConfig->>LLMTextColumnConfig: side_effect_columns property returns ["col__reasoning_content"]
    
    Note over User,Output: Dataset Generation Phase
    
    User->>Generator: generate(data)
    Generator->>Model: generate(prompt, ...)
    Model-->>Generator: (output, trace)
    
    Note over Generator: trace = [<br/>  ChatMessage(role="user", ...),<br/>  ChatMessage(role="assistant", content="...", reasoning_content="..."),<br/>]
    
    Generator->>Generator: _extract_reasoning_content(trace)
    Generator->>Generator: Find last assistant message in reversed trace
    Generator->>Generator: Extract reasoning_content field
    Generator->>Generator: Strip whitespace, normalize empty to None
    
    Generator->>Output: data[col] = output
    Generator->>Output: data[col__reasoning_content] = extracted_reasoning
    Output-->>User: DataFrame with main column and reasoning column

greptile-apps

3 files reviewed, no comments

Add an opt-in way for LLM generation columns to persist only the model's reasoning_content (when the provider exposes it) into a dedicated side-effect column.

When `extract_reasoning_content=True` is set on an LLM column config:
- A new `{name}__reasoning_content` column is created
- Only the reasoning_content from the final assistant message is stored
- Whitespace is trimmed, empty values become None

This is independent of the existing `with_trace` option which stores the full conversation history.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

packages/data-designer-config/src/data_designer/config/column_configs.py

greptile-apps

3 files reviewed, 3 comments

packages/data-designer-config/tests/config/test_columns.py

greptile-apps

3 files reviewed, 1 comment

packages/data-designer-config/tests/config/test_columns.py

eric-tramel requested a review from a team as a code owner February 2, 2026 23:51

eric-tramel marked this pull request as draft February 2, 2026 23:52

eric-tramel self-assigned this Feb 2, 2026

greptile-apps bot reviewed Feb 2, 2026

eric-tramel force-pushed the ewt/extract-reasoning-content branch from 4d55df0 to 18700b7 Compare February 3, 2026 01:23

eric-tramel force-pushed the ewt/extract-reasoning-content branch from 18700b7 to 04130e9 Compare February 3, 2026 01:42

eric-tramel marked this pull request as ready for review February 3, 2026 01:42

eric-tramel added the enhancement New feature or request label Feb 3, 2026

greptile-apps bot reviewed Feb 3, 2026

Put reasoning into the answer for pdf_qa

156ecdb

greptile-apps bot reviewed Feb 3, 2026

eric-tramel requested review from johnnygreco and nabinchha February 3, 2026 01:58

johnnygreco reviewed Feb 3, 2026

packages/data-designer-config/src/data_designer/config/column_configs.py

johnnygreco reviewed Feb 3, 2026

packages/data-designer-config/src/data_designer/config/column_configs.py (outdated)

eric-tramel added 3 commits February 3, 2026 10:10

Make trace column side-effect optional

90f99ab

Merge branch 'main' into ewt/extract-reasoning-content

387fff1

Fix tests to match correct logic

2f0f38e

greptile-apps bot reviewed Feb 3, 2026

packages/data-designer-config/tests/config/test_columns.py (3 comments, outdated)

johnnygreco approved these changes Feb 3, 2026

greptile-apps bot reviewed Feb 3, 2026

packages/data-designer-config/tests/config/test_columns.py

eric-tramel merged commit 532d21a into main Feb 3, 2026
46 checks passed

eric-tramel merged 5 commits into main from ewt/extract-reasoning-content
