Skip to content

fix: cast JSON and nested struct columns to string for anywidget rendering#17189

Merged
shuoweil merged 12 commits into
mainfrom
shuowei-fix-json-struct-anywidget-display
May 20, 2026
Merged

fix: cast JSON and nested struct columns to string for anywidget rendering#17189
shuoweil merged 12 commits into
mainfrom
shuowei-fix-json-struct-anywidget-display

Conversation

@shuoweil
Copy link
Copy Markdown
Contributor

@shuoweil shuoweil commented May 19, 2026

This Pull Request resolves visualization crashes when rendering DataFrames or Series containing raw JSON and nested JSON struct structures. It ensures that these columns are safely pre-serialized into clean, flat JSON string format on the database level prior to visual layout rendering.

Verified at: screen/424ojbuqyBPinTb

Fixes #<514763826> 🦕

@shuoweil shuoweil self-assigned this May 19, 2026
@shuoweil shuoweil requested review from a team as code owners May 19, 2026 21:02
@shuoweil shuoweil requested review from tswast and removed request for a team May 19, 2026 21:02
@shuoweil shuoweil marked this pull request as draft May 19, 2026 21:02
@shuoweil shuoweil requested review from GarrettWu and sycai and removed request for tswast May 19, 2026 21:03
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for displaying JSON and nested JSON columns in interactive widgets by converting them to strings using BigQuery's TO_JSON_STRING function, addressing a limitation in Arrow/Pandas batch processing. The changes include updates to the DataFrame display logic, HTML representation, and unit tests. A review comment suggests optimizing the transformation loop in dataframe.py by consolidating multiple assign calls into a single operation to improve performance and reduce object creation overhead.

Comment thread packages/bigframes/bigframes/dataframe.py Outdated
@shuoweil shuoweil marked this pull request as ready for review May 19, 2026 22:24
@shuoweil shuoweil added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:run Add this label to force Kokoro to re-run the tests. labels May 20, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 20, 2026


def test_json_column_converted_to_string_for_display():
from bigframes.core.blocks import Block
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to import everything at the top of the file, so that we don't have to repeat these lines in each test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all duplicate local imports (Block, DataFrame, JSON_DTYPE, STRING_DTYPE, struct_type, SqlScalarOp, and TableWidget) to the top of test_anywidget.py. Placed them immediately following the pytest.importorskip guards to ensure module import safety.



def test_repr_mimebundle_head():
from unittest.mock import Mock, patch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, import at the top of the file?

@@ -820,8 +820,24 @@ def __repr__(self) -> str:
)

def _get_display_df_and_blob_cols(self) -> tuple[DataFrame, list[str]]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can ditch the second return value and just return the display df? This this a private function so it should be safe to change its return type.

The only concern I have is that, if this function is called in many places, updating the callers might be a chore. However, if it only has around 5 callers or so, we can make the change in this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified _get_display_df_and_blob_cols to _get_display_df which returns only the processed DataFrame. Updated all matching test mocks and calls in both test_anywidget.py and test_html.py to use the new single-return signature.

def _to_display_df(
obj: Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
) -> bigframes.dataframe.DataFrame:
from bigframes.series import Series
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to put this import to the top of the file?

) -> bigframes.dataframe.DataFrame:
from bigframes.series import Series

df = obj.to_frame() if isinstance(obj, Series) else obj
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is another design suggestion for you: instead of performing the type check in the _to_display_df(), define a "_get_display_df_and_blob_cols" method directly on the series.Series class, which basically looks like this:

def _get_display_df_and_blob_cols(self):
  return self.to_frame()._get_display_df_and_blob_cols()

There are two benefits:

  • You don't need to perform type checks here any more
  • You provide better functional parity between the Dataframe class and the Series class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a polymorphic _get_display_df(self) method directly to the Series class, implemented as return self.to_frame()._get_display_df(). Leveraged this signature in html.py to simplify _to_display_df(obj) to a clean, type-check-free one-liner: return obj._get_display_df().

@shuoweil shuoweil requested a review from sycai May 20, 2026 20:52
Comment on lines +227 to +232
def _to_display_df(
obj: Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
) -> bigframes.dataframe.DataFrame:
return obj._get_display_df()


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function now becomes so simple that we can simply remove it. Just use obj._get_display_df() at its call sites.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it is removed.

@shuoweil shuoweil requested a review from sycai May 20, 2026 21:46
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 20, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 20, 2026
@shuoweil shuoweil merged commit 994a22d into main May 20, 2026
35 checks passed
@shuoweil shuoweil deleted the shuowei-fix-json-struct-anywidget-display branch May 20, 2026 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants