feat(bigframes): Add substrait-datafusion backend by TrevorBergeron · Pull Request #17215 · googleapis/google-cloud-python

TrevorBergeron · 2026-05-20T20:48:01Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

gemini-code-assist

Code Review

This pull request introduces a Substrait compilation and execution framework for BigFrames, featuring a compiler that translates logical plans into Substrait format and an executor that utilizes Apache DataFusion for local execution. The implementation handles various relational operations such as joins, aggregations, and window functions, and includes a testing suite with a dedicated test session. Review feedback focuses on enhancing code quality by addressing a potential NameError, removing debug print and stderr statements, replacing hardcoded magic numbers with existing mappings, and optimizing import placement for better performance.

gemini-code-assist · 2026-05-20T20:59:08Z

+        project_rel2.common.emit.output_mapping.extend(output_mapping)
+        return rel2
+
+    def _compile_bound(self, val: typing.Optional[int], bound_msg: algebra_pb2.Expression.WindowFunction.Bound):


The typing module is not imported, which will cause a NameError when accessing typing.Optional. Since Optional is already imported from typing at the top of the file, use it directly.

def _compile_bound(self, val: Optional[int], bound_msg: algebra_pb2.Expression.WindowFunction.Bound):
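As a rough illustration of why the unimported `typing.` prefix fails, the sketch below defines a small function from source text in a namespace that contains `Optional` but not the `typing` module. The helper name `check_annotation` and the stand-in function `compile_bound` are hypothetical and not part of the PR. On interpreters that evaluate annotations eagerly the error appears when the `def` statement runs; on interpreters that defer annotation evaluation it appears when the annotations are first inspected, so the sketch covers both cases.

```python
from typing import Optional


def check_annotation(annotation: str) -> str:
    """Define a tiny function with the given parameter annotation and report
    whether the annotation can be resolved.

    The namespace deliberately provides `Optional` but not the `typing` module,
    mirroring a file that only does `from typing import Optional`.
    """
    source = f"def compile_bound(val: {annotation}):\n    return val\n"
    namespace = {"Optional": Optional}
    try:
        # Eager annotation evaluation raises here, at definition time.
        exec(source, namespace)
        # Deferred annotation evaluation raises here, on first inspection.
        _ = namespace["compile_bound"].__annotations__
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_annotation("typing.Optional[int]"))
    print(check_annotation("Optional[int]"))
```

Files that use `from __future__ import annotations` store annotations as strings, so the same mistake would only surface when something resolves the type hints.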

gemini-code-assist · 2026-05-20T20:59:08Z

+        pb_rel = self._compile_node(plan)
+
+        pb_plan = plan_pb2.Plan()
+        pb_plan.version.minor_number = 42


The Substrait version minor number is hardcoded to 42. This appears to be a placeholder and should be replaced with a valid version number or a configurable parameter.
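One possible shape for the configurable parameter suggested above is sketched here. `SubstraitVersion` and its `apply` method are hypothetical names, not code from the PR, and a `SimpleNamespace` stands in for the protobuf version message so the example runs without the Substrait bindings. The field names `major_number`, `minor_number`, and `patch_number` follow the Substrait `Version` message; the version values used below are arbitrary placeholders, not a recommendation.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class SubstraitVersion:
    """Hypothetical holder for the Substrait version the compiler emits."""

    major: int
    minor: int
    patch: int = 0

    def apply(self, version_msg) -> None:
        # Copy the configured numbers onto a Version-like message object.
        version_msg.major_number = self.major
        version_msg.minor_number = self.minor
        version_msg.patch_number = self.patch


if __name__ == "__main__":
    # Stand-in for `pb_plan.version`; a real plan would supply the protobuf message.
    version_msg = SimpleNamespace()
    SubstraitVersion(major=1, minor=2, patch=3).apply(version_msg)
    print(version_msg)
```

Keeping the version in one named object means the compiler and any tests read the same value instead of repeating a literal.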

gemini-code-assist · 2026-05-20T20:59:08Z

+        import sys
+        for n in plan.unique_nodes():
+             if not isinstance(n, supported_nodes):
+                  sys.stderr.write(f"UNSUPPORTED NODE TYPE: {type(n).__name__} -> {n}\n")
+                  sys.stderr.flush()


This block uses sys.stderr for debug logging. Replace this with proper logging to aid in debugging. Additionally, the loop is redundant as the same check is performed by the all() call on line 88; remove it to keep the codebase clean and avoid redundancy.

References

Remove duplicate lines of code to keep the codebase clean and avoid redundancy.

Avoid using direct stderr writes for logging; use a logger to aid in debugging and prevent masking issues.
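A minimal sketch of what the logger-based version might look like, assuming the support check only needs `isinstance` against a tuple of supported node classes. The function name `find_unsupported` is hypothetical; in the PR the nodes would come from `plan.unique_nodes()`, while the example below uses plain Python values as stand-ins.

```python
import logging
from typing import Iterable, List, Tuple, Type

logger = logging.getLogger(__name__)


def find_unsupported(nodes: Iterable[object], supported: Tuple[Type, ...]) -> List[object]:
    """Return the nodes that are not instances of any supported type.

    Each unsupported node is reported at DEBUG level, so the message is
    available when needed but silent under default logging configuration.
    """
    unsupported = [n for n in nodes if not isinstance(n, supported)]
    for n in unsupported:
        logger.debug("Unsupported node type: %s -> %r", type(n).__name__, n)
    return unsupported


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in values: ints and strs are "supported", the float is not.
    print(find_unsupported([1, "a", 2.0], (int, str)))
```

Because the helper returns the offending nodes, a single call can drive both the diagnostic message and the supported/unsupported decision, which avoids walking the plan twice.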

gemini-code-assist · 2026-05-20T20:59:08Z

+            "names": fields,
+            "struct": {"types": types}
+        }
+        print("SCHEMA_DICT:", schema_dict)


Remove this debug print statement.

gemini-code-assist · 2026-05-20T20:59:08Z

+        sys.stderr.write(f"JOIN CONDITIONS: {node.conditions}\n")
+        sys.stderr.flush()


Remove these debug stderr writes.

gemini-code-assist · 2026-05-20T20:59:08Z

+             expr = eq_expressions[0]
+             for e in eq_expressions[1:]:
+                  and_expr = algebra_pb2.Expression()
+                  and_expr.scalar_function.function_reference = 13 # and


Avoid using hardcoded magic numbers for function references. Use the _EXTENSIONS mapping instead for consistency and maintainability.

and_expr.scalar_function.function_reference = self._EXTENSIONS["and"]
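To show why a name-to-anchor mapping is easier to maintain than literals such as `13`, here is a small sketch. The contents of the real `_EXTENSIONS` mapping are not shown in the review, so the function names and anchors below are made up for illustration, and `build_extension_anchors` and `function_reference` are hypothetical helpers.

```python
from typing import Dict, Iterable


def build_extension_anchors(names: Iterable[str]) -> Dict[str, int]:
    """Assign each function name a stable integer anchor in declaration order."""
    return {name: anchor for anchor, name in enumerate(names)}


# Illustrative registry only; the PR's actual mapping may differ.
_EXTENSIONS = build_extension_anchors(["add", "and", "equal"])


def function_reference(name: str) -> int:
    """Look up the anchor for a function, failing loudly for unknown names."""
    try:
        return _EXTENSIONS[name]
    except KeyError:
        raise NotImplementedError(f"no Substrait extension registered for {name!r}") from None


if __name__ == "__main__":
    print(function_reference("and"))
```

With a lookup like this, reordering or extending the registry changes the anchors in one place, and a typo in a function name raises an error instead of silently referencing the wrong function.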

gemini-code-assist · 2026-05-20T20:59:09Z

+                  table = table.select([item.source_id for item in node.scan_list.items])
+                  table = table.rename_columns([item.id.sql for item in node.scan_list.items])
+                  if node.offsets_col is not None:
+                       from bigframes.core import pyarrow_utils


This import is inside a loop. For better performance and to follow best practices, move it to the top of the method or the file.

TrevorBergeron added 5 commits May 19, 2026 00:15

feat(bigframes): Add substrait-datafusion engine

c7d79bc

add more ops to substrait compiler

1ebcfcf

more ops, types

daed87e

support more ops to substrait

a26aeba

more work

e34b4ab

gemini-code-assist Bot reviewed May 20, 2026

TrevorBergeron wants to merge 5 commits into main from tbergeron_substrait
