feat(bigframes): Defer unnamed @udf deployment until needed #17217
TrevorBergeron wants to merge 2 commits into
Code Review
This pull request implements deferred deployment for unnamed User Defined Functions (UDFs) in BigFrames. Instead of provisioning UDFs immediately during registration, they are now represented as PythonUdf definitions and deployed only when the execution plan is prepared for BigQuery execution. The changes include adding tracking for deployed routines in the function session, a new plan-rewriting step in the caching executor to handle on-demand deployment, and updated data structures for UDF requirements. Feedback suggests parallelizing these deployments using asyncio.gather to improve performance when multiple UDFs are present in a single plan.
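The deferred-deployment flow described in this summary can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in written for illustration (`PythonUdfDef`, `DeployedRoutine`, `FunctionSession`, `prepare_plan`); none of it is taken from the BigFrames code base, and the plan is modeled as a plain list rather than a real expression tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PythonUdfDef:
    """Hypothetical stand-in for an undeployed UDF definition."""
    name: str
    func: Callable


@dataclass(frozen=True)
class DeployedRoutine:
    """Hypothetical stand-in for a reference to a deployed routine."""
    routine_id: str


class FunctionSession:
    """Tracks which definitions have already been deployed (assumed design)."""

    def __init__(self) -> None:
        self._deployed: Dict[PythonUdfDef, DeployedRoutine] = {}
        self.deploy_calls = 0  # counts simulated provisioning calls

    def register(self, func: Callable) -> PythonUdfDef:
        # Registration only records the definition; nothing is provisioned.
        return PythonUdfDef(func.__name__, func)

    def deploy(self, udf: PythonUdfDef) -> DeployedRoutine:
        # Provision at most once per definition, then reuse the result.
        if udf not in self._deployed:
            self.deploy_calls += 1
            self._deployed[udf] = DeployedRoutine(f"routine_{udf.name}")
        return self._deployed[udf]


def prepare_plan(plan: List[object], fs: FunctionSession) -> List[object]:
    """Rewrite the plan, swapping undeployed definitions for deployed routines."""
    return [fs.deploy(s) if isinstance(s, PythonUdfDef) else s for s in plan]
```

In this sketch, registering a function costs nothing up front, and a definition that appears several times in one plan is provisioned only once when the plan is prepared.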
for udf in unique_undeployed_udfs:
    deployed_udf = await asyncio.to_thread(
        session._function_session.deploy_undeployed_udf,
        session,
        udf,
    )
    deployed_mapping[udf] = deployed_udf
UDFs are currently deployed sequentially. Since each deployment involves network calls to BigQuery and resource provisioning, this can significantly delay query execution when multiple UDFs are used in a single plan. Parallelizing these deployments using asyncio.gather would improve performance.
# Deploy UDFs in parallel to improve performance
tasks = [
    asyncio.to_thread(
        session._function_session.deploy_undeployed_udf,
        session,
        udf,
    )
    for udf in unique_undeployed_udfs
]
results = await asyncio.gather(*tasks)
deployed_mapping = dict(zip(unique_undeployed_udfs, results))
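To make the difference between the two approaches concrete, here is a small self-contained sketch that can be run outside BigFrames. The `deploy_udf` function is a hypothetical blocking call that simulates deployment latency with `time.sleep`; it is not the real `deploy_undeployed_udf`. The sketch assumes Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple


def deploy_udf(name: str) -> str:
    """Hypothetical blocking deployment; the sleep stands in for network latency."""
    time.sleep(0.2)
    return f"deployed_{name}"


async def deploy_sequential(udfs: List[str]) -> Dict[str, str]:
    # Each deployment is awaited before the next one starts.
    mapping = {}
    for udf in udfs:
        mapping[udf] = await asyncio.to_thread(deploy_udf, udf)
    return mapping


async def deploy_parallel(udfs: List[str]) -> Dict[str, str]:
    # All deployments are started together; gather returns results in input order.
    results = await asyncio.gather(
        *(asyncio.to_thread(deploy_udf, udf) for udf in udfs)
    )
    return dict(zip(udfs, results))


def timed(
    coro_fn: Callable[[List[str]], Awaitable[Dict[str, str]]], udfs: List[str]
) -> Tuple[Dict[str, str], float]:
    """Run a coroutine function to completion and report the elapsed seconds."""
    start = time.perf_counter()
    mapping = asyncio.run(coro_fn(udfs))
    return mapping, time.perf_counter() - start
```

Because `asyncio.gather` returns results in the same order as its inputs, zipping them back against the input list is safe. Note that with the default `return_exceptions=False`, the first failing deployment raises out of `gather` while the remaining threads keep running, so whether partial deployments need cleanup is worth considering before adopting the change.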