diff --git a/packages/bigframes/notebooks/dataframes/magics_with_local_data.ipynb b/packages/bigframes/notebooks/dataframes/magics_with_local_data.ipynb new file mode 100644 index 000000000000..a008b011f1dc --- /dev/null +++ b/packages/bigframes/notebooks/dataframes/magics_with_local_data.ipynb @@ -0,0 +1,1777 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "c5f9e86e", + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2026 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "id": "71383fa0", + "metadata": {}, + "source": [ + "# Query `*.xlsx` files with SQL for free\\* using Pandas and BigQuery DataFrames\n", + "\n", + "In this tutorial, you'll use SQL to query the [USDA wheat data](https://www.ers.usda.gov/data-products/wheat-data), which is distributed as files in the [Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML file format](https://learn.microsoft.com/en-us/openspecs/office_standards/ms-xlsx/2c5dee00-eff2-4b22-92b6-0738acd4475e). Thanks to open source packages like Jupyter, Pandas, and BigQuery DataFrames (aka BigFrames), and to the [BigQuery sandbox](https://docs.cloud.google.com/bigquery/docs/sandbox), you should be able to follow all of the steps in this guide for free\\* and without a credit card.
\n", + "\n", + "_\\*See the [BigQuery sandbox](https://docs.cloud.google.com/bigquery/docs/sandbox) documentation for limitations._\n", + "\n", + "BigFrames is an open source Python library offered by Google. It scales Python data processing by transpiling common Python data science APIs to BigQuery SQL. You can read more in the [official introduction to BigFrames](https://dataframes.bigquery.dev/user_guide/index.html) and browse the [public git repository for BigFrames](https://github.com/googleapis/google-cloud-python/tree/main/packages/bigframes).\n", + "\n", + "Last year, Google introduced [SQL cells in Colab Enterprise notebooks](https://docs.cloud.google.com/colab/docs/sql-cells), a collaboration across several teams, including the BigQuery DataFrames team. Now, with the [%%bqsql cell magics](https://dataframes.bigquery.dev/notebooks/getting_started/magics.html) available in BigFrames, the same functionality is available to all Jupyter notebook users, whether you're in Colab, Jupyter Lab, or a notebook in VS Code. These magics use BigQuery to query either a table or a local pandas DataFrame.\n", + "\n", + "\n", + "## Getting started\n", + "\n", + "To get started:\n", + "\n", + "1. Enable the [BigQuery sandbox](https://docs.cloud.google.com/bigquery/docs/sandbox). Make note of your Google Cloud project ID.\n", + "\n", + "2. Set up a local Python development environment for Google Cloud (see [Setting up a Python development environment](https://docs.cloud.google.com/python/docs/setup)).\n", + "\n", + "3. Create and activate a venv to isolate Python dependencies. On Linux or macOS, use these commands (replace `python3.12` with your preferred Python version):\n", + "\n", + "```\n", + "python3.12 -m venv ~/venv\n", + ". ~/venv/bin/activate\n", + "```\n", + "\n", + "\n", + "4. 
Install the Jupyter, bigframes, and python-calamine packages:\n", + "\n", + "```\n", + "pip install --upgrade jupyterlab bigframes python-calamine\n", + "```\n", + "\n", + "5. Start Jupyter Lab.\n", + "\n", + "```\n", + "jupyter lab\n", + "```\n", + "\n", + "6. Open a web browser to the URL listed in the output. It looks something like `http://localhost:8888/lab?token=somesupersecretvaluehere`.\n", + "\n", + "7. Create a new notebook using the Jupyter Lab UI.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d00aeb28", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: python-calamine in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (0.6.2)\n", + "Requirement already satisfied: pandas in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (3.0.2)\n", + "Requirement already satisfied: bigframes in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (2.39.0)\n", + "Requirement already satisfied: numpy>=2.3.3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pandas) (2.4.4)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pandas) (2.9.0.post0)\n", + "Requirement already satisfied: cloudpickle>=2.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (3.1.2)\n", + "Requirement already satisfied: fsspec>=2023.3.0 in
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2026.1.0)\n", + "Requirement already satisfied: gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2026.1.0)\n", + "Requirement already satisfied: geopandas>=0.12.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.1.3)\n", + "Requirement already satisfied: google-auth<3.0,>=2.15.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2.49.1)\n", + "Requirement already satisfied: google-cloud-bigquery>=3.36.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery[bqstorage,pandas]>=3.36.0->bigframes) (3.41.0)\n", + "Requirement already satisfied: google-cloud-bigquery-storage<3.0.0,>=2.30.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2.37.0)\n", + "Requirement already satisfied: google-cloud-functions>=1.12.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.23.0)\n", + "Requirement already satisfied: google-cloud-bigquery-connection>=1.12.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.21.0)\n", + "Requirement already satisfied: google-cloud-resource-manager>=1.10.3 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.17.0)\n", + "Requirement already satisfied: google-cloud-storage>=2.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (3.10.1)\n", + "Requirement already satisfied: google-crc32c<2.0.0,>=1.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.8.0)\n", + "Requirement already satisfied: grpc-google-iam-v1>=0.14.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (0.14.4)\n", + "Requirement already satisfied: pandas-gbq>=0.26.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (0.34.1)\n", + "Requirement already satisfied: pyarrow>=15.0.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (23.0.1)\n", + "Requirement already satisfied: pydata-google-auth>=1.8.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.9.1)\n", + "Requirement already satisfied: requests>=2.27.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2.33.1)\n", + "Requirement already satisfied: shapely>=1.8.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2.1.2)\n", + "Requirement already satisfied: tabulate>=0.9 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (0.10.0)\n", + "Requirement already satisfied: humanize>=4.6.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (4.15.0)\n", + "Requirement already satisfied: matplotlib>=3.7.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (3.10.8)\n", + "Requirement already satisfied: db-dtypes>=1.4.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.5.1)\n", + "Requirement already satisfied: pyiceberg>=0.7.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (0.11.1)\n", + "Requirement already satisfied: atpublic<6,>=2.3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (5.1)\n", + "Requirement already satisfied: pytz>=2022.7 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (2026.1.post1)\n", + "Requirement already satisfied: toolz<2,>=0.11 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (1.1.0)\n", + "Requirement already satisfied: typing-extensions<5,>=4.5.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (4.15.0)\n", + "Requirement already satisfied: rich<14,>=12.4.4 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from bigframes) (13.9.4)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-auth<3.0,>=2.15.0->bigframes) (0.4.2)\n", + "Requirement already satisfied: cryptography>=38.0.3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-auth<3.0,>=2.15.0->bigframes) (46.0.6)\n", + "Requirement already satisfied: google-api-core<3.0.0,>=2.11.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-api-core[grpc]<3.0.0,>=2.11.0->google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (2.30.2)\n", + "Requirement already satisfied: grpcio<2.0.0,>=1.33.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (1.80.0)\n", + "Requirement already satisfied: proto-plus<2.0.0,>=1.22.3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (1.27.2)\n", + "Requirement already satisfied: protobuf<8.0.0,>=4.25.8 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (6.33.6)\n", + "Requirement already satisfied: googleapis-common-protos<2.0.0,>=1.63.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from 
google-api-core<3.0.0,>=2.11.0->google-api-core[grpc]<3.0.0,>=2.11.0->google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (1.74.0)\n", + "Requirement already satisfied: grpcio-status<2.0.0,>=1.33.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-api-core[grpc]<3.0.0,>=2.11.0->google-cloud-bigquery-storage<3.0.0,>=2.30.0->bigframes) (1.80.0)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from requests>=2.27.1->bigframes) (3.4.7)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from requests>=2.27.1->bigframes) (3.11)\n", + "Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from requests>=2.27.1->bigframes) (2.6.3)\n", + "Requirement already satisfied: certifi>=2023.5.7 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from requests>=2.27.1->bigframes) (2026.2.25)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from rich<14,>=12.4.4->bigframes) (4.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from rich<14,>=12.4.4->bigframes) (2.20.0)\n", + "Requirement already satisfied: cffi>=2.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from cryptography>=38.0.3->google-auth<3.0,>=2.15.0->bigframes) (2.0.0)\n", + "Requirement already satisfied: pycparser in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from cffi>=2.0.0->cryptography>=38.0.3->google-auth<3.0,>=2.15.0->bigframes) (3.0)\n", + "Requirement already satisfied: packaging>=24.2.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from db-dtypes>=1.4.2->bigframes) (26.0)\n", + "\u001b[33mWARNING: Cache entry deserialization failed, entry ignored\u001b[0m\u001b[33m\n", + "\u001b[0mCollecting pandas\n", + " Using cached pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pandas) (2026.1)\n", + "Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (3.13.5)\n", + "Requirement already satisfied: decorator>4.1.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (5.2.1)\n", + "Requirement already satisfied: google-auth-oauthlib in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (1.3.1)\n", + "Requirement already satisfied: google-cloud-storage-control in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (1.11.0)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (2.6.1)\n", + "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (1.4.0)\n", + "Requirement already satisfied: attrs>=17.3.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (26.1.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (1.8.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (6.7.1)\n", + "Requirement already satisfied: propcache>=0.2.0 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (0.4.1)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (1.23.0)\n", + "Requirement already satisfied: pyogrio>=0.7.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from geopandas>=0.12.2->bigframes) (0.12.1)\n", + "Requirement already satisfied: pyproj>=3.5.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from geopandas>=0.12.2->bigframes) (3.7.2)\n", + "Requirement already satisfied: google-cloud-core<3.0.0,>=2.4.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery>=3.36.0->google-cloud-bigquery[bqstorage,pandas]>=3.36.0->bigframes) (2.5.1)\n", + "Requirement already satisfied: google-resumable-media<3.0.0,>=2.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from google-cloud-bigquery>=3.36.0->google-cloud-bigquery[bqstorage,pandas]>=3.36.0->bigframes) (2.8.2)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from markdown-it-py>=2.2.0->rich<14,>=12.4.4->bigframes) (0.1.2)\n", + "Requirement already satisfied: contourpy>=1.0.1 in 
/usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (1.3.3)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (0.12.1)\n", + "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (4.62.1)\n", + "Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (1.5.0)\n", + "Requirement already satisfied: pillow>=8 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (12.2.0)\n", + "Requirement already satisfied: pyparsing>=3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from matplotlib>=3.7.1->bigframes) (3.3.2)\n", + "Requirement already satisfied: setuptools in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pandas-gbq>=0.26.1->bigframes) (82.0.1)\n", + "Requirement already satisfied: psutil>=5.9.8 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pandas-gbq>=0.26.1->bigframes) (7.2.2)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from 
google-auth-oauthlib->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (2.0.0)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.6.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0,>=2.15.0->bigframes) (0.6.3)\n", + "Requirement already satisfied: mmh3<6.0.0,>=4.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (5.2.1)\n", + "Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (8.3.2)\n", + "Requirement already satisfied: strictyaml<2.0.0,>=1.7.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (1.7.3)\n", + "Requirement already satisfied: pydantic!=2.12.0,!=2.12.1,!=2.4.0,!=2.4.1,<3.0,>=2.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (2.12.5)\n", + "Requirement already satisfied: tenacity<10.0.0,>=8.2.3 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (9.1.4)\n", + "Requirement already satisfied: pyroaring<2.0.0,>=1.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (1.0.4)\n", + "Requirement already satisfied: cachetools<7.0,>=5.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from 
pyiceberg>=0.7.1->bigframes) (6.2.6)\n", + "Requirement already satisfied: zstandard<1.0.0,>=0.13.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pyiceberg>=0.7.1->bigframes) (0.25.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pydantic!=2.12.0,!=2.12.1,!=2.4.0,!=2.4.1,<3.0,>=2.0->pyiceberg>=0.7.1->bigframes) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.41.5 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pydantic!=2.12.0,!=2.12.1,!=2.4.0,!=2.4.1,<3.0,>=2.0->pyiceberg>=0.7.1->bigframes) (2.41.5)\n", + "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from pydantic!=2.12.0,!=2.12.1,!=2.4.0,!=2.4.1,<3.0,>=2.0->pyiceberg>=0.7.1->bigframes) (0.4.2)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/google/home/swast/src/github.com/googleapis/google-cloud-python/packages/bigframes/venv/lib/python3.14/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib->gcsfs!=2025.5.0,!=2026.2.0,!=2026.3.0,>=2023.3.0->bigframes) (3.3.1)\n", + "Using cached pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)\n", + "Installing collected packages: pandas\n", + " Attempting uninstall: pandas\n", + " Found existing installation: pandas 3.0.2\n", + " Uninstalling pandas-3.0.2:\n", + " Successfully uninstalled pandas-3.0.2\n", + "Successfully installed pandas-2.3.3\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install python-calamine pandas bigframes" + ] + }, + { + "cell_type": 
"markdown", + "id": "5ba39d0d", + "metadata": {}, + "source": [ + "## Accessing the data\n", + "\n", + "In this tutorial, you'll analyze the [USDA wheat data](https://www.ers.usda.gov/data-products/wheat-data). Use the requests package to download the data to a temporary file." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "fb1dfdc2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import tempfile\n", + "\n", + "import requests\n", + "\n", + "url = \"https://www.ers.usda.gov/media/5706/wheat-data-all-years.xlsx?v=52690\"\n", + "\n", + "# The temporary file is deleted automatically when it is closed.\n", + "tmp = tempfile.NamedTemporaryFile(delete=True)\n", + "\n", + "# Stream the download in chunks instead of loading the whole response at once.\n", + "with requests.get(url, stream=True) as r:\n", + "    r.raise_for_status()\n", + "    for chunk in r.iter_content(chunk_size=8192):\n", + "        tmp.write(chunk)\n", + "\n", + "# Flush buffered writes and rewind so pandas reads the file from the start.\n", + "tmp.flush()\n", + "tmp.seek(0)" + ] + }, + { + "cell_type": "markdown", + "id": "50f896bb", + "metadata": {}, + "source": [ + "Because you'll query this data with SQL, pass `dtype_backend=\"pyarrow\"` for more consistent handling of NULL values. The `calamine` engine, provided by the python-calamine package installed earlier, reads the `.xlsx` file. The Table05 sheet provides annual data:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8a8a137b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " Marketing year 1/ Time period \\\n", + "0 1950/51 MY Jun-May \n", + "1 1951/52 MY Jun-May \n", + "2 1952/53 MY Jun-May \n", + "3 1953/54 MY Jun-May \n", + "4 1954/55 MY Jun-May \n", + ".. ... ... \n", + "281 1/ June–May. Latest data may be preliminary or... \n", + "282 2/ Includes flour and selected other products ... \n", + "283 3/ Totals may not add due to rounding. \n", + "284 Source: USDA, Economic Research Service, based... \n", + "285 Updated: May 12, 2026 \n", + "\n", + " Beginning stocks Production Imports 2/ Total supply 3/ Food use \\\n", + "0 496.0 1019.0 11.0 1526.0 580.0 \n", + "1 492.0 988.0 30.0 1510.0 585.0 \n", + "2 330.0 1306.0 24.0 1660.0 578.0 \n", + "3 672.0 1173.0 6.0 1851.0 556.0 \n", + "4 994.0 984.0 3.0 1981.0 552.0 \n", + ".. ... ... ... ... ... \n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + " Seed use Feed and residual use Total domestic use 3/ Exports 2/ \\\n", + "0 -- 109.0 689.0 345.0 \n", + "1 -- 110.0 695.0 485.0 \n", + "2 -- 78.0 656.0 332.0 \n", + "3 -- 87.0 643.0 214.0 \n", + "4 -- 53.0 605.0 267.0 \n", + ".. ... ... ... ... \n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + " Total disappearance 3/ Ending stocks \n", + "0 1034.0 492.0 \n", + "1 1180.0 330.0 \n", + "2 988.0 672.0 \n", + "3 857.0 994.0 \n", + "4 872.0 1109.0 \n", + ".. ... ... \n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + "[286 rows x 13 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "import pandas as pd\n", + "\n", + "df = pd.read_excel(\n", + " tmp,\n", + " sheet_name=\"Table05\",\n", + " dtype_backend=\"pyarrow\",\n", + " engine=\"calamine\",\n", + " header=1, # Skip the first row.\n", + ")\n", + "tmp.close()\n", + "df" + ] + }, + { + "cell_type": "markdown", + "id": "1a7ec573", + "metadata": {}, + "source": [ + "Rename the columns to be more SQL-friendly. 
BigQuery supports [flexible column names](https://docs.cloud.google.com/bigquery/docs/schemas#flexible-column-names), which allow most Unicode characters, but some special characters, such as `\\` and `/`, aren't allowed." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5674020", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Marketing year 1Time periodBeginning stocksProductionImports 2Total supply 3Food useSeed useFeed and residual useTotal domestic use 3Exports 2Total disappearance 3Ending stocks
01950/51MY Jun-May496.01019.011.01526.0580.0--109.0689.0345.01034.0492.0
11951/52MY Jun-May492.0988.030.01510.0585.0--110.0695.0485.01180.0330.0
21952/53MY Jun-May330.01306.024.01660.0578.0--78.0656.0332.0988.0672.0
31953/54MY Jun-May672.01173.06.01851.0556.0--87.0643.0214.0857.0994.0
41954/55MY Jun-May994.0984.03.01981.0552.0--53.0605.0267.0872.01109.0
..........................................
2811/ June–May. Latest data may be preliminary or...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2822/ Includes flour and selected other products ...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2833/ Totals may not add due to rounding.<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
284Source: USDA, Economic Research Service, based...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
285Updated: May 12, 2026<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
\n", + "

286 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " Marketing year 1 Time period \\\n", + "0 1950/51 MY Jun-May \n", + "1 1951/52 MY Jun-May \n", + "2 1952/53 MY Jun-May \n", + "3 1953/54 MY Jun-May \n", + "4 1954/55 MY Jun-May \n", + ".. ... ... \n", + "281 1/ June–May. Latest data may be preliminary or... \n", + "282 2/ Includes flour and selected other products ... \n", + "283 3/ Totals may not add due to rounding. \n", + "284 Source: USDA, Economic Research Service, based... \n", + "285 Updated: May 12, 2026 \n", + "\n", + " Beginning stocks Production Imports 2 Total supply 3 Food use \\\n", + "0 496.0 1019.0 11.0 1526.0 580.0 \n", + "1 492.0 988.0 30.0 1510.0 585.0 \n", + "2 330.0 1306.0 24.0 1660.0 578.0 \n", + "3 672.0 1173.0 6.0 1851.0 556.0 \n", + "4 994.0 984.0 3.0 1981.0 552.0 \n", + ".. ... ... ... ... ... \n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + " Seed use Feed and residual use Total domestic use 3 Exports 2 \\\n", + "0 -- 109.0 689.0 345.0 \n", + "1 -- 110.0 695.0 485.0 \n", + "2 -- 78.0 656.0 332.0 \n", + "3 -- 87.0 643.0 214.0 \n", + "4 -- 53.0 605.0 267.0 \n", + ".. ... ... ... ... \n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + " Total disappearance 3 Ending stocks \n", + "0 1034.0 492.0 \n", + "1 1180.0 330.0 \n", + "2 988.0 672.0 \n", + "3 857.0 994.0 \n", + "4 872.0 1109.0 \n", + ".. ... ... 
\n", + "281 \n", + "282 \n", + "283 \n", + "284 \n", + "285 \n", + "\n", + "[286 rows x 13 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns = [name.replace(\"/\", \"\") for name in df.columns]\n", + "df" + ] + }, + { + "cell_type": "markdown", + "id": "5b169a83", + "metadata": {}, + "source": [ + "You can use pandas syntax to filter rows. The last few rows of the sheet are footnotes with no value in the \"Beginning stocks\" column, so keep only the rows where that column isn't missing.\n", + "\n", + "\n", + "```\n", + "full_rows = df[~df['Beginning stocks'].isna()]\n", + "full_rows\n", + "```\n", + "\n", + "\n", + "\n", + "# Using BigQuery SQL magics (%%bqsql)\n", + "\n", + "The BigQuery DataFrames (aka BigFrames) library provides a %%bqsql magic, which can query local pandas or BigFrames DataFrames, as well as anything supported by the BigQuery query engine, such as Parquet or Iceberg data, CSV files in Cloud Storage, and BigQuery tables. To enable the magic, load the bigframes extension with the %load_ext magic.\n", + "\n", + "\n", + "```\n", + "%load_ext bigframes\n", + "```\n", + "\n", + "\n", + "To ensure the correct Google Cloud project is billed for query usage, including free tier usage, configure the project ID used by the magics. If it isn't set, the default project is discovered from your environment, such as the project associated with your application default credentials.\n", + "\n", + "\n", + "```\n", + "import bigframes.pandas as bpd\n", + "\n", + "bpd.options.bigquery.project = \"your-project-id\"\n", + "```\n", + "\n", + "\n", + "Now that the extension is loaded, you can use the %%bqsql magic to query the DataFrame created in the previous steps. Reference a DataFrame by wrapping its variable name in curly braces.\n", + "\n", + "\n", + "```\n", + "%%bqsql\n", + "SELECT * FROM {full_rows}\n", + "```\n", + "\n", + "\n", + "You should see the rows of full_rows in the results.\n", + "\n", + "\n", + "# Transforming the data with SQL\n", + "\n", + "The %%bqsql magic takes an optional destination variable name and saves the query results to that variable as a BigFrames DataFrame. 
You can use this to apply operations incrementally, in both SQL and Python.\n", + "\n", + "First, keep only the marketing-year rows, where \"Time period\" starts with \"MY\", and save the results to the \"yearly\" variable.\n", + "\n", + "\n", + "```\n", + "%%bqsql yearly\n", + "SELECT *\n", + "FROM {full_rows}\n", + "WHERE STARTS_WITH(`Time period`, 'MY')\n", + "```\n", + "\n", + "\n", + "The \"Marketing year 1\" column isn't as useful as it could be because it is still a string, such as \"1950/51\". Use SQL to extract the first year of each marketing year and convert it to a timestamp, turning the data into a time series.\n", + "\n", + "\n", + "```\n", + "%%bqsql timeseries\n", + "SELECT\n", + " * EXCEPT (`Marketing year 1`),\n", + " TIMESTAMP(CONCAT(\n", + " REGEXP_EXTRACT(`Marketing year 1`, r'([0-9]+)\\/'),\n", + " '-01-01')) AS `year`\n", + "FROM {yearly}\n", + "```\n", + "\n", + "\n", + "\n", + "# Visualizing the data\n", + "\n", + "BigFrames supports most pandas operations, including several visualization methods. Setting the timestamp column as the DataFrame's index makes the plot easier to read, because the index is used as the x-axis.\n", + "\n", + "\n", + "```\n", + "timeseries.set_index('year').sort_index().plot.line()\n", + "```\n", + "\n", + "\n", + "Alternatively, convert the DataFrame to pandas to use it with other libraries.\n", + "\n", + "\n", + "```\n", + "pddf = timeseries.set_index('year').sort_index().to_pandas()\n", + "pddf\n", + "```\n", + "\n", + "\n", + "\n", + "# Conclusion\n", + "\n", + "By using BigFrames, you can combine the best of both worlds: the expressive power of SQL and the versatile ecosystem of Python. This approach keeps each step readable, and it lets you scale your data processing to massive datasets directly in BigQuery by replacing the pandas DataFrame in these examples with a BigQuery DataFrame.\n", + "\n", + "\n", + "# Next Steps\n", + "\n", + "Another way to use BigQuery features on pandas DataFrames is through the BigQuery pandas extension. 
For example, you can call community BigQuery functions from [BigQuery Utils](https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs#bigquery-udfs), [BigFunctions](https://unytics.io/bigfunctions/bigfunctions/#function-categories), [CARTO Analytics Toolbox for BigQuery](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery), and more by calling DataFrame.bigquery.sql_scalar(...).\n", + "\n", + "\n", + "```\n", + "import bigframes.pandas as bpd # registers the bigquery accessor\n", + "import pandas as pd\n", + "\n", + "data = {\n", + " 'text1': [\n", + " 'apple',\n", + " 'banana',\n", + " 'orange',\n", + " 'grape',\n", + " 'strawberry',\n", + " 'blueberry',\n", + " 'raspberry',\n", + " 'pineapple'\n", + " ],\n", + " 'text2': [\n", + " 'aple',\n", + " 'bandana',\n", + " 'orenge',\n", + " 'grpe',\n", + " 'straaawberry',\n", + " 'bluebery',\n", + " 'rasery',\n", + " 'pinapple'\n", + " ]\n", + "}\n", + "\n", + "df = pd.DataFrame(data)\n", + "\n", + "bpd.options.bigquery.project = \"your-project-id\"\n", + "\n", + "# Compute the edit distance between text1 and text2 in each row.\n", + "df.bigquery.sql_scalar(\"bqutil.fn.cw_editdistance({text1}, {text2})\")\n", + "```\n", + "\n", + "\n", + "The BigQuery sandbox offers scalable analytics, but it doesn't support some features, such as BigQuery ML. Connect a billing account to your project to use features such as the AI.FORECAST function, which forecasts time series data using Google's foundation models.\n", + "\n", + "The BigFrames team would love to hear from you. To reach out, send an email to [bigframes-feedback@google.com](mailto:bigframes-feedback@google.com) or file an issue in the [open source BigFrames repository](https://github.com/googleapis/google-cloud-python/issues). To receive updates about BigFrames, subscribe to the [BigFrames email list](https://docs.google.com/forms/d/10EnDyYdYUW9HvelHYuBRC8L3GdGVl3rX0aroinbRZyc/edit?resourcekey=0-QUsnpzF91gm9hsp04rSA6Q)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "56babb3f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2026-05-07 17:55:50-- https://www.ers.usda.gov/media/5706/wheat-data-all-years.xlsx?v=19753\n", + "Resolving www.ers.usda.gov (www.ers.usda.gov)... 20.141.137.224\n", + "Connecting to www.ers.usda.gov (www.ers.usda.gov)|20.141.137.224|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 814596 (796K) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]\n", + "Saving to: ‘/tmp/wheat-data.xlsx’\n", + "\n", + "/tmp/wheat-data.xls 100%[===================>] 795.50K 2.32MB/s in 0.3s \n", + "\n", + "2026-05-07 17:55:51 (2.32 MB/s) - ‘/tmp/wheat-data.xlsx’ saved [814596/814596]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O /tmp/wheat-data.xlsx 'https://www.ers.usda.gov/media/5706/wheat-data-all-years.xlsx?v=19753'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "469a1b8e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Marketing year 1/Type 2/Beginning stocksProductionImportsTotal supply 3/ 4/Food useSeed useFeed and residual useTotal domestic use 4/ExportsTotal disappearance 4/Ending stocks
01950/51All wheat496.01019.0111526.0580.0--109.0689.0345.01034.0492.0
11951/52All wheat492.0988.0301510.0585.0--110.0695.0485.01180.0330.0
21952/53All wheat330.01306.0241660.0578.0--78.0656.0332.0988.0672.0
31953/54All wheat672.01173.061851.0556.0--87.0643.0214.0857.0994.0
41954/55All wheat994.0984.031981.0552.0--53.0605.0267.0872.01109.0
..........................................
2872/ Hard Red Winter, Hard Red Spring, Soft Red ...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2883/ Includes flour and selected other products ...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2894/ Totals may not add due to rounding.<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
290Source: USDA, Economic Research Service, based...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
291Updated: April 10, 2026<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
\n", + "

292 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " Marketing year 1/ Type 2/ \\\n", + "0 1950/51 All wheat \n", + "1 1951/52 All wheat \n", + "2 1952/53 All wheat \n", + "3 1953/54 All wheat \n", + "4 1954/55 All wheat \n", + ".. ... ... \n", + "287 2/ Hard Red Winter, Hard Red Spring, Soft Red ... \n", + "288 3/ Includes flour and selected other products ... \n", + "289 4/ Totals may not add due to rounding. \n", + "290 Source: USDA, Economic Research Service, based... \n", + "291 Updated: April 10, 2026 \n", + "\n", + " Beginning stocks Production Imports Total supply 3/ 4/ Food use \\\n", + "0 496.0 1019.0 11 1526.0 580.0 \n", + "1 492.0 988.0 30 1510.0 585.0 \n", + "2 330.0 1306.0 24 1660.0 578.0 \n", + "3 672.0 1173.0 6 1851.0 556.0 \n", + "4 994.0 984.0 3 1981.0 552.0 \n", + ".. ... ... ... ... ... \n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + " Seed use Feed and residual use Total domestic use 4/ Exports \\\n", + "0 -- 109.0 689.0 345.0 \n", + "1 -- 110.0 695.0 485.0 \n", + "2 -- 78.0 656.0 332.0 \n", + "3 -- 87.0 643.0 214.0 \n", + "4 -- 53.0 605.0 267.0 \n", + ".. ... ... ... ... \n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + " Total disappearance 4/ Ending stocks \n", + "0 1034.0 492.0 \n", + "1 1180.0 330.0 \n", + "2 988.0 672.0 \n", + "3 857.0 994.0 \n", + "4 872.0 1109.0 \n", + ".. ... ... 
\n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + "[292 rows x 13 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_excel(\n", + " \"/tmp/wheat-data.xlsx\",\n", + " sheet_name=\"Table06\",\n", + " # Skip the first row and use the second row as the header.\n", + " header=1,\n", + "\n", + " # Requires the python-calamine package.\n", + " engine=\"calamine\",\n", + "\n", + " # Recommended so that missing values in string columns are <NA> rather\n", + " # than NaN. NaN can confuse Parquet serialization, which is used to read\n", + " # these data in BigQuery SQL.\n", + " dtype_backend=\"pyarrow\",\n", + ")\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a06b58d9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Marketing year 1Type 2Beginning stocksProductionImportsTotal supply 3 4Food useSeed useFeed and residual useTotal domestic use 4ExportsTotal disappearance 4Ending stocks
rowindex
01950/51All wheat496.01019.0111526.0580.0--109.0689.0345.01034.0492.0
11951/52All wheat492.0988.0301510.0585.0--110.0695.0485.01180.0330.0
21952/53All wheat330.01306.0241660.0578.0--78.0656.0332.0988.0672.0
31953/54All wheat672.01173.061851.0556.0--87.0643.0214.0857.0994.0
41954/55All wheat994.0984.031981.0552.0--53.0605.0267.0872.01109.0
..........................................
2872/ Hard Red Winter, Hard Red Spring, Soft Red ...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2883/ Includes flour and selected other products ...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
2894/ Totals may not add due to rounding.<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
290Source: USDA, Economic Research Service, based...<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
291Updated: April 10, 2026<NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA><NA>
\n", + "

292 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " Marketing year 1 Type 2 \\\n", + "rowindex \n", + "0 1950/51 All wheat \n", + "1 1951/52 All wheat \n", + "2 1952/53 All wheat \n", + "3 1953/54 All wheat \n", + "4 1954/55 All wheat \n", + "... ... ... \n", + "287 2/ Hard Red Winter, Hard Red Spring, Soft Red ... \n", + "288 3/ Includes flour and selected other products ... \n", + "289 4/ Totals may not add due to rounding. \n", + "290 Source: USDA, Economic Research Service, based... \n", + "291 Updated: April 10, 2026 \n", + "\n", + " Beginning stocks Production Imports Total supply 3 4 Food use \\\n", + "rowindex \n", + "0 496.0 1019.0 11 1526.0 580.0 \n", + "1 492.0 988.0 30 1510.0 585.0 \n", + "2 330.0 1306.0 24 1660.0 578.0 \n", + "3 672.0 1173.0 6 1851.0 556.0 \n", + "4 994.0 984.0 3 1981.0 552.0 \n", + "... ... ... ... ... ... \n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + " Seed use Feed and residual use Total domestic use 4 Exports \\\n", + "rowindex \n", + "0 -- 109.0 689.0 345.0 \n", + "1 -- 110.0 695.0 485.0 \n", + "2 -- 78.0 656.0 332.0 \n", + "3 -- 87.0 643.0 214.0 \n", + "4 -- 53.0 605.0 267.0 \n", + "... ... ... ... ... \n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + " Total disappearance 4 Ending stocks \n", + "rowindex \n", + "0 1034.0 492.0 \n", + "1 1180.0 330.0 \n", + "2 988.0 672.0 \n", + "3 857.0 994.0 \n", + "4 872.0 1109.0 \n", + "... ... ... \n", + "287 \n", + "288 \n", + "289 \n", + "290 \n", + "291 \n", + "\n", + "[292 rows x 13 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# To aid in SQL authoring, rename the columns to avoid problematic special\n", + "# characters. 
Note: BigQuery supports some special characters, but not \"/\".\n", + "# https://docs.cloud.google.com/bigquery/docs/schemas#flexible-column-names\n", + "df_renamed = df.rename(\n", + " columns={\n", + " column: column.replace(\"/\", \"\")\n", + " for column in df.columns\n", + " }\n", + ")\n", + "\n", + "# Also, name the index so that SQL queries can reference it as a column.\n", + "df_renamed.index.name = \"rowindex\"\n", + "df_renamed\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "04363cc5", + "metadata": {}, + "outputs": [], + "source": [ + "# Load the BigFrames extension, which provides the %%bqsql cell magic.\n", + "%load_ext bigframes\n", + "\n", + "import bigframes.pandas as bpd\n", + "\n", + "# TODO(developer): Follow the instructions at\n", + "# https://docs.cloud.google.com/bigquery/docs/sandbox and set the project to the\n", + "# ID of the project you created.\n", + "bpd.options.bigquery.project = \"your-project-id\"" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "3b9a46d7", + "metadata": {}, + "outputs": [], + "source": [ + "%%bqsql df_with_year\n", + "SELECT\n", + " REGEXP_EXTRACT(LTRIM(`Marketing year 1`), r'^([0-9]+)/[0-9]') AS `Start year`,\n", + " SAFE_CAST(`Seed use` AS FLOAT64) AS `Seed use`,\n", + " * EXCEPT (`Marketing year 1`, `Seed use`)\n", + "FROM {df_renamed}\n", + "ORDER BY rowindex ASC" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a3c3e22", + "metadata": {}, + "outputs": [], + "source": [ + "%%bqsql use_proportions\n", + "SELECT\n", + " `rowindex`,\n", + " `Start year`,\n", + " `Seed use` / `Total disappearance 4` AS `Seed proportion`,\n", + " `Food use` / `Total disappearance 4` AS `Food proportion`,\n", + " `Feed and residual use` / `Total disappearance 4` AS `Feed proportion`,\n", + " `Exports` / `Total disappearance 4` AS `Exports proportion`\n", + "FROM {df_with_year}\n", + "WHERE TRIM(`Type 2`) = 'All wheat'\n", + "ORDER BY `rowindex` ASC;" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "8a158619", + "metadata": {}, + "outputs": [], + "source": [ + "use_proportions.set_index('Start year')[['Seed proportion', 'Food proportion', 'Feed proportion', 'Exports proportion']].plot.area(stacked=True, ylim=(0, 1.5), colormap='viridis')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ec316f8f", + "metadata": {}, + "outputs": [], + "source": [ + "type(use_proportions)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bf0e0b2", + "metadata": {}, + "outputs": [], + "source": [ + "pandas_result = use_proportions.to_pandas()\n", + "pandas_result" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}