
[DCP - Terraform] Initializes the APIs, Service Account, Cloud Run Service and Spanner Instance and DB. #15

Merged
gmechali merged 7 commits into datacommonsorg:main from gmechali:terraform
Mar 27, 2026
Conversation

@gmechali commented Feb 25, 2026

This group of Terraform scripts controls the setup of a new Data Commons Platform deployment within a GCP project that has nothing set up.

  • Enables the Spanner, Cloud Run and IAM APIs
  • Creates a Service Account, with databaseUser permission
  • Creates a Cloud Run Service, deploying the :latest datacommons-platform image and grants it AllUsers Invoker permissions
  • Creates a Spanner Instance and DB
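
As a rough illustration of the first two bullets, the API enablement and service account could be sketched as follows. This is a minimal sketch, not the configuration from this PR: the resource labels, the `project_id` variable, and the `dcp-cloud-run` account ID are assumptions made for the example. Only the three API names and the `roles/spanner.databaseUser` role come from the PR itself.

```hcl
# Hypothetical variable; the PR's actual variable names may differ.
variable "project_id" {
  description = "GCP project to deploy into"
  type        = string
}

# Enable the three APIs named in the PR description.
resource "google_project_service" "required" {
  for_each = toset([
    "run.googleapis.com",
    "spanner.googleapis.com",
    "iam.googleapis.com",
  ])

  project            = var.project_id
  service            = each.value
  disable_on_destroy = false # leave APIs enabled if the stack is destroyed
}

# Service account for the Cloud Run service (account_id is illustrative).
resource "google_service_account" "dcp" {
  project      = var.project_id
  account_id   = "dcp-cloud-run"
  display_name = "DCP Cloud Run service account"
  depends_on   = [google_project_service.required]
}

# Grant the service account read/write access to Spanner databases.
resource "google_project_iam_member" "spanner_user" {
  project = var.project_id
  role    = "roles/spanner.databaseUser"
  member  = "serviceAccount:${google_service_account.dcp.email}"
}
```

Binding the role at the project level is the simplest form; binding it on a single database instead would narrow the grant.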

Note that after running this in datcom-website-dev, each of the following resources was created successfully:

  • DCP Service Account - 🔗
  • DCP Cloud Run Service - 🔗
  • Spanner Instance 🔗 and DB 🔗

After running the commands from the DCP setup guide (https://github.com/datacommonsorg/datacommons?tab=readme-ov-file#2-define-your-schema) with the URL changed to the new Cloud Run Service, you can inspect the logs to find the requests and inspect the DB to confirm that the schema and nodes were saved successfully.

Note: I have added more optional variables to control the Spanner DB and Cloud Run service setup, but have not yet tested them. Thoughts on including those?

Lastly, this PR adds a setup script that creates the GCS bucket used for remote state management.
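
For context on why the bucket is created by a separate script: Terraform's `gcs` backend expects the bucket to exist before `terraform init` runs, so the bucket cannot be managed by the same configuration that stores its state there. A minimal backend block might look like the sketch below. The bucket name and prefix are placeholders, not values from this PR.

```hcl
terraform {
  backend "gcs" {
    # Placeholder bucket name; it must already exist (for example, created
    # by the setup script) before running `terraform init`.
    bucket = "my-dcp-tfstate"
    # Placeholder path prefix under which state objects are stored.
    prefix = "dcp/state"
  }
}
```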

@gemini-code-assist

Summary of Changes

Hello @gmechali, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive set of Terraform scripts designed to automate the initial setup and deployment of the Data Commons Platform (DCP) within a new Google Cloud Project. The changes streamline the provisioning of core GCP services, including API enablement, service account creation with appropriate permissions, Cloud Run service deployment, and Spanner instance and database setup, ensuring a consistent and repeatable infrastructure foundation for DCP.

Highlights

  • Terraform Infrastructure Setup: Introduced a new set of Terraform scripts to provision and deploy the Data Commons Platform (DCP) on Google Cloud, enabling a streamlined and repeatable infrastructure setup.
  • GCP API Enablement: Automated the enablement of essential Google Cloud APIs, specifically Spanner, Cloud Run, and IAM, as part of the initial deployment process.
  • Service Account and IAM Configuration: Created a dedicated Service Account for the DCP Cloud Run service and granted it the necessary databaseUser permissions for Spanner, along with AllUsers Invoker permissions for the Cloud Run service itself.
  • Cloud Run Service Deployment: Configured and deployed the datacommons-platform Docker image to a new Cloud Run service, including settings for scaling, resources, and environment variables.
  • Spanner Instance and Database Provisioning: Added Terraform resources to conditionally create a Google Cloud Spanner instance and database, with configurable options for processing units and instance/database IDs.
  • Documentation and Configuration: Provided a detailed README.md for the new Terraform setup, outlining prerequisites, authentication, environment configuration, deployment steps, and architecture overview. Also added comprehensive input variables for customization.
Changelog
  • .gitignore
    • Added entries to ignore Terraform state files, .env configuration, and Python virtual environment artifacts.
  • build/Dockerfile
    • Modified the CMD instruction to explicitly include the start command for the datacommons api entrypoint.
  • infra/dcp/README.md
    • Added a new README providing detailed instructions for setting up, deploying, and tearing down the Data Commons Platform infrastructure using Terraform.
  • infra/dcp/cloudrun.tf
    • Added Terraform resource definition for the google_cloud_run_v2_service, configuring the DCP Cloud Run service with image, scaling, resources, and environment variables.
  • infra/dcp/iam.tf
    • Added Terraform resources for creating a google_service_account for the Cloud Run service and granting it roles/spanner.databaseUser and roles/run.invoker permissions.
  • infra/dcp/main.tf
    • Added core Terraform configuration, including required Google provider version and resources to enable run.googleapis.com, spanner.googleapis.com, and iam.googleapis.com APIs.
  • infra/dcp/outputs.tf
    • Added Terraform output variables to expose the Cloud Run service URL, service account email, and Spanner instance/database IDs.
  • infra/dcp/spanner.tf
    • Added Terraform resources for conditionally creating a google_spanner_instance and google_spanner_database based on input variables.
  • infra/dcp/variables.tf
    • Added comprehensive Terraform variable definitions for project ID, region, image URL, service names, Spanner configuration, and Cloud Run settings (CPU, memory, scaling, concurrency, timeout).
Activity
  • No human activity (comments, reviews, etc.) has been recorded yet for this pull request.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces Terraform scripts for setting up the Data Commons Platform on GCP, including enabling APIs, creating a service account, and provisioning Cloud Run and Spanner resources. The primary security concern identified is the explicit granting of public access to the Cloud Run service via the allUsers IAM binding, which bypasses standard IAM-based access controls and requires careful evaluation. Other issues include the need to scope down IAM permissions, make public access and database deletion protection configurable (especially for production environments), correct the .gitignore for reproducible builds, and avoid the use of the :latest Docker image tag.
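
One way to make public access and database deletion protection configurable is sketched below. This is an illustration under assumptions, not the change that was merged: every variable name, default, and resource label here is hypothetical.

```hcl
# All variables below are hypothetical names chosen for this sketch.
variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "service_name" {
  type = string
}

variable "spanner_instance_id" {
  type = string
}

variable "spanner_database_id" {
  type = string
}

variable "allow_public_access" {
  description = "Grant roles/run.invoker to allUsers when true"
  type        = bool
  default     = false # private by default; opt in to public access
}

variable "spanner_deletion_protection" {
  description = "Block terraform destroy from deleting the database"
  type        = bool
  default     = true
}

# Created only when public access is explicitly requested.
resource "google_cloud_run_v2_service_iam_member" "public_invoker" {
  count    = var.allow_public_access ? 1 : 0
  project  = var.project_id
  location = var.region
  name     = var.service_name
  role     = "roles/run.invoker"
  member   = "allUsers"
}

resource "google_spanner_database" "dcp" {
  project             = var.project_id
  instance            = var.spanner_instance_id
  name                = var.spanner_database_id
  deletion_protection = var.spanner_deletion_protection
}
```

With these defaults, a development environment would set `allow_public_access = true` and `spanner_deletion_protection = false` in its tfvars file, while production keeps the safer defaults.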

Comment thread .gitignore Outdated
Comment thread infra/dcp/spanner.tf Outdated
Comment thread infra/dcp/modules/dcp/iam.tf
Comment thread infra/dcp/modules/dcp/iam.tf
Comment thread infra/dcp/variables.tf Outdated
variable "image_url" {
  description = "Docker image URL to deploy"
  type        = string
  default     = "gcr.io/datcom-ci/datacommons-platform:latest"

medium

The default image_url uses the :latest tag. Using mutable tags like :latest is not recommended for deployments as it can lead to unexpected code being deployed if the image is updated. This makes deployments less predictable and rollbacks harder. It is a best practice to use immutable image tags, such as a git commit SHA or a semantic version number (e.g., gcr.io/datcom-ci/datacommons-platform:v1.2.3 or gcr.io/datcom-ci/datacommons-platform:sha-a1b2c3d).
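
A variable validation block is one possible way to enforce this. The sketch below assumes the default is dropped so callers must pass a pinned image. It only rejects an explicit `:latest` suffix and does not catch untagged references, which container registries also resolve to `latest`.

```hcl
variable "image_url" {
  description = "Docker image URL to deploy; pin an immutable tag or digest"
  type        = string

  validation {
    # regex() errors when there is no match, so can() is true only for
    # values ending in ":latest"; negating it rejects exactly those.
    condition     = !can(regex(":latest$", var.image_url))
    error_message = "The image_url value must not use the mutable :latest tag; pin a version tag, commit SHA, or digest."
  }
}
```

For example, `gcr.io/datcom-ci/datacommons-platform:v1.2.3` passes this check, while `gcr.io/datcom-ci/datacommons-platform:latest` fails at plan time.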

Comment thread infra/dcp/variables.tf Outdated
Comment thread infra/dcp/variables.tf Outdated
Comment thread infra/dcp/Makefile Outdated
Comment thread infra/dcp/terraform.tfvars.example Outdated
Comment thread infra/dcp/terraform.tfvars.example Outdated
Comment thread infra/dcp/variables.tf Outdated
Comment thread infra/dcp/variables.tf Outdated
Comment thread infra/dcp/.env.example Outdated
Comment thread infra/dcp/.terraform.lock.hcl Outdated
Comment thread infra/dcp/terraform.tfvars.example Outdated
Comment thread infra/dcp/modules/cdc/locals.tf
@gmechali gmechali requested a review from dwnoble March 10, 2026 23:19
Comment thread infra/dcp/README.md
* **Terraform**: Terraform installed locally (>= 1.0.0).
* **gcloud CLI**: GCP CLI installed and authenticated.

## Setup

@clincoln8 clincoln8 Mar 23, 2026

Do users need to clone the repo? Is there a command or script we can provide so that they can pull the required files with curl instead of having to clone? Curious to hear your general thoughts on this and what's feasible.

Like maybe we have a script that does the pulling of files. So user flow might look like:

  1. curl / pull "download_script"
  2. run "download_script" which downloads the rest of the required files
  3. edit variable file
  4. run "setup_script"

Contributor Author

We had a conversation about where this should live. For now I'll get this checked in here, but it's an ongoing conversation whether we will move it to a Terraform Data Commons repo.

This PR is pretty isolated, so it will be very simple to refactor. I think we will get the conversation settled shortly :)

@dwnoble dwnoble left a comment

Thanks Gabe! I had to make a few changes to the Terraform configuration to get it working on my end, but with these changes my cdc+dcp stacks are up and running.

Comment thread infra/dcp/setup.sh
Comment thread infra/dcp/terraform.tfvars.example Outdated
Comment thread infra/dcp/terraform.tfvars.example Outdated
Comment thread infra/dcp/modules/cdc/main.tf Outdated
Comment thread infra/dcp/modules/cdc/variables.tf Outdated
Comment thread infra/dcp/modules/cdc/main.tf Outdated
Comment thread infra/dcp/variables.tf Outdated
Comment thread infra/dcp/modules/cdc/main.tf
Comment thread infra/dcp/modules/cdc/main.tf Outdated
Comment thread infra/dcp/modules/cdc/main.tf
@gmechali gmechali requested a review from dwnoble March 25, 2026 19:28
Comment thread infra/dcp/modules/cdc/main.tf
@gmechali gmechali requested a review from dwnoble March 26, 2026 17:34
Comment thread infra/dcp/modules/dcp/iam.tf
@gmechali gmechali added this pull request to the merge queue Mar 27, 2026
Merged via the queue into datacommonsorg:main with commit be621cd Mar 27, 2026
3 checks passed
@gmechali gmechali deleted the terraform branch March 27, 2026 13:02
3 participants