Machine Learning Operations Playbook Adoption Workshop – Phase 1: Network Configuration and VPC Setup - Hands-On Workshop
This week focuses on network configuration and VPC setup for ML platforms. AWS labs provide high-level exploration, while Google Cloud labs deliver deep technical implementation.
- AWS hands-on labs are demonstrated by the presenter. Attendees do not require AWS access.
- Google Cloud Console access with Network Admin role
- Cloud Workstations
- Vertex AI API enabled
- Billing account configured
Objective: Understand how Amazon SageMaker integrates with Amazon VPCs to isolate ML workloads and enable secure communication with AWS services.
- AWS Console access with SageMaker and VPC permissions
- No resource creation required (exploration-only)
- Familiarity with basic VPC concepts: subnets, route tables, security groups
- SageMaker domains can be configured to run in VPC-only mode for full network isolation
- You must specify subnets, security groups, and optionally VPC endpoints for services like S3 and CloudWatch
- VPC-only mode disables public internet access and routes all traffic through your defined VPC
- Go to Amazon SageMaker Console
- Click Domains → Explore domain settings
- Under Network
- Observe required fields:
- VPC ID
- Subnet IDs (must be private)
- Security Group IDs
- Open Amazon VPC Console
- Review existing VPCs and subnets
- Notes on SageMaker domain network configuration
- List of required VPC components for VPC-only mode
Objective: Explore how security groups and network ACLs control traffic to ML workloads in Amazon VPCs.
- AWS Console access with EC2 and VPC permissions
- No instance launch required
- Security Groups: Stateful, instance-level firewalls
- Network ACLs: Stateless, subnet-level firewalls
- Security groups allow inbound/outbound rules per protocol and port
- Network ACLs allow and deny rules based on CIDR, protocol, and port
- Go to EC2 Console
- Click Security Groups → Select any group
- Note rules for ports like 22 (SSH), 443 (HTTPS), 8888 (Jupyter)
- Observe CIDR ranges and protocol types
- Go to VPC Console
- Click Network ACLs → Select any ACL
- Note allow/deny rules
- Observe stateless behavior (return traffic must be explicitly allowed)
- Table comparing security group vs network ACL rules
- Notes on rule behavior and ML workload implications
Objective: Explore how VPC interface endpoints enable private connectivity to SageMaker APIs and runtimes using AWS PrivateLink.
- AWS Console access with VPC and SageMaker permissions
- No endpoint creation required
- VPC endpoints allow private access to AWS services without internet exposure
- SageMaker supports interface endpoints via AWS PrivateLink
- Private DNS can be enabled to resolve SageMaker endpoints internally
- Go to VPC Console
- Click Endpoints → Create Endpoint
- Search for services:
  - `com.amazonaws.region.sagemaker.api`
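The `region` segment in the service name above is a placeholder for an AWS Region code. As a minimal sketch, the full name could be assembled like this; the function name `sagemaker_endpoint_service` is hypothetical, and the `runtime` suffix assumes that the runtime endpoint mentioned in the objective follows the same naming pattern as the API endpoint.

```shell
# Hypothetical helper: build a SageMaker interface endpoint service name.
# $1 = AWS Region code (for example us-east-1)
# $2 = endpoint type (api, or runtime under the assumption stated above)
sagemaker_endpoint_service() {
  printf 'com.amazonaws.%s.sagemaker.%s\n' "$1" "$2"
}

sagemaker_endpoint_service us-east-1 api
sagemaker_endpoint_service us-east-1 runtime
```

The resulting string is what you would type into the service search box on the Create Endpoint page.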
Objective: Design and configure a secure VPC environment that enables Vertex AI pipeline components to access Google Cloud Storage and BigQuery using Private Google Access—without relying on external IPs or public internet routing.
- Google Cloud Console access with Compute Network Admin and Vertex AI Admin roles
- Vertex AI API enabled
- Cloud Workstations
- Existing Vertex AI pipeline deployed
- Private Google Access allows managed resources in private subnets to access Google APIs (e.g., GCS, BigQuery) without external IPs
- Vertex AI pipeline components often run on managed infrastructure that uses internal IPs only
- Enabling Private Google Access ensures secure, compliant access to datasets, models, and metadata stored in GCS and BigQuery
- This configuration is critical for components like `preprocess_data_op`, `train_model_op`, `evaluate_model_op`, `model_approved_op`, and `register_model_op`
- The only component that does not require GCS access is `model_rejected_op`
These labs teach you how to configure Private Google Access so your Vertex AI pipeline components can securely access Cloud Storage and BigQuery without using the public internet.
| Component | What It Does | Needs GCS/BigQuery? |
|---|---|---|
| `preprocess_data_op` | Downloads raw data | ✅ Yes - Direct GCS access |
| `train_model_op` | Trains the model | ✅ Yes - Reads/writes artifacts |
| `evaluate_model_op` | Evaluates model performance | ✅ Yes - Reads data, writes metrics |
| `model_approved_op` | Logs approved models | ✅ Yes - Reads model metadata |
| `register_model_op` | Registers model version | ✅ Yes - Reads artifact paths |
| `model_rejected_op` | Logs rejection | ❌ No - Only logging |
Duration: 45 minutes
Objective: Build a private VPC that lets your existing Vertex AI pipeline components reach Cloud Storage and BigQuery over Google's internal network by enabling Private Google Access.
- Project Owner or Compute Network Admin & Vertex AI Admin
- Vertex AI, Cloud Storage & BigQuery APIs enabled
- Cloud Shell or local `gcloud` CLI authenticated
- An existing Vertex AI pipeline deployed (or pipeline spec in hand)
- Pipeline service account granted `roles/storage.objectViewer/Admin` and BigQuery Data Viewer/Editor
Private Google Access allows managed resources and managed infrastructure in a private subnet (no external IP) to call Google APIs (e.g., storage.googleapis.com, bigquery.googleapis.com) on Google's private backbone.
Vertex AI pipeline steps run on managed VMs or GKE pods without external IPs. Enabling Private Google Access ensures your pipeline can:
- Download raw data (`preprocess_data_op`)
- Read/write model artifacts (`train_model_op`, `evaluate_model_op`, `model_approved_op`, `register_model_op`)
- Query BigQuery datasets for features or logging
The only step that does not need GCS/BigQuery is model_rejected_op (logging only).
- Navigate to the Google Cloud Console at https://console.cloud.google.com and sign in using your sysco mlops user.
- Use the Project Picker to select your project.
- Enter Workstations in the search bar.
- Select Create workstation.
- Enter a unique display name.
- In the configuration drop-down, select test-configuration.
- Select Create. Note: Creation may take several minutes to complete.
- Select Start, located in the All workstations section, below the Quick actions column. Note: Startup may take several minutes to complete.
- Select Launch. In the new workstation, select the menu icon to access options, then select Terminal.
- Review the terminal area.
- Run: `gcloud auth login`
- Select the clickable link, then select Open. A new browser session starts. Follow the prompts in the new session to log in and get a verification code.
- Select Continue.
- Follow the prompts and provide your username and password if required.
- Select Copy. Note: The credential is a verification code.
- Paste the verification code into the terminal.
- Run: `gcloud config set project mfav2-374520`
Important: The presenter will create the VPC. Participants will create subnets and enable Private Google Access.

gcloud compute networks create vertex-ai-vpc \
--subnet-mode=custom

What this does:
- Creates a new VPC network named `vertex-ai-vpc`
- Uses `custom` subnet mode for full control over IP ranges
- No subnets are created automatically
For Training Participants: Each participant will create their own unique subnet. Use your assigned participant number (1-40) in the commands below.
# Replace XX with your zero-padded participant number (01-40) in the subnet name
# Replace X with the same number without the leading zero in the IP range
# For example: participant 1 uses "01" and 10.10.1.0/24, participant 15 uses "15" and 10.10.15.0/24
gcloud compute networks subnets create vertex-ai-subnet-participant-XX \
--network=vertex-ai-vpc \
--region=us-east1 \
--range=10.10.X.0/24 \
--enable-private-ip-google-access

Participant IP Range Assignments:
| Participant | Subnet Name | IP Range | Usable IPs |
|---|---|---|---|
| 01 | vertex-ai-subnet-participant-01 | 10.10.1.0/24 | 252 |
| 02 | vertex-ai-subnet-participant-02 | 10.10.2.0/24 | 252 |
| 03 | vertex-ai-subnet-participant-03 | 10.10.3.0/24 | 252 |
| ... | ... | ... | ... |
| 15 | vertex-ai-subnet-participant-15 | 10.10.15.0/24 | 252 |
| ... | ... | ... | ... |
| 40 | vertex-ai-subnet-participant-40 | 10.10.40.0/24 | 252 |
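The mapping in the table can also be computed rather than looked up. The following is a minimal sketch, not part of the lab materials: `participant_subnet` is a hypothetical helper that only prints the subnet name and IP range for a participant number, and it does not call `gcloud`.

```shell
# Hypothetical helper: print the subnet name and CIDR range for a
# participant number, following the assignment table.
participant_subnet() (
  case "$1" in
    ''|*[!0-9]*) echo "usage: participant_subnet <1-40>" >&2; exit 1 ;;
  esac
  n=$(expr "$1" + 0)   # expr reads "07" as decimal 7, so padded input is fine
  if [ "$n" -lt 1 ] || [ "$n" -gt 40 ]; then
    echo "participant number must be between 1 and 40" >&2
    exit 1
  fi
  # The name is zero-padded (07); the CIDR octet is not (10.10.7.0/24)
  printf 'vertex-ai-subnet-participant-%02d 10.10.%d.0/24\n' "$n" "$n"
)

participant_subnet 7    # vertex-ai-subnet-participant-07 10.10.7.0/24
participant_subnet 15   # vertex-ai-subnet-participant-15 10.10.15.0/24
```

The function body runs in a subshell, so its variables do not leak into your terminal session and an invalid number only fails the call itself.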
Example for Participant 7:
gcloud compute networks subnets create vertex-ai-subnet-participant-07 \
--network=vertex-ai-vpc \
--region=us-east1 \
--range=10.10.7.0/24 \
--enable-private-ip-google-access

What this does:
- Creates a unique subnet for each participant in the shared VPC
- Each participant gets their own /24 subnet (256 addresses, 252 usable after the 4 that Google Cloud reserves in every subnet)
- Non-overlapping IP ranges prevent conflicts
- Enables Private Google Access - the key setting for this lab!
Important Notes:
- The VPC (`vertex-ai-vpc`) is shared by all participants
- Each participant creates and manages their own subnet
- IP ranges are pre-assigned to avoid conflicts
- All subnets have Private Google Access enabled
gcloud compute networks subnets describe vertex-ai-subnet-participant-XX \
--region=us-east1 \
--format="get(privateIpGoogleAccess)"

Expected output: True
This confirms that VMs in this subnet can reach Google APIs without external IPs.
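If you want the check to fail loudly in a script instead of reading the output by eye, the command can be wrapped. This is a sketch under stated assumptions: `require_true` is a hypothetical wrapper, and the `echo` calls below stand in for the real `gcloud` command so the example runs without cloud access.

```shell
# Hypothetical wrapper: run the given command and succeed only if it prints "True".
require_true() (
  value=$("$@")
  if [ "$value" = "True" ]; then
    echo "Private Google Access is enabled"
  else
    echo "Private Google Access is NOT enabled (got: ${value:-nothing})" >&2
    exit 1
  fi
)

# Intended usage (requires gcloud, so it is shown as a comment here):
# require_true gcloud compute networks subnets describe vertex-ai-subnet-participant-XX \
#   --region=us-east1 --format="get(privateIpGoogleAccess)"

# Stand-in commands that mimic the two possible outputs:
require_true echo True
require_true echo False || echo "check failed as expected"
```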
Important: The presenter will demonstrate how the ML pipeline orchestration uses your private subnetwork with Private Google Access.
- Targeted Training Repo for DS/ML/MLOPS: ~/MLOPS-Engineering/Feature-Branch/.github/workflows/vertex-ai-cicd.yml
| Component | GCS Access | BigQuery Access | Notes |
|---|---|---|---|
| `preprocess_data_op` | Direct | Optional | Uses google-cloud-storage client |
| `train_model_op` | Indirect | Optional | Kubeflow reads/writes model artifacts |
| `evaluate_model_op` | Indirect | Optional | Reads dataset/model, writes metrics |
| `model_approved_op` | Indirect | Optional | Reads model URI for logging |
| `register_model_op` | Indirect | Optional | Reads artifact URI for Model Registry |
| `model_rejected_op` | None | None | Logging only; no GCS/BigQuery interaction |
- ✅ Screenshot showing `privateIpGoogleAccess=true`
- Private Google Access FAQ: https://cloud.google.com/vpc/docs/private-google-access#faq
Objective: Lock down egress on your Vertex AI pipeline’s VPC so it can still reach Google APIs (Cloud Storage, BigQuery) and the metadata server, while blocking all other outbound traffic.
- Compute Security Admin role granted
- Completion of Lab 4.4 (custom subnet with Private Google Access)
- Cloud Shell or local gcloud CLI authenticated
- Google Cloud firewalls are stateful and enforced at the VPC level.
- By default, all egress is allowed, which increases risk.
- For least-privilege, restrict egress to only:
  - Google APIs (*.googleapis.com over TCP/443)
  - Metadata server (169.254.169.254 for token exchange)
  - DNS (udp/tcp:53) for name resolution
3.1 (Optional) Tag Your Pipeline Infrastructure
- Governance, which includes tagging, is covered later in the 32-week course.
Note: In the next steps, replace "XX" with your two-digit participant number and "X" with the same number without a leading zero. For example, participant 7 uses participant-07 and --source-ranges=10.10.7.0/24.
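The placeholder substitution described in the note above can be scripted to avoid typos. The following is a sketch only: `fill_participant` is a hypothetical helper that reads a command template on standard input and prints the filled-in command. It does not run anything, so review the output before pasting it into your terminal.

```shell
# Hypothetical helper: substitute the XX and X placeholders for participant $1.
fill_participant() (
  n=$(expr "$1" + 0)              # unpadded number, for example 7
  padded=$(printf '%02d' "$n")    # zero-padded number, for example 07
  # Replace XX first, then the single X that sits between dots in an IP range
  sed -e "s/XX/$padded/g" -e "s/\.X\./.$n./g"
)

fill_participant 7 <<'EOF'
gcloud compute firewall-rules create deny-egress-all-participant-XX \
  --source-ranges=10.10.X.0/24
EOF
```

The second `sed` expression only matches an `X` surrounded by dots, so uppercase letters elsewhere in a command are left alone.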
3.2 Allow Egress to Google APIs
gcloud compute firewall-rules create allow-egress-google-apis-participant-XX \
--network=vertex-ai-vpc \
--direction=EGRESS \
--action=ALLOW \
--rules=tcp:443 \
--destination-ranges=199.36.153.4/30,199.36.153.8/30 \
--source-ranges=10.10.X.0/24 \
--priority=1000 \
--description="Allow egress to Google APIs from participant XX"

3.3 Allow Egress to Metadata Server
gcloud compute firewall-rules create allow-egress-metadata-XX \
--network=vertex-ai-vpc \
--direction=EGRESS \
--action=ALLOW \
--rules=tcp:80 \
--destination-ranges=169.254.169.254/32 \
--source-ranges=10.10.X.0/24 \
--priority=900 \
--description="Allow egress to GCE metadata server from participant XX"

3.4 Allow Egress for DNS Resolution
gcloud compute firewall-rules create allow-egress-dns-XX \
--network=vertex-ai-vpc \
--direction=EGRESS \
--action=ALLOW \
--rules=udp:53,tcp:53 \
--destination-ranges=0.0.0.0/0 \
--source-ranges=10.10.X.0/24 \
--priority=800 \
--description="Allow DNS resolution from participant XX"

3.5 Deny All Other Egress
gcloud compute firewall-rules create deny-egress-all-participant-XX \
--network=vertex-ai-vpc \
--direction=EGRESS \
--action=DENY \
--rules=all \
--destination-ranges=0.0.0.0/0 \
--source-ranges=10.10.X.0/24 \
--priority=65534 \
--description="Deny all other outbound traffic from participant XX"

⚠️ Warning: This denies everything else. Ensure your pipeline only needs the above services. This also blocks `pip install` from the public Python Package Index.
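Google Cloud evaluates firewall rules from the lowest priority number to the highest and applies the first rule that matches, which is why the deny rule at 65534 only catches traffic that none of the allow rules matched. A small sketch of that ordering follows. The `rules` variable is a hand-written copy of the priorities from steps 3.2 to 3.5 with the participant suffix omitted; nothing is read from the project.

```shell
# Priority, action, and rule name, copied by hand from steps 3.2-3.5
rules='1000 ALLOW allow-egress-google-apis
900 ALLOW allow-egress-metadata
800 ALLOW allow-egress-dns
65534 DENY deny-egress-all'

# Sort numerically so the rule evaluated first is listed first
evaluation_order() {
  printf '%s\n' "$rules" | sort -n |
    awk '{ print NR ". " $3 " (priority " $1 ", " $2 ")" }'
}

evaluation_order
```

The DNS rule is listed first and the deny rule last, matching the order in which the rules are considered.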
List of created firewall rules with priorities and descriptions
Firewall Rules Overview: https://cloud.google.com/vpc/docs/firewalls
Securing ML Workloads: https://cloud.google.com/architecture/ml-secure-networking
- Labs 4.4 & 4.5 completed
- Pipeline configured to use vertex-ai-vpc
What Works with Private Google Access Alone
- ✅ Cloud Storage (storage.googleapis.com)
- ✅ BigQuery (bigquery.googleapis.com)
- ✅ Vertex AI APIs (aiplatform.googleapis.com)
- ✅ Most Google Cloud APIs
When You Need Service Networking (VPC Peering)
- ❌ Cloud SQL (managed database instances)
- ❌ Memorystore (managed Redis/Memcached)
- ❌ Other managed services with dedicated instances
Private Google Access and Private Services Access both keep traffic off the public internet, but they operate at different layers and serve different use cases.
- Enabled on a subnet with `--enable-private-ip-google-access`
- Allows ML pipeline components and managed services in a private subnet to call Google APIs (Cloud Storage, BigQuery, Vertex AI) over Google’s internal network
- No peering or IP allocation required
- Covers “public” Google services endpoints like `storage.googleapis.com` and `bigquery.googleapis.com`
- Creates a dedicated VPC peering connection to a Google-managed service network (service producer)
- Requires you to:
  - Reserve an internal IP range for the service producer
  - Use the Service Networking API to establish peering
- Enables private-IP connectivity to managed services with private-instance backends (Cloud SQL, Memorystore, GKE private control planes)
- Traffic remains on Google’s backbone but uses internal RFC-1918 addresses
- Private Services Access leverages the same VPC peering mechanism, but:
  - The “producer” VPC is owned and managed by Google for a specific service
  - You don’t manually create peering routes; Service Networking automates route and tenancy-unit setup
  - Peering is one-way from your VPC to the service producer’s VPC, but appears in your network as a standard peering connection
| Capability | Use Case |
|---|---|
| Private Google Access | Access Cloud Storage, BigQuery, Vertex AI APIs without external IPs |
| Private Services Access | Access private-IP services (Cloud SQL, Memorystore, GKE private) |
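As a quick self-check, the table can be expressed as a lookup. This is an illustrative sketch only: `access_method` and its short service keys are hypothetical names chosen here, and the mapping covers just the services named in this lab.

```shell
# Hypothetical lookup that mirrors the table above.
access_method() {
  case "$1" in
    storage|bigquery|vertex-ai) echo "Private Google Access" ;;
    cloud-sql|memorystore)      echo "Private Services Access" ;;
    *)                          echo "unknown" ;;
  esac
}

access_method bigquery      # Private Google Access
access_method memorystore   # Private Services Access
```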
- Service Networking API enabled:
gcloud services enable servicenetworking.googleapis.com

| Participant | Range Name | IP Block |
|---|---|---|
| 01 | memorystore-psa-range-01 | 10.20.1.0/24 |
| 02 | memorystore-psa-range-02 | 10.20.2.0/24 |
| ... | ... | ... |
| 40 | memorystore-psa-range-40 | 10.20.40.0/24 |
- Reserve a /24 range in your VPC for Memorystore’s private endpoints:
gcloud compute addresses create memorystore-psa-range-XX \
--global \
--purpose=VPC_PEERING \
--addresses=10.20.X.0 \
--prefix-length=24 \
--network=vertex-ai-vpc \
--description="Participant XX IP range for Private Services Access to Memorystore"

- With --prefix-length=24 alone, Google Cloud automatically allocates a free /24 block from the default internal ranges
- To explicitly assign a known IP block like 10.20.7.0/24, each participant must also use the --addresses flag, as in the command above
- --prefix-length=24 defines the size of the block (256 IPs)
- Explicit assignment ensures predictable, non-overlapping IP allocation across participants
- Creates a global internal IP range for VPC peering with Google’s service producer network.
- Scopes it to vertex-ai-vpc, ensuring the participant’s ML VPC can privately connect to Memorystore or Cloud SQL, etc.
- Uses a /24 block, which is ideal for isolated environments with up to 256 IPs to support multiple instances running managed services per participant.
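The 256 figure follows directly from the prefix length: an IPv4 block with prefix length p contains 2^(32 - p) addresses. A minimal sketch of that arithmetic, where `range_size` is a hypothetical helper name:

```shell
# Number of addresses in an IPv4 block with the given prefix length
range_size() {
  echo $(( 1 << (32 - $1) ))
}

range_size 24   # 256
range_size 16   # 65536
```

A /16 would therefore reserve 256 times as many addresses as the /24 used in this lab.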
- Presenter Hands-On Example
gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--network=vertex-ai-vpc \
--ranges=memorystore-psa-range-01

- Participant Hands-On
- Peer your VPC to the Memorystore service producer:
gcloud services vpc-peerings update \
--service=servicenetworking.googleapis.com \
--network=vertex-ai-vpc \
--ranges=memorystore-psa-range-XX,memorystore-psa-range-01

- List peering connections on your VPC:
gcloud compute networks peerings list \
--network=vertex-ai-vpc

- Expected state: ACTIVE
- Only one private connection is needed per VPC, even if multiple services or IP ranges are used.
- The service producer network is dedicated per project, and your VPC peers into it using the Service Networking API.
- Private Google Access is subnet-level and enables access to Google APIs.
- Private Services Access is VPC-level and enables access to private IP services like Cloud SQL or Memorystore.
[1] Private services access | VPC | Google Cloud: https://cloud.google.com/vpc/docs/private-services-access
[2] Memorystore Private IP Documentation: https://cloud.google.com/memorystore/docs/redis/ip-addresses-private





