Skip to content

feat(storage): integrate ACO tracing into public client and add system tests#17223

Draft
chandra-siri wants to merge 4 commits into
googleapis:mainfrom
chandra-siri:feat/gcs-aco-sdk-integration
Draft

feat(storage): integrate ACO tracing into public client and add system tests#17223
chandra-siri wants to merge 4 commits into
googleapis:mainfrom
chandra-siri:feat/gcs-aco-sdk-integration

Conversation

@chandra-siri
Copy link
Copy Markdown
Contributor

Description

The final piece of the App-centric Observability (ACO) tracing implementation. Wires up the metadata cache and tracing context managers to the public client interfaces and establishes a 100% sleep-free live system test suite.

Note

This PR is stacked on top of PR 3 (#17222). It should be reviewed/merged after PR 3.

Changes

  • Updated google/cloud/storage/client.py, bucket.py, blob.py, and transfer_manager.py to wrap critical operations (e.g. uploads, downloads, retrievals, deletes) in active trace spans and update the cache state on api calls/deletes.
  • Updated tests/unit/test_client.py and test_bucket.py with caching fallback tests.
  • Created tests/system/test_aco_observability.py with fully-deterministic concurrent system tests executing against a live GCS backend.

This is PR 4 of a 4-part series to break down the ACO tracing compatibility feature into reviewable slices.

@chandra-siri chandra-siri requested a review from a team as a code owner May 21, 2026 15:23
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an in-memory LRU cache for GCS bucket metadata to support App-centric Observability (ACO) by including destination attributes in OpenTelemetry spans. It adds a new BucketMetadataCache and a generic LRUCache implementation, updates the Client, Bucket, and Blob classes to manage cache state during operations, and modifies tracing logic to include cached metadata. Feedback focuses on improving performance and reliability, including suggestions to use a thread pool for background fetches, pre-compile regular expressions for bucket name extraction, fix incorrect log messages, and eliminate redundant URI parsing in download_blob_to_file.

Comment on lines +61 to +63
threading.Thread(
target=self._fetch_background, args=(bucket_name,), daemon=True
).start()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Spawning a new threading.Thread for every unique bucket cache miss could lead to excessive thread creation in applications that access many buckets. Consider using a concurrent.futures.ThreadPoolExecutor with a fixed number of workers to manage these background metadata fetches more efficiently.

Comment on lines +164 to +169
if name in (
"Storage.Client.getBucket",
"Storage.Client.lookupBucket",
"Storage.Bucket.reload",
"Storage.Bucket.exists",
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The exclusion list for background metadata fetches should include bucket creation and deletion operations. Triggering a background fetch during create is unnecessary as the bucket may not exist yet, and during delete it is redundant as the resource is being removed.

Suggested change
if name in (
"Storage.Client.getBucket",
"Storage.Client.lookupBucket",
"Storage.Bucket.reload",
"Storage.Bucket.exists",
):
if name in (
"Storage.Client.getBucket",
"Storage.Client.lookupBucket",
"Storage.Bucket.reload",
"Storage.Bucket.exists",
"Storage.Bucket.create",
"Storage.Bucket.delete",
):

HAS_OPENTELEMETRY,
)

logger = logging.getLogger(__name__)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve performance in the api_request hot path, pre-compile the bucket name extraction regex at the module level.

Suggested change
logger = logging.getLogger(__name__)
logger = logging.getLogger(__name__)
_BUCKET_NAME_RE = re.compile(r"/b/([^/?#]+)")

and hasattr(client, "_bucket_metadata_cache")
and client._bucket_metadata_cache
):
match = re.search(r"/b/([^/?#]+)", kwargs.get("path", ""))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the pre-compiled regex for bucket name extraction.

Suggested change
match = re.search(r"/b/([^/?#]+)", kwargs.get("path", ""))
match = _BUCKET_NAME_RE.search(kwargs.get("path", ""))

span_attributes["gcp.resource.destination.id"] = dest_id
span_attributes["gcp.resource.destination.location"] = loc
except Exception as e:
logger.debug(f"Failed cache.get_or_queue_fetch in api_request: {e}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message incorrectly references get_or_queue_fetch, but the code actually calls get on line 94. This could be confusing during debugging.

Suggested change
logger.debug(f"Failed cache.get_or_queue_fetch in api_request: {e}")
logger.debug(f"Failed cache.get in api_request: {e}")

and hasattr(client, "_bucket_metadata_cache")
and client._bucket_metadata_cache
):
match = re.search(r"/b/([^/?#]+)", kwargs.get("path", ""))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the pre-compiled regex for bucket name extraction in the error handler.

Suggested change
match = re.search(r"/b/([^/?#]+)", kwargs.get("path", ""))
match = _BUCKET_NAME_RE.search(kwargs.get("path", ""))

Comment on lines +1321 to 1337
bucket_name = None
if isinstance(blob_or_uri, Blob):
bucket_name = blob_or_uri.bucket.name if blob_or_uri.bucket else None
elif isinstance(blob_or_uri, str) and blob_or_uri.startswith("gs://"):
try:
temp_blob = Blob.from_uri(blob_or_uri)
bucket_name = temp_blob.bucket.name
except Exception:
pass

with create_trace_span_helper(
self,
bucket_name,
name="Storage.Client.downloadBlobToFile",
):
if not isinstance(blob_or_uri, Blob):
blob_or_uri = Blob.from_uri(blob_or_uri)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In download_blob_to_file, Blob.from_uri is called twice when a gs:// URI string is provided. You can optimize this by reusing the blob object created during the bucket name extraction. Additionally, ensure that the exception block logs the error instead of silently passing to aid in debugging.

Suggested change
bucket_name = None
if isinstance(blob_or_uri, Blob):
bucket_name = blob_or_uri.bucket.name if blob_or_uri.bucket else None
elif isinstance(blob_or_uri, str) and blob_or_uri.startswith("gs://"):
try:
temp_blob = Blob.from_uri(blob_or_uri)
bucket_name = temp_blob.bucket.name
except Exception:
pass
with create_trace_span_helper(
self,
bucket_name,
name="Storage.Client.downloadBlobToFile",
):
if not isinstance(blob_or_uri, Blob):
blob_or_uri = Blob.from_uri(blob_or_uri)
bucket_name = None
if isinstance(blob_or_uri, Blob):
bucket_name = blob_or_uri.bucket.name if blob_or_uri.bucket else None
elif isinstance(blob_or_uri, str) and blob_or_uri.startswith("gs://"):
try:
blob_or_uri = Blob.from_uri(blob_or_uri)
bucket_name = blob_or_uri.bucket.name
except Exception as e:
logger.warning(f"Failed to parse URI {blob_or_uri}: {e}")
with create_trace_span_helper(
self,
bucket_name,
name="Storage.Client.downloadBlobToFile",
):
if not isinstance(blob_or_uri, Blob):
blob_or_uri = Blob.from_uri(blob_or_uri)
References
  1. Avoid broad except Exception: blocks that silently return None or pass. Instead, log the exception to aid in debugging and prevent masking underlying issues.
  2. Remove duplicate lines of code to keep the codebase clean and avoid redundancy.

@parthea parthea marked this pull request as draft May 21, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant