Conversation
Code Review
This pull request introduces SSD performance metrics to the mooncake-store, including counters for bytes and operations along with latency histograms and summaries. The new SsdMetric struct is integrated into the ClientMetric and FileStorage classes to instrument the read and write paths. Review feedback identifies that the current write latency measurement in OffloadObjects likely only captures submission time for asynchronous backends and should be moved to a completion handler for accuracy. Additionally, the format_ssd_latency_summary method is noted as dead code with logic flaws and is recommended for removal.
auto offload_start = std::chrono::steady_clock::now();
auto offload_res = storage_backend_->BatchOffload(
    batch_object, complete_handler, eviction_handler);
if (offload_res && ssd_metric_) {
    auto offload_elapsed_us =
        std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - offload_start)
            .count();
    int64_t total_bytes = 0;
    for (const auto& [key, slices] : batch_object) {
        for (const auto& slice : slices) {
            total_bytes += slice.size;
        }
    }
    ssd_metric_->ssd_write_ops.inc(batch_object.size());
    ssd_metric_->ssd_write_bytes.inc(total_bytes);
    ssd_metric_->ssd_write_latency_us.observe(offload_elapsed_us);
    ssd_metric_->ssd_write_latency_summary.observe(offload_elapsed_us);
    ssd_metric_->ssd_total_ops.inc(batch_object.size());
    ssd_metric_->ssd_total_bytes.inc(total_bytes);
    ssd_metric_->ssd_total_latency_us.observe(offload_elapsed_us);
    ssd_metric_->ssd_total_latency_summary.observe(offload_elapsed_us);
}
The instrumentation for SSD write metrics is currently placed immediately after the call to storage_backend_->BatchOffload. Since BatchOffload uses a CompletionHandler, it is likely an asynchronous operation in many backend implementations (such as BucketStorageBackend). Measuring latency here only captures the submission time, not the actual I/O duration. Furthermore, metrics are recorded based on submission success rather than actual write completion. It is recommended to move the metrics recording logic into a per-bucket completion handler that captures the start time and calculates the total bytes from the successful metadatas received in the callback. This will ensure that throughput, IOPS, and latency metrics accurately reflect successful SSD operations.
auto offload_start = std::chrono::steady_clock::now();
auto bucket_complete_handler = [this, offload_start, complete_handler](
const std::vector<std::string>& keys,
std::vector<StorageObjectMetadata>& metadatas) {
auto res = complete_handler(keys, metadatas);
if (res == ErrorCode::OK && ssd_metric_) {
auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::steady_clock::now() - offload_start)
.count();
int64_t total_bytes = 0;
for (const auto& metadata : metadatas) total_bytes += metadata.size;
ssd_metric_->ssd_write_ops.inc(keys.size());
ssd_metric_->ssd_write_bytes.inc(total_bytes);
ssd_metric_->ssd_write_latency_us.observe(elapsed_us);
ssd_metric_->ssd_write_latency_summary.observe(elapsed_us);
ssd_metric_->ssd_total_ops.inc(keys.size());
ssd_metric_->ssd_total_bytes.inc(total_bytes);
ssd_metric_->ssd_total_latency_us.observe(elapsed_us);
ssd_metric_->ssd_total_latency_summary.observe(elapsed_us);
}
return res;
};
auto offload_res = storage_backend_->BatchOffload(
    batch_object, bucket_complete_handler, eviction_handler);

if (bucket_counts.empty()) {
    return "No data";
}

int64_t total_count = 0;
for (auto& bucket : bucket_counts) {
    total_count += bucket->value();
}

if (total_count == 0) {
    return "No data";
}

std::stringstream ss;
ss << "count=" << total_count;

// Find P50, P90, P99
auto find_percentile = [&](int pct) -> double {
    int64_t target = (total_count * pct) / 100;
    int64_t cumulative = 0;
    for (size_t i = 0;
         i < bucket_counts.size() && i < kSsdLatencyBucket.size();
         i++) {
        cumulative += bucket_counts[i]->value();
        if (cumulative >= target) {
            return kSsdLatencyBucket[i];
        }
    }
    return 0;
};

double p50 = find_percentile(50);
double p90 = find_percentile(90);
double p99 = find_percentile(99);

if (p50 > 0) ss << ", p50<" << p50 << "us";
if (p90 > 0) ss << ", p90<" << p90 << "us";
if (p99 > 0) ss << ", p99<" << p99 << "us";

return ss.str();
}

std::string format_summary_percentiles(ylt::metric::summary_t& summary) {
    double sum = 0;
The format_ssd_latency_summary method is defined but never used within the SsdMetric struct or elsewhere in the codebase. Additionally, it contains a redundant const_cast on line 420 and its percentile calculation logic ignores the +Inf bucket, which could lead to inaccurate results if observations exceed the defined boundaries. Since summary_metrics() already uses the more accurate format_summary_percentiles based on summary_t, this dead code should be removed to improve maintainability.
b97d8cc to 946e450
@00fish0 On a separate note unrelated to this PR, I'd like to ask: should a PR in the draft stage trigger CI?

Personally, I think we could go with a selective CI approach for drafts: trigger just one or two core build checks for draft PRs to provide early feedback.
// SSD latency bucket: microseconds, tuned for SSD/network storage
// Range: 50us (high-end NVMe) to 30s (3fs/nfs large object batch writes)
const std::vector<double> kSsdLatencyBucket = {
kSsdLatencyBucket is defined in the header, so each .cpp that includes it gets its own copy. Consider adding inline to avoid duplication.
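A minimal sketch of the suggested fix — the boundary values below are illustrative (spanning the 50us-to-30s range the header comment describes), not the PR's exact list:

```cpp
#include <vector>

// C++17 inline variable: one definition shared by every translation unit
// that includes this header, instead of one private copy per .cpp file.
// Boundary values here are illustrative, not the PR's exact buckets.
inline const std::vector<double> kSsdLatencyBucket = {
    50, 100, 1000, 10000, 100000, 1000000, 30000000};
```

Without `inline`, a namespace-scope `const std::vector` in a header has internal linkage, so each including .cpp materializes its own vector at startup.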
Thanks for your insightful suggestion. Fixed.
LGTM, thanks for the contribution!
@@ -0,0 +1,960 @@
# Plan V2: Production-Ready OffsetAllocatorStorageBackend for SSD Offloading
Please delete the file.
# CodeBuddy Memory
.codebuddy/
Why change .gitignore?
LGTM for me now.
Description
Add SSD offloading metrics to mooncake-store, enabling observability for SSD read/write performance including throughput (bytes/s), IOPS (ops/s), and exact P50/P90/P99 latency.

Currently, FileStorage::BatchLoad() and FileStorage::OffloadObjects() have timing code (VLOG), but no structured metrics are exported. This PR introduces a new SsdMetric struct (following the existing TransferMetric pattern) and instruments both the read (BatchLoad) and write (BatchOffload) paths. All metrics are reported in three dimensions: Read, Write, and Total (combined).

Exported Metrics (12 total)
Counters (6):
- mooncake_ssd_read_bytes_total
- mooncake_ssd_write_bytes_total
- mooncake_ssd_total_bytes_total
- mooncake_ssd_read_ops_total
- mooncake_ssd_write_ops_total
- mooncake_ssd_total_ops_total

Histograms (3, for Prometheus aggregation):
- mooncake_ssd_read_latency_us
- mooncake_ssd_write_latency_us
- mooncake_ssd_total_latency_us

Summaries (3, for exact percentiles):
- mooncake_ssd_read_latency_summary_us
- mooncake_ssd_write_latency_summary_us
- mooncake_ssd_total_latency_summary_us

Key Design Decisions
- SsdMetric lives in ClientMetric, not MasterMetricManager.
- A dedicated kSsdLatencyBucket (50us to 30s, tuned for SSD/network storage) is used instead of the generic kLatencyBucket.
- Only successful BatchLoad/BatchOffload calls are counted, keeping IOPS/throughput/latency clean.
- The SsdMetric* parameter defaults to nullptr; existing callers are unaffected.
- Counters use the _total suffix, consistent with master-side metrics (e.g., master_put_start_requests_total).

Changed Files (8 files, +382 / -3)
- include/client_metric.h: kSsdLatencyBucket, SsdMetric struct (12 metrics, serialize, summary with throughput/IOPS/exact percentiles)
- include/client_service.h: Client::GetSsdMetricPtr() accessor
- include/file_storage.h: SsdMetric* constructor param, ssd_metric_ member, forward decl
- src/client_metric.cpp: ssd_metric in constructor/serialize/summary
- src/file_storage.cpp: BatchLoad() read instrumentation, OffloadObjects() write instrumentation
- src/real_client.cpp: passes client_->GetSsdMetricPtr() to FileStorage
- tests/ssd_metrics_test.cpp: new unit tests
- tests/file_storage_test.cpp: new integration tests
- tests/CMakeLists.txt: registers ssd_metrics_test

Module
mooncake-store

Type of Change
How Has This Been Tested?
Unit Tests (ssd_metrics_test, 13 test cases)
- InitialValuesTest
- ReadMetricsTest
- WriteMetricsTest
- TotalMetricsTest: total_ops = read_ops + write_ops, total_bytes = read_bytes + write_bytes; total latency histogram/summary have correct observation counts
- FailureNotCountedTest
- ConcurrentTest
- SerializeTest
- SummaryMetricsTest
- ThroughputCalculationTest: summary output contains throughput= and IOPS= (read, write, total)
- IntegrationWithClientMetric: SsdMetric data appears in ClientMetric::serialize() and summary_metrics()
- SerializeWithDynamicLabels: dynamic labels (instance_id, cluster_id) propagate into serialized output
- LatencyBucketBoundaryTest
- EmptyBatchMetricsTest

Integration Tests (file_storage_test, 3 new test cases)
- BatchLoadRecordsSsdMetrics: offload via BatchOffloadUtil, then BatchLoad → ssd_read_ops=100, ssd_read_bytes=21000, latency has 1 observation
- BatchLoadFailureDoesNotRecordSsdMetrics: BatchLoad with nonexistent keys → all metrics remain 0
- NullSsdMetricDoesNotCrash: FileStorage(nullptr) completes BatchLoad without crash (backward compat)
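The counter relationships these tests assert (every read also feeds the combined Total dimension) can be sketched with a simplified stand-in for SsdMetric — plain atomics here instead of the real ylt::metric counter types, and only the read path shown:

```cpp
#include <atomic>
#include <cstdint>

// Simplified stand-in for SsdMetric: counter names mirror the PR,
// but the internals (plain atomics) are illustrative only.
struct SsdMetricSketch {
    std::atomic<int64_t> ssd_read_ops{0};
    std::atomic<int64_t> ssd_read_bytes{0};
    std::atomic<int64_t> ssd_total_ops{0};
    std::atomic<int64_t> ssd_total_bytes{0};

    // Record one successful BatchLoad of `ops` keys totalling `bytes`;
    // the read counters and the combined "total" counters move together.
    void record_read(int64_t ops, int64_t bytes) {
        ssd_read_ops += ops;
        ssd_read_bytes += bytes;
        ssd_total_ops += ops;
        ssd_total_bytes += bytes;
    }
};
```

After `record_read(100, 21000)` — the figures from BatchLoadRecordsSsdMetrics — read and total counters both show 100 ops and 21000 bytes.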
Sample Output
summary_metrics()human-readable output:Prometheus serialized output (partial):
Checklist
./scripts/code_format.shbefore submitting.