Skip to content

Adding support for prefix queries#1870

Open
maheshrbapatu wants to merge 5 commits intokvcache-ai:mainfrom
maheshrbapatu:prefix_support_phase1
Open

Adding support for prefix queries#1870
maheshrbapatu wants to merge 5 commits intokvcache-ai:mainfrom
maheshrbapatu:prefix_support_phase1

Conversation

@maheshrbapatu
Copy link
Copy Markdown
Collaborator

@maheshrbapatu maheshrbapatu commented Apr 12, 2026

Description

Introduce a compressed Patricia-style radix tree over string keys to support efficient prefix-based enumeration and aggregation.

The trie is implemented as an independent data structure and is not yet integrated with MasterService or the existing hash-sharded metadata index.

Supports:

  • insert/remove of full string keys with automatic node cleanup
  • listing keys and lightweight leaf data by prefix
  • handling prefixes ending mid-edge (compressed trie semantics)
  • prefix continuation listing for hierarchical browsing
  • subtree aggregation (count, bytes) via maintained statistics

Uses coarse-grained locking to provide thread-safe concurrent reads and writes.

Stores lightweight per-terminal-node summary data (TrieLeafData) to keep scope decoupled from full metadata lifecycle.

Includes unit tests covering:

  • insert/remove and prefix query correctness
  • edge cases (partial edge splits, prefix-as-key, mid-edge traversal)
  • subtree statistic invariants (count/bytes consistency)
  • cleanup behavior after deletes
  • concurrent access patterns

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Other

How Has This Been Tested?

Added a bunch of new UTs.

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • [] I have updated the documentation.
  • I have added tests to prove my changes are effective.

Ref #1732

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a PrefixIndex class, implemented as a compressed radix tree, to support efficient prefix-based operations. However, the PR contains critical issues: the CMakeLists.txt files have been severely misformatted with spaces in file paths, which will break the build. Additionally, the PrefixIndex implementation contains unreachable code in the insertion and removal methods, and the recursive DFS functions should be optimized to reduce unnecessary string allocations.

Comment thread mooncake-store/src/CMakeLists.txt Outdated
Comment thread mooncake-store/src/CMakeLists.txt Outdated
Comment thread mooncake-store/tests/CMakeLists.txt Outdated
Comment thread mooncake-store/src/CMakeLists.txt Outdated
Comment on lines +188 to +196
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
dfs_collect_keys(child, accumulated + child->edge_label, out);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In dfs_collect_keys, the string concatenation accumulated + child->edge_label creates a new std::string object for every node in the trie. For large result sets, this leads to excessive allocations. Consider passing a std::string& buffer and using append/resize to manage the path efficiently during DFS.

References
  1. Avoid unnecessary string allocations in performance-critical paths or recursive functions. (link)

Comment thread mooncake-store/src/prefix_index.cpp Outdated
Comment thread mooncake-store/src/prefix_index.cpp Outdated
Comment thread mooncake-store/tests/CMakeLists.txt Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new compressed Patricia-style radix tree (PrefixIndex) to Mooncake Store to support efficient prefix-based key enumeration and prefix-scoped aggregations (count/bytes), along with comprehensive unit tests.

Changes:

  • Introduces PrefixIndex (compressed radix trie) with insert/remove/clear, prefix listing, continuation listing, and subtree count/bytes aggregation.
  • Adds extensive unit tests covering split/merge edge cases, upsert semantics, cleanup after deletes, and concurrent access.
  • Wires the new implementation and tests into the Mooncake Store build (library + test target).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
mooncake-store/include/prefix_index.h Public API + data structure definitions and semantics for the prefix index
mooncake-store/src/prefix_index.cpp Implementation of the compressed radix tree with locking and maintained subtree stats
mooncake-store/src/CMakeLists.txt Adds prefix_index.cpp to the store library build
mooncake-store/tests/prefix_index_test.cpp New unit test suite for correctness, invariants, and concurrency
mooncake-store/tests/CMakeLists.txt Registers the new prefix_index_test target

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +92 to +93
* list_prefix_continuations, count_by_prefix, bytes_by_prefix) take a
* shared lock.
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread-safety comment lists the read operations that take a shared lock, but it omits list_entries_by_prefix() (and size()) even though they are also read-only and use the shared lock. Consider updating the comment to include them for accuracy.

Suggested change
* list_prefix_continuations, count_by_prefix, bytes_by_prefix) take a
* shared lock.
* list_entries_by_prefix, list_prefix_continuations, count_by_prefix,
* bytes_by_prefix, size) take a shared lock.

Copilot uses AI. Check for mistakes.
Comment on lines +188 to +209
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
dfs_collect_keys(child, accumulated + child->edge_label, out);
}
}

void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
if (node->is_terminal()) {
out.emplace_back(accumulated, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
dfs_collect_entries(child, accumulated + child->edge_label, out);
}
}

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dfs_collect_keys() builds accumulated + child->edge_label at every recursion, which creates a new std::string (allocation/copy) per edge visited. For large subtrees this can be a noticeable overhead during prefix enumeration; consider using a single mutable buffer (append before recursion, resize/pop after) to avoid the per-node concatenation allocations.

Suggested change
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
dfs_collect_keys(child, accumulated + child->edge_label, out);
}
}
void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
if (node->is_terminal()) {
out.emplace_back(accumulated, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
dfs_collect_entries(child, accumulated + child->edge_label, out);
}
}
namespace {
void dfs_collect_keys_impl(const std::shared_ptr<RadixNode>& node,
std::string& accumulated,
std::vector<std::string>& out) {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
const size_t original_size = accumulated.size();
accumulated += child->edge_label;
dfs_collect_keys_impl(child, accumulated, out);
accumulated.resize(original_size);
}
}
void dfs_collect_entries_impl(
const std::shared_ptr<RadixNode>& node, std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) {
if (node->is_terminal()) {
out.emplace_back(accumulated, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
const size_t original_size = accumulated.size();
accumulated += child->edge_label;
dfs_collect_entries_impl(child, accumulated, out);
accumulated.resize(original_size);
}
}
} // namespace
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
std::string buffer = accumulated;
dfs_collect_keys_impl(node, buffer, out);
}
void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
std::string buffer = accumulated;
dfs_collect_entries_impl(node, buffer, out);
}

Copilot uses AI. Check for mistakes.
Comment on lines +188 to +209
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
dfs_collect_keys(child, accumulated + child->edge_label, out);
}
}

void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
if (node->is_terminal()) {
out.emplace_back(accumulated, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
dfs_collect_entries(child, accumulated + child->edge_label, out);
}
}

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dfs_collect_entries() also uses accumulated + child->edge_label on each recursion, which allocates/copies a new string per visited edge. Consider switching to an append/backtrack approach (shared mutable path string) so listing large prefixes doesn't incur O(nodes) intermediate string allocations beyond the final returned keys.

Suggested change
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
if (node->is_terminal()) {
out.push_back(accumulated);
}
for (const auto& [ch, child] : node->children) {
dfs_collect_keys(child, accumulated + child->edge_label, out);
}
}
void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
if (node->is_terminal()) {
out.emplace_back(accumulated, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
dfs_collect_entries(child, accumulated + child->edge_label, out);
}
}
namespace {
void dfs_collect_keys_impl(
const std::shared_ptr<PrefixIndex::RadixNode>& node, std::string& path,
std::vector<std::string>& out) {
if (node->is_terminal()) {
out.push_back(path);
}
for (const auto& [ch, child] : node->children) {
const size_t original_size = path.size();
path += child->edge_label;
dfs_collect_keys_impl(child, path, out);
path.resize(original_size);
}
}
void dfs_collect_entries_impl(
const std::shared_ptr<PrefixIndex::RadixNode>& node, std::string& path,
std::vector<std::pair<std::string, TrieLeafData>>& out) {
if (node->is_terminal()) {
out.emplace_back(path, node->leaf_data.value());
}
for (const auto& [ch, child] : node->children) {
const size_t original_size = path.size();
path += child->edge_label;
dfs_collect_entries_impl(child, path, out);
path.resize(original_size);
}
}
} // namespace
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
const std::string& accumulated,
std::vector<std::string>& out) const {
std::string path = accumulated;
dfs_collect_keys_impl(node, path, out);
}
void PrefixIndex::dfs_collect_entries(
const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
std::vector<std::pair<std::string, TrieLeafData>>& out) const {
std::string path = accumulated;
dfs_collect_entries_impl(node, path, out);
}

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +194
* returned string `s` is a continuation: `prefix + s` is a prefix of
* at least one stored key (or equals a stored key when `s` is the
* sole terminal continuation). Results are sorted lexicographically.
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list_prefix_continuations() doc says a continuation s may be such that prefix + s “equals a stored key when s is the sole terminal continuation”, which would imply returning an empty-string continuation for an exact terminal-leaf match. The current implementation/tests never return an empty continuation (exact terminal leaf returns an empty vector), so the comment should be updated to match the actual API semantics (continuations are non-empty suffixes).

Suggested change
* returned string `s` is a continuation: `prefix + s` is a prefix of
* at least one stored key (or equals a stored key when `s` is the
* sole terminal continuation). Results are sorted lexicographically.
* returned string `s` is a non-empty continuation: `prefix + s` is
* either a stored key or a prefix of at least one stored key.
* Exact terminal matches do not produce an empty-string continuation;
* they produce no result unless a longer extension also exists.
* Results are sorted lexicographically.

Copilot uses AI. Check for mistakes.
Comment on lines +430 to +438
TEST(PrefixIndexTest, CountByPrefixIsO1) {
PrefixIndex idx;
for (int i = 0; i < 1000; ++i) {
idx.insert("prefix/" + std::to_string(i), 10, kClient1);
}

EXPECT_EQ(idx.count_by_prefix("prefix/"), 1000);
EXPECT_EQ(idx.count_by_prefix(""), 1000);
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name CountByPrefixIsO1 is misleading: PrefixIndex::count_by_prefix() still does an O(prefix_length) traversal before doing an O(1) counter read. Consider renaming the test (and/or its comment) to avoid encoding an incorrect complexity claim.

Copilot uses AI. Check for mistakes.
@stmatengss
Copy link
Copy Markdown
Collaborator

Great work! Thanks @maheshrbapatu, will review it ASAP. CC. @ykwd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants