Adding support for prefix queries #1870
maheshrbapatu wants to merge 5 commits into kvcache-ai:main from
Conversation
Code Review
This pull request introduces a PrefixIndex class, implemented as a compressed radix tree, to support efficient prefix-based operations. However, the PR contains critical issues: the CMakeLists.txt files have been severely misformatted with spaces in file paths, which will break the build. Additionally, the PrefixIndex implementation contains unreachable code in the insertion and removal methods, and the recursive DFS functions should be optimized to reduce unnecessary string allocations.
```cpp
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
                                   const std::string& accumulated,
                                   std::vector<std::string>& out) const {
    if (node->is_terminal()) {
        out.push_back(accumulated);
    }
    for (const auto& [ch, child] : node->children) {
        dfs_collect_keys(child, accumulated + child->edge_label, out);
    }
```
In dfs_collect_keys, the string concatenation accumulated + child->edge_label creates a new std::string object for every node in the trie. For large result sets, this leads to excessive allocations. Consider passing a std::string& buffer and using append/resize to manage the path efficiently during DFS.
References
- Avoid unnecessary string allocations in performance-critical paths or recursive functions. (link)
Pull request overview
This PR adds a new compressed Patricia-style radix tree (PrefixIndex) to Mooncake Store to support efficient prefix-based key enumeration and prefix-scoped aggregations (count/bytes), along with comprehensive unit tests.
Changes:
- Introduces PrefixIndex (compressed radix trie) with insert/remove/clear, prefix listing, continuation listing, and subtree count/bytes aggregation.
- Adds extensive unit tests covering split/merge edge cases, upsert semantics, cleanup after deletes, and concurrent access.
- Wires the new implementation and tests into the Mooncake Store build (library + test target).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| mooncake-store/include/prefix_index.h | Public API + data structure definitions and semantics for the prefix index |
| mooncake-store/src/prefix_index.cpp | Implementation of the compressed radix tree with locking and maintained subtree stats |
| mooncake-store/src/CMakeLists.txt | Adds prefix_index.cpp to the store library build |
| mooncake-store/tests/prefix_index_test.cpp | New unit test suite for correctness, invariants, and concurrency |
| mooncake-store/tests/CMakeLists.txt | Registers the new prefix_index_test target |
```cpp
 * list_prefix_continuations, count_by_prefix, bytes_by_prefix) take a
 * shared lock.
```
The thread-safety comment lists the read operations that take a shared lock, but it omits list_entries_by_prefix() (and size()) even though they are also read-only and use the shared lock. Consider updating the comment to include them for accuracy.
Suggested change:

```cpp
 * list_entries_by_prefix, list_prefix_continuations, count_by_prefix,
 * bytes_by_prefix, size) take a shared lock.
```
```cpp
void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
                                   const std::string& accumulated,
                                   std::vector<std::string>& out) const {
    if (node->is_terminal()) {
        out.push_back(accumulated);
    }
    for (const auto& [ch, child] : node->children) {
        dfs_collect_keys(child, accumulated + child->edge_label, out);
    }
}

void PrefixIndex::dfs_collect_entries(
    const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
    std::vector<std::pair<std::string, TrieLeafData>>& out) const {
    if (node->is_terminal()) {
        out.emplace_back(accumulated, node->leaf_data.value());
    }
    for (const auto& [ch, child] : node->children) {
        dfs_collect_entries(child, accumulated + child->edge_label, out);
    }
}
```
dfs_collect_keys() builds accumulated + child->edge_label at every recursion, which creates a new std::string (allocation/copy) per edge visited. For large subtrees this can be a noticeable overhead during prefix enumeration; consider using a single mutable buffer (append before recursion, resize/pop after) to avoid the per-node concatenation allocations.
Suggested change (replacing the two functions above with a backtracking implementation):

```cpp
namespace {

void dfs_collect_keys_impl(const std::shared_ptr<RadixNode>& node,
                           std::string& accumulated,
                           std::vector<std::string>& out) {
    if (node->is_terminal()) {
        out.push_back(accumulated);
    }
    for (const auto& [ch, child] : node->children) {
        const size_t original_size = accumulated.size();
        accumulated += child->edge_label;
        dfs_collect_keys_impl(child, accumulated, out);
        accumulated.resize(original_size);
    }
}

void dfs_collect_entries_impl(
    const std::shared_ptr<RadixNode>& node, std::string& accumulated,
    std::vector<std::pair<std::string, TrieLeafData>>& out) {
    if (node->is_terminal()) {
        out.emplace_back(accumulated, node->leaf_data.value());
    }
    for (const auto& [ch, child] : node->children) {
        const size_t original_size = accumulated.size();
        accumulated += child->edge_label;
        dfs_collect_entries_impl(child, accumulated, out);
        accumulated.resize(original_size);
    }
}

}  // namespace

void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
                                   const std::string& accumulated,
                                   std::vector<std::string>& out) const {
    std::string buffer = accumulated;
    dfs_collect_keys_impl(node, buffer, out);
}

void PrefixIndex::dfs_collect_entries(
    const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
    std::vector<std::pair<std::string, TrieLeafData>>& out) const {
    std::string buffer = accumulated;
    dfs_collect_entries_impl(node, buffer, out);
}
```
dfs_collect_entries() also uses accumulated + child->edge_label on each recursion, which allocates/copies a new string per visited edge. Consider switching to an append/backtrack approach (shared mutable path string) so listing large prefixes doesn't incur O(nodes) intermediate string allocations beyond the final returned keys.
Suggested change (same backtracking approach, using a shared `path` buffer):

```cpp
namespace {

void dfs_collect_keys_impl(
    const std::shared_ptr<PrefixIndex::RadixNode>& node, std::string& path,
    std::vector<std::string>& out) {
    if (node->is_terminal()) {
        out.push_back(path);
    }
    for (const auto& [ch, child] : node->children) {
        const size_t original_size = path.size();
        path += child->edge_label;
        dfs_collect_keys_impl(child, path, out);
        path.resize(original_size);
    }
}

void dfs_collect_entries_impl(
    const std::shared_ptr<PrefixIndex::RadixNode>& node, std::string& path,
    std::vector<std::pair<std::string, TrieLeafData>>& out) {
    if (node->is_terminal()) {
        out.emplace_back(path, node->leaf_data.value());
    }
    for (const auto& [ch, child] : node->children) {
        const size_t original_size = path.size();
        path += child->edge_label;
        dfs_collect_entries_impl(child, path, out);
        path.resize(original_size);
    }
}

}  // namespace

void PrefixIndex::dfs_collect_keys(const std::shared_ptr<RadixNode>& node,
                                   const std::string& accumulated,
                                   std::vector<std::string>& out) const {
    std::string path = accumulated;
    dfs_collect_keys_impl(node, path, out);
}

void PrefixIndex::dfs_collect_entries(
    const std::shared_ptr<RadixNode>& node, const std::string& accumulated,
    std::vector<std::pair<std::string, TrieLeafData>>& out) const {
    std::string path = accumulated;
    dfs_collect_entries_impl(node, path, out);
}
```
```cpp
 * returned string `s` is a continuation: `prefix + s` is a prefix of
 * at least one stored key (or equals a stored key when `s` is the
 * sole terminal continuation). Results are sorted lexicographically.
```
The list_prefix_continuations() doc says a continuation s may be such that prefix + s “equals a stored key when s is the sole terminal continuation”, which would imply returning an empty-string continuation for an exact terminal-leaf match. The current implementation/tests never return an empty continuation (exact terminal leaf returns an empty vector), so the comment should be updated to match the actual API semantics (continuations are non-empty suffixes).
Suggested change:

```cpp
 * returned string `s` is a non-empty continuation: `prefix + s` is
 * either a stored key or a prefix of at least one stored key.
 * Exact terminal matches do not produce an empty-string continuation;
 * they produce no result unless a longer extension also exists.
 * Results are sorted lexicographically.
```
```cpp
TEST(PrefixIndexTest, CountByPrefixIsO1) {
    PrefixIndex idx;
    for (int i = 0; i < 1000; ++i) {
        idx.insert("prefix/" + std::to_string(i), 10, kClient1);
    }

    EXPECT_EQ(idx.count_by_prefix("prefix/"), 1000);
    EXPECT_EQ(idx.count_by_prefix(""), 1000);
}
```
The test name CountByPrefixIsO1 is misleading: PrefixIndex::count_by_prefix() still does an O(prefix_length) traversal before doing an O(1) counter read. Consider renaming the test (and/or its comment) to avoid encoding an incorrect complexity claim.
Great work! Thanks @maheshrbapatu, will review it ASAP. CC @ykwd
Description
Introduce a compressed Patricia-style radix tree over string keys to support efficient prefix-based enumeration and aggregation.
The trie is implemented as an independent data structure and is not yet integrated with MasterService or the existing hash-sharded metadata index.
Supports:
- insert/remove/clear of string keys
- listing keys and entries by prefix
- listing prefix continuations
- subtree count/bytes aggregation by prefix
Uses coarse-grained locking to provide thread-safe concurrent reads and writes.
Stores lightweight per-terminal-node summary data (TrieLeafData) to keep scope decoupled from full metadata lifecycle.
Includes unit tests covering:
- split/merge edge cases
- upsert semantics
- cleanup after deletes
- concurrent access
Module
- [ ] mooncake-transfer-engine
- [x] mooncake-store
- [ ] mooncake-ep
- [ ] mooncake-integration
- [ ] mooncake-p2p-store
- [ ] mooncake-wheel
- [ ] mooncake-pg
- [ ] mooncake-rl

Type of Change
How Has This Been Tested?
Added a bunch of new UTs.
Checklist
- Run `./scripts/code_format.sh` before submitting.

Ref #1732