Skip to content

HBASE-29974: Persist filter hints across scan circuit breaks#7882

Open
shubham-roy wants to merge 8 commits intoapache:masterfrom
shubham-roy:HBASE-29974
Open

HBASE-29974: Persist filter hints across scan circuit breaks#7882
shubham-roy wants to merge 8 commits intoapache:masterfrom
shubham-roy:HBASE-29974

Conversation

@shubham-roy
Copy link
Copy Markdown
Contributor

@shubham-roy shubham-roy commented Mar 7, 2026

JIRA

HBASE-29974


Problem / Motivation

HBase's scan pipeline applies several structural short-circuit checks in
UserScanQueryMatcher.matchColumn — time-range gate, column-set exclusion, and
version-limit exhaustion — before Filter.filterCell is ever called. When any
of these gates fire, the pipeline returns a plain SKIP or SEEK_NEXT_COL code
and the filter is bypassed entirely.

Similarly, RegionScannerImpl iterates through every cell in a rejected row
one-by-one via nextRow() after filterRowKey returns true, even though a
filter can deterministically compute the next valid
row boundary.

The consequence is that filter implementations that could provide meaningful
seek hints are never consulted at these decision points
, forcing the scanner to
advance cell-by-cell rather than issuing a single requestSeek. For
range-oriented or sparse-column filters over large tables this results in
significant, avoidable read amplification.


Root Cause

The existing getNextCellHint / SEEK_NEXT_USING_HINT mechanism requires
filterCell to have been called first; there was no API contract or call-site
for filters to provide a forward hint when the cell was structurally excluded
before reaching filterCell. The three structural gates in matchColumn and the
filterRowKey rejection path all lacked hook points.


Solution

New public API (Filter)

Two new optional methods are added to the abstract Filter class, both with
backward-compatible null defaults in FilterBase and FilterWrapper:

Method When called Contract
getSkipHint(Cell skippedCell) Cell rejected by time-range, column-set, or version gate before filterCell Stateless — must not modify filter state
getHintForRejectedRow(Cell firstRowCell) Row rejected by filterRowKey before any cell-level iteration May use state set during filterRowKey

UserScanQueryMatcher (server-side)

  • Introduces pendingSkipHint field and resolveSkipHint() helper.
  • At each of the three structural short-circuit branches (time-range tsCmp > 0,
    time-range tsCmp < 0, column-set exclusion, version SKIP/SEEK_NEXT_COL),
    calls resolveSkipHint() and promotes the result to SEEK_NEXT_USING_HINT
    when a non-null hint is returned.
  • getNextKeyHint drains pendingSkipHint first, before delegating to
    filter.getNextCellHint, ensuring the correct hint is returned for the cell
    that triggered the structural gate.

RegionScannerImpl (server-side)

  • getHintForRejectedRow(Cell) validates and retrieves the filter hint after a
    filterRowKey rejection.
  • nextRowViaHint(ScannerContext, Cell, ExtendedCell) replaces the nextRow()
    call with a single storeHeap.requestSeek(hint), wrapped with skippingRow
    mode to keep block-size accounting consistent.

Backward Compatibility

  • No breaking changes. Both new Filter methods return null by default
    (defined on Filter directly, with matching no-ops in FilterBase and
    FilterWrapper), so all existing filter implementations continue to behave
    identically.
  • DoNotRetryIOException is thrown (consistent with existing getNextCellHint
    validation) if a filter returns a non-ExtendedCell from either new method.

Shubham Roy added 2 commits March 7, 2026 18:47
…rcuit breaks in scan pipeline, causing unnecessary cell-level iteration.
@shubham-roy shubham-roy changed the title [DRAFT] HBASE-29974: Fixing under-utilisation of filter hints due to early circuit breaks in scan pipeline, causing unnecessary cell-level iteration. HBASE-29974: Fixing under-utilisation of filter hints due to early circuit breaks in scan pipeline, causing unnecessary cell-level iteration. Mar 12, 2026
@virajjasani virajjasani self-requested a review March 25, 2026 18:23
@virajjasani
Copy link
Copy Markdown
Contributor

@shubham-roy could you rebase with latest master?

@virajjasani
Copy link
Copy Markdown
Contributor

FYI @tkhurana

@virajjasani
Copy link
Copy Markdown
Contributor

Error Type Count Module Tests
Failures 1 hbase-server org.apache.hadoop.hbase.io.hfile.bucket.TestPrefetchWithBucketCache.testPrefetchRunTriggersEvictions
Flakes 3 hbase-server org.apache.hadoop.hbase.master.balancer.TestBalancerDecision.testBalancerDecisions org.apache.hadoop.hbase.master.region.TestMasterRegionWALCleaner.test org.apache.hadoop.hbase.security.access.TestZKPermissionWatcher.testPermissionsWatcher

@virajjasani
Copy link
Copy Markdown
Contributor

spotless 0m 17s patch has 63 errors when running spotless:check, run spotless:apply to fix.

@shubham-roy
Copy link
Copy Markdown
Contributor Author

@virajjasani , I have resolved merge conflicts and have also applied spotless command.

For the test failures:
org.apache.hadoop.hbase.io.hfile.bucket.TestPrefetchWithBucketCache.testPrefetchRunTriggersEvictions

The above passes in my local. Are we sure it is not a flapper?
Screenshot 2026-04-10 at 8 59 49 PM

Also, I believe the 3 count of flakes can be ignored, right?

@shubham-roy shubham-roy changed the title HBASE-29974: Fixing under-utilisation of filter hints due to early circuit breaks in scan pipeline, causing unnecessary cell-level iteration. HBASE-29974: Persist filter hints across scan circuit breaks Apr 11, 2026
Copy link
Copy Markdown
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments, looks good overall

Comment on lines 132 to 140
}
Cell hint = filter.getNextCellHint(cell);
if (hint == null || hint instanceof ExtendedCell) {
return (ExtendedCell) hint;
} else {
Cell hint = filter.getNextCellHint(cell);
if (hint == null || hint instanceof ExtendedCell) {
return (ExtendedCell) hint;
} else {
throw new DoNotRetryIOException("Incorrect filter implementation, "
+ "the Cell returned by getNextKeyHint is not an ExtendedCell. Filter class: "
+ filter.getClass().getName());
}

throw new DoNotRetryIOException("Incorrect filter implementation, "
+ "the Cell returned by getNextKeyHint is not an ExtendedCell. Filter class: "
+ filter.getClass().getName());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part changing at all? I guess this is only structural change.

Can we remove this change just to keep it like it was before?

Comment on lines +422 to +425
// -----------------------------------------------------------------------
// Tests for HBASE-29974: Filter#getSkipHint consulted at matchColumn
// structural short-circuits (time-range, column, and version gates).
// -----------------------------------------------------------------------
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these lines

Comment on lines +147 to +148
@Test
public void testHintJumpsOverAllRejectedRowsInOneSingleSeek() throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testHintJumpsOverAllRejectedRowsInOneSingleSeek() and testNullHintFallsThroughToLegacyNextRowBehaviour() should be combined in single test. When we check for data correctness, we should check it on the same table, same data, one with hintFilter, one with noHintFilter and compare cells. Separate tests do not serve the same purpose.

* </ol>
*/
@Category({ FilterTests.class, MediumTests.class })
public class TestFilterHintForRejectedRow {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all tests in this class, it seems we are not overriding getSkipHint(). Let's create one test where we some cells are out of scan time-range and getSkipHint() is used. Then for the same data, we can use scan without getSkipHint() and it should still return same cells, for the data correctness comparison.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though we need both getHintForRejectedRow() and getSkipHint() implementations,
for phoenix cdc, i believe getSkipHint() will more likely be common hint than getHintForRejectedRow(), is that correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getHintForRejectedRow() is also equally applicable for the phoenix cdc based backup solution that we are developing. We identify row keys that were mutated for the backup time range via CDC and then get information for the same row keys. While skipping row keys that were not mutated getHintForRejectedRow() comes into play.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants