HBASE-30115 Introduce approximate progress estimation for TableRecordReader based on row key position #8134
jinhyukify wants to merge 3 commits into apache:master
…Reader based on row key position
    if (row == null) {
      return 0;
    }
    double d = 0;
double avoids unsigned arithmetic issues. Java has no unsigned long, so interpreting row key bytes as a raw long would be cumbersome. With double, all values are naturally non-negative and standard arithmetic just works.
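The point about unsigned arithmetic can be illustrated with a minimal sketch. It is not the code from this PR: the class name `RowKeyAsDouble`, the method `toFraction`, and the `maxBytes` cap are hypothetical names chosen for illustration. Each byte is masked with `0xFF` so that it is read as 0..255 and then accumulated as a base-256 fraction in a double.

```java
/** Hypothetical helper, not part of the PR: reads row key bytes as an unsigned fraction. */
final class RowKeyAsDouble {
  private RowKeyAsDouble() {
  }

  /**
   * Interprets up to maxBytes leading bytes of row as a big-endian, unsigned,
   * base-256 fraction in the range [0, 1). A null row maps to 0.
   */
  static double toFraction(byte[] row, int maxBytes) {
    if (row == null) {
      return 0;
    }
    double d = 0;
    double scale = 1.0 / 256; // weight of the first byte
    int n = Math.min(row.length, maxBytes);
    for (int i = 0; i < n; i++) {
      // Java bytes are signed (-128..127); masking with 0xFF yields 0..255.
      d += (row[i] & 0xFF) * scale;
      scale /= 256; // each following byte is worth 256 times less
    }
    return d;
  }

  public static void main(String[] args) {
    byte high = (byte) 0xFF;
    System.out.println(high); // the signed view of the byte
    System.out.println(toFraction(new byte[] { high }, 8)); // the unsigned view as a fraction
  }
}
```

A double has a 53-bit mantissa, so bytes beyond roughly the seventh contribute little or nothing. That loss is acceptable for an approximate progress value.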
      LOG.warn("Failed to probe first row for progress estimation", e);
      return null;
If there are any issues with the scan, this will simply report 0 progress.
Thanks, here are my initial thoughts. Pluggable
@junegunn Thank you for your feedback. Pluggable
I've just fixed test failures in |
Pull request overview
This PR (HBASE-30115) adds an approximate progress estimation mechanism for TableRecordReader by mapping the last-read row key into a normalized fraction of the scan’s start/stop key range. This improves MapReduce task progress reporting without requiring tuple counting.
Changes:
- Added a pluggable RowKeyProgress interface with default (UniformRowKeyProgress) and hex-specific (HexStringRowKeyProgress) implementations.
- Updated TableRecordReaderImpl#getProgress() to return an estimated fraction based on the last successfully read row key (with optional probing for empty start/stop bounds).
- Added unit tests for both progress implementations.
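The idea of mapping the last-read row key into a normalized fraction of the start/stop range could look roughly like the sketch below. The interface `ProgressEstimator`, the class `LinearProgressEstimator`, and the method signature are assumptions made for illustration. The actual RowKeyProgress SPI in the PR may be shaped differently.

```java
/** Hypothetical stand-in for a pluggable estimator; not the PR's RowKeyProgress signature. */
interface ProgressEstimator {
  /** Returns an approximate fraction in [0, 1] of the scan range covered so far. */
  float progress(byte[] startRow, byte[] stopRow, byte[] currentRow);
}

/** Assumes row keys are spread uniformly over the byte space between start and stop. */
final class LinearProgressEstimator implements ProgressEstimator {
  private static final int MAX_BYTES = 8; // a double cannot resolve much more than this

  @Override
  public float progress(byte[] startRow, byte[] stopRow, byte[] currentRow) {
    if (startRow == null || stopRow == null || currentRow == null) {
      return 0f; // nothing read yet, or bounds unknown
    }
    double start = toFraction(startRow);
    double stop = toFraction(stopRow);
    if (stop <= start) {
      return 0f; // degenerate range: report no progress rather than divide by zero
    }
    double p = (toFraction(currentRow) - start) / (stop - start);
    return (float) Math.max(0.0, Math.min(1.0, p)); // clamp to [0, 1]
  }

  /** Reads the leading bytes as a big-endian unsigned base-256 fraction in [0, 1). */
  private static double toFraction(byte[] row) {
    double d = 0;
    double scale = 1.0 / 256;
    int n = Math.min(row.length, MAX_BYTES);
    for (int i = 0; i < n; i++) {
      d += (row[i] & 0xFF) * scale;
      scale /= 256;
    }
    return d;
  }
}
```

With a start key of `0x00`, a stop key of `0x80`, and a last-read key of `0x40`, this sketch reports 0.5. A record reader would call such an estimator with the most recent successfully read row each time progress is requested.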
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java | Initializes a RowKeyProgress estimator and uses it to report approximate scan progress; includes probing logic for empty bounds. |
| hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowKeyProgress.java | Introduces the progress-estimation SPI and configuration key. |
| hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/UniformRowKeyProgress.java | Default progress estimator treating row keys as big-endian unsigned byte sequences. |
| hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HexStringRowKeyProgress.java | Progress estimator for ASCII hex-encoded row keys (e.g., hash prefixes). |
| hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestUniformRowKeyProgress.java | Unit tests for UniformRowKeyProgress. |
| hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHexStringRowKeyProgress.java | Unit tests for HexStringRowKeyProgress. |
    private byte[] probeLastRow() {
      try {
        Scan probeScan = new Scan(scan);
        probeScan.setReversed(true);
        probeScan.setOneRowLimit();
        try (ResultScanner probeScanner = htable.getScanner(probeScan)) {
          Result result = probeScanner.next();
          return result != null ? result.getRow() : null;
        }
    }
    if (b >= 'a' && b <= 'f') {
      return 10 + (b - 'a');
    }
     * Hex characters past the start/stop divergence point to include for resolution. 4 hex chars =
     * 65,536 buckets, finer than any progress bar can display.
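The resolution comment above can be made concrete with a small sketch. This is an assumption about how such an estimator might work, not the PR's HexStringRowKeyProgress code. The class `HexProgressSketch` and all of its methods are hypothetical. Keys shorter than the window are padded with zero digits, non-hex bytes are treated as 0, and the current row is assumed to share the common prefix of start and stop.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of hex-key progress with a 4-character resolution window. */
final class HexProgressSketch {
  static final int RESOLUTION_CHARS = 4; // 16^4 = 65,536 buckets

  private HexProgressSketch() {
  }

  static byte[] ascii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  /** Value of one ASCII hex digit; non-hex bytes count as 0 (an assumption of this sketch). */
  static int hexValue(byte b) {
    if (b >= '0' && b <= '9') {
      return b - '0';
    }
    if (b >= 'a' && b <= 'f') {
      return 10 + (b - 'a');
    }
    if (b >= 'A' && b <= 'F') {
      return 10 + (b - 'A');
    }
    return 0;
  }

  /** Index of the first position where a and b differ, or the shorter length if none does. */
  static int divergence(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /** Reads RESOLUTION_CHARS hex digits starting at offset, padding short keys with zeros. */
  static int bucket(byte[] row, int offset) {
    int v = 0;
    for (int i = 0; i < RESOLUTION_CHARS; i++) {
      int idx = offset + i;
      v = v * 16 + (idx < row.length ? hexValue(row[idx]) : 0);
    }
    return v;
  }

  static float progress(byte[] start, byte[] stop, byte[] current) {
    int offset = divergence(start, stop);
    int lo = bucket(start, offset);
    int hi = bucket(stop, offset);
    if (hi <= lo) {
      return 0f; // degenerate or inverted range
    }
    float p = (float) (bucket(current, offset) - lo) / (hi - lo);
    return Math.max(0f, Math.min(1f, p)); // clamp to [0, 1]
  }
}
```

For start `ab00`, stop `ab80`, and current `ab40`, the keys diverge at index 2. The three buckets are 0x0000, 0x8000, and 0x4000, so the sketch reports 0.5.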
Jira https://issues.apache.org/jira/browse/HBASE-30115