Skip to content

Conversation

@sanjeet006py
Copy link
Contributor

JIRA: HBASE-29233

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@virajjasani virajjasani self-requested a review April 6, 2025 23:45
@Apache9
Copy link
Contributor

Apache9 commented Apr 22, 2025

Ping @virajjasani

@virajjasani
Copy link
Contributor

Planning to take a look in a couple of days. @Apache9 if you are interested, please feel free to review, we are planning to use this internally soon.

@virajjasani
Copy link
Contributor

@rmdmattingly @ndimiduk since you have very recently reviewed HBASE-29090 already, could you please also take a look at this PR also? This too should be quite helpful.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments, looks good with initial review

Comment on lines 178 to 181
this.encodedRegionName = null;
}
if (this.serverName != null && !Objects.equals(this.serverName, other.getServerName())) {
this.serverName = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, how does setting null for encodedRegionName and serverName help? Are we using them to log something or return them with ScanMetrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encodedRegionName and serverName are not part of ScanMetrics. When combining ScanMetrics if we observe that the ScanMetrics object in which all other ScanMetrics are getting combined has different serverName and encodedRegionName then we set these two fields to null as now we can't deterministically pick any one value. Plus, the assumption is someone getting combined ScanMetrics is not interested in region level details as we have separate getter for getting list of region level ScanMetrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not immediately obvious. Please add a comment explaining your thought process.

Copy link
Member

@ndimiduk ndimiduk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some nits, but overall the feature looks like it's coming together.

One high-level question though: why are you adding this as a bolt-on" instead of extending the existing ScanMetrics object with the per-region collection?

return scanMetrics;
if (scanMetricsByRegion != null) {
if (scanMetricsByRegion.isEmpty()) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: imho, it's easier to read code that exits early, as opposed to deeply indented if/else blocks. Just my style preference.

}

@Override
public List<ScanMetrics> getScanMetricsByRegion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this method at all? Is it not sufficient to return the "overall" ScanMetics object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's because you haven't added the per-region scan metrics to the existing object. Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because you haven't added the per-region scan metrics to the existing object. Why not?

Actually I didn't consider that. But I like this idea of folding region level metrics into existing ScanMetrics object. Thanks

}

@Override
public List<ScanMetrics> getScanMetricsByRegion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that this method with this name returns a list and not a Map<encodedRegionName -> ScanMetrics>.

public abstract class AbstractClientScanner implements ResultScanner {
protected ScanMetrics scanMetrics;
protected ScanMetrics scanMetrics = null;
protected List<ScanMetrics> scanMetricsByRegion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over-all question: why do you always add this List<ScanMetrics> scanMetricsByRegion instead of using the existing instance ScanMetrics scanMetrics as an accumulator and fold in the new metrics as they arrive into this instance? You could initialize it once and have less null-checking going on. You do this in all the classes that interact with this new feature.

@FunctionalInterface
private interface ScanWithMetrics {
Pair<List<Result>, ScanMetrics> scan(Scan scan) throws Exception;
Pair<List<Result>, Pair<ScanMetrics, List<ScanMetrics>>> scan(Scan scan) throws Exception;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Pair of Pairs? It's time to make a static POJO class or use record type (I guess the former because want to backport to branch-2 lineages).


@Test
public void testScanMetrics() throws Exception {
Pair<List<Result>, ScanMetrics> pair = method.scan(new Scan().setScanMetricsEnabled(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please bust up this method into two variants, one with and one without the per-region accounting enabled? Ideally the latter is a pure super-set of the former, but as things are currently implemented, I think that won't be the case...

Admin admin = TEST_UTIL.getAdmin();
Assert.assertEquals(2, admin.getRegions(tableName).size());

// Test scan metrics disabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.

scan.setEnableScanMetricsByRegion(true);

Configuration copyConf = new Configuration(conf);
// Test without providing scan metric at prior and scan metrics by region enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.


Configuration conf = UTIL.getConfiguration();

// Scan table snapshot and get combined scan metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.

@ndimiduk
Copy link
Member

Oh, and please run spotless:apply before your next push.

@sanjeet006py
Copy link
Contributor Author

One high-level question though: why are you adding this as a bolt-on" instead of extending the existing ScanMetrics object with the per-region collection?

Actually I didn't consider extending existing ScanMetrics. And, I agree by extending existing ScanMetrics many of above comments will also get resolved along with making implementation more cleaner. Will rework on this. Thanks a lot.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@sanjeet006py
Copy link
Contributor Author

What if, for example, instead of altering the existing fields, you collect the per-region values in their own accumulators, and update the existing scan-level field values on each transition. That way the existing references stay as final and existing access patterns remain valid.

@ndimiduk I have modified the implementation to incorporate above suggestion. I have implemented a slight variation of above approach: I now collect per-region values in their own accumulators but while updating per-region values I also update existing scan-level field values. This way existing access patterns remain valid.
I have also addressed all the rest of open comments.
Please take a look. Thanks

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@sanjeet006py sanjeet006py requested a review from ndimiduk June 6, 2025 14:48
@sanjeet006py
Copy link
Contributor Author

@ndimiduk can you please take a look. Thanks

@sanjeet006py
Copy link
Contributor Author

sanjeet006py commented Jun 16, 2025

@ndimiduk any update on this PR?

@sanjeet006py
Copy link
Contributor Author

@ndimiduk can you please review the changes. Can we please speed up the review cycles? Thanks
CC: @virajjasani

@ndimiduk
Copy link
Member

Hi @sanjeet006py . I appreciate that you're frustrated by how this PR is moving slowly. Unfortunately I don't have a lot of time recently for upstream work. Mentioning me over and over doesn't change that fact. There are over 100 other committers on the project, any of them can pick up reviews and merge any PR. No one committer is a bottle-neck, by design. Your best strategy for attracting the attention of other committers is by sending a note to dev@ mailing list or perhaps by starting a thread in the slack channel.

I will make an effort to spend more time with this patch later in the week, if it hasn't been merged by then.

@virajjasani
Copy link
Contributor

Let me spend sometime this week. Good to see the compatibility getting restored, this will allow us to push the changes to active branches.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, looks good overall with the new changes

Comment on lines 98 to 101
/**
* constructor
*/
public ScanMetrics() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore the constructor

Comment on lines +31 to +35
/**
* Creates a new counter with the specified name and stores it in the counters map.
* @return {@link AtomicLong} instance for the counter with counterName
*/
static AtomicLong createCounter(Map<String, AtomicLong> counters, String counterName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of mix of comments and Javadoc, let's convert all method comments into Javadocs

* {@link ServerSideScanMetrics#collectMetricsByRegion(boolean)}.
*/
@InterfaceAudience.Public
public class ScanMetricsRegionInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are not expecting much changes, until we have gained more confidence after prod use, let's also add @InterfaceStability.Evolving

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 46s Maven dependency ordering for branch
+1 💚 mvninstall 3m 36s master passed
+1 💚 compile 8m 23s master passed
+1 💚 checkstyle 1m 12s master passed
+1 💚 spotbugs 9m 44s master passed
+1 💚 spotless 0m 52s branch has no errors when running spotless:check.
-0 ⚠️ patch 1m 10s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 6s the patch passed
+1 💚 compile 8m 24s the patch passed
-0 ⚠️ javac 8m 24s /results-compile-javac-root.txt root generated 2 new + 1738 unchanged - 2 fixed = 1740 total (was 1740)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 17s /buildtool-patch-checkstyle-root.txt The patch fails to run checkstyle in root
+1 💚 spotbugs 10m 0s the patch passed
+1 💚 hadoopcheck 12m 3s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
68m 34s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6868/14/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6868
JIRA Issue HBASE-29233
Optional Tests dupname asflicense codespell detsecrets spotless javac spotbugs checkstyle compile hadoopcheck hbaseanti
uname Linux bee611b589e4 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 286d7c9
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 192 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6868/14/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 39s Maven dependency ordering for branch
+1 💚 mvninstall 3m 41s master passed
+1 💚 compile 2m 10s master passed
+1 💚 javadoc 2m 40s master passed
+1 💚 shadedjars 6m 5s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ patch 6m 27s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 3m 2s the patch passed
+1 💚 compile 2m 10s the patch passed
+1 💚 javac 2m 10s the patch passed
+1 💚 javadoc 2m 39s the patch passed
+1 💚 shadedjars 6m 1s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 286m 15s root in the patch passed.
324m 6s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6868/14/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6868
JIRA Issue HBASE-29233
Optional Tests javac javadoc unit compile shadedjars
uname Linux 5841e8ee50fc 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 286d7c9
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6868/14/testReport/
Max. process+thread count 8834 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6868/14/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@sanjeet006py sanjeet006py requested a review from virajjasani June 26, 2025 12:12
Copy link
Contributor

@haridsv haridsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have ensured backwards compatibility for external client code that used the public final counter variables by introducing one extra running copy that is specific to current region. While this is modelled exactly like what we discussed offline, one small caveat is that internal HBase code should strictly not reference the public fields, instead they must always go through the setCounter() and addToCounter() methods for this to work correctly. While ensured all the existing code follows this pattern, if new HBase code added doesn't follow it and reviewer doesn't catch it, it will end up with broken metric. In theory, one of the asserts should catch it, but it will be missed if the new metric remains as 0.

What if we track the very first call to moveToNextRegion and do an assert to make sure counters (or aggregateScanMetricsData per my other suggestion) and currentScanMetricsData match? If the correct pattern is not followed, one or more metric tests (especially the one created for the new metric) will definitely fail. As we disable assert's in production code, it won't add any extra overhead, except for an additional boolean.

private final String encodedRegionName;
private final ServerName serverName;

ScanMetricsRegionInfo(String encodedRegionName, ServerName serverName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why non-public constructor in a public class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional as I don't want users to instantiate an instance of this class by themselves from outside HBase code. The public facing part of this class is the getters but not constructor.

* @return A Map of String -> Long for metrics
*/
static Map<String, Long> collectMetrics(Map<String, AtomicLong> counters, boolean reset) {
Map<String, Long> metricsSnapshot = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Map<String, Long> metricsSnapshot = new HashMap<>();
Map<String, Long> metricsSnapshot = new HashMap<>(counters.size());

currentRegionScanMetricsData.createCounter(COUNT_OF_ROWS_SCANNED_KEY_METRIC_NAME);
currentRegionScanMetricsData.createCounter(COUNT_OF_ROWS_FILTERED_KEY_METRIC_NAME);
currentRegionScanMetricsData.createCounter(BLOCK_BYTES_SCANNED_KEY_METRIC_NAME);
currentRegionScanMetricsData.createCounter(FS_READ_TIME_METRIC_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplication of initialization is bothering! How about if you change the type of counters to RegionScanMetricsData (I would also rename counters to aggregateScanMetricsData to make the name self explanatory and show the contrast with that of currentRegionScanMetricsData) and then add a new method, say createNew in RegionScanMetricsData that simply clones itself but with 0 values? Some additional notes:

  • You would mark moveToNextRegion as final as you no longer need the overridden method.
  • Merge initScanMetricsRegionInfo into an overloaded moveToNextRegion. The default method will call then new overloaded method with RegionScanMetricsData.EMPTY_SCAN_METRICS_REGION_INFO (it needs to be made public).
  • Also merge RegionScanMetricsData.initScanMetricsRegionInfo into the proposed createNew method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initScanMetricsRegionInfo and moveToNextRegion can't be merged as its deliberately these two methods are different. Once we start scanning a region only after certain time we know the region information so, we need to two calls: one to track we moved to next region and next fill in region information when we have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest of suggestions look good. Thanks

Copy link
Contributor

@haridsv haridsv Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 patterns right now,

  1. A solo invocation of moveToNextRegion()
  2. An invocation of moveToNextRegion()followed immediately by initScanMetricsRegionInfo()

What I suggested would look like this:

  public void moveToNextRegion() {
    moveToNextRegion(ScanMetricsRegionInfo.EMPTY_SCAN_METRICS_REGION_INFO);
  }

  public void moveToNextRegion(String encodedRegionName, ServerName serverName) {
    moveToNextRegion(new ScanMetricsRegionInfo(encodedRegionName, serverName));
  }

  private void moveToNextRegion(ScanMetricsRegionInfo regionInfo) {
    currentRegionScanMetricsData = RegionScanMetricsData.createNew(regionInfo);
  }

The first pattern won't change, only the second one will call the merged invocation. Does that make sense?

* Hash to hold the String -> Atomic Long mappings for each metric
*/
private final Map<String, AtomicLong> counters = new HashMap<>();
private final List<RegionScanMetricsData> regionScanMetricsData = new ArrayList<>(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name is a bit vague and confusing, especially relative to currentRegionScanMetricsData. Even a quick rename to scanMetricsDataList would improve the readability, but if you agree with me, may be you can come up with a better name.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@virajjasani
Copy link
Contributor

Thanks @sanjeet006py, given that branch-2 PR might take some time due to client interface changes, planning to merge this upto branch-3 for now.

@virajjasani virajjasani merged commit 7988cdb into apache:master Jun 26, 2025
1 check passed
virajjasani pushed a commit that referenced this pull request Jun 26, 2025
sanjeet006py added a commit to sanjeet006py/hbase that referenced this pull request Jun 29, 2025
@ndimiduk
Copy link
Member

Thanks for the contribution @sanjeet006py and to @virajjasani for bringing it home.

virajjasani pushed a commit that referenced this pull request Jul 2, 2025
virajjasani pushed a commit that referenced this pull request Jul 2, 2025
sanjeet006py added a commit to sanjeet006py/hbase that referenced this pull request Jul 3, 2025
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants