-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29233: Capture scan metrics at region level #6868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-29233: Capture scan metrics at region level #6868
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Ping @virajjasani |
|
Planning to take a look in a couple of days. @Apache9 if you are interested, please feel free to review, we are planning to use this internally soon. |
|
@rmdmattingly @ndimiduk since you have very recently reviewed HBASE-29090 already, could you please also take a look at this PR also? This too should be quite helpful. |
virajjasani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few comments, looks good with initial review
hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractClientScanner.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/metrics/ServerSideScanMetrics.java
Show resolved
Hide resolved
| this.encodedRegionName = null; | ||
| } | ||
| if (this.serverName != null && !Objects.equals(this.serverName, other.getServerName())) { | ||
| this.serverName = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, how does setting null for encodedRegionName and serverName help? Are we using them to log something or return them with ScanMetrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encodedRegionName and serverName are not part of ScanMetrics. When combining ScanMetrics if we observe that the ScanMetrics object in which all other ScanMetrics are getting combined has different serverName and encodedRegionName then we set these two fields to null as now we can't deterministically pick any one value. Plus, the assumption is someone getting combined ScanMetrics is not interested in region level details as we have separate getter for getting list of region level ScanMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is not immediately obvious. Please add a comment explaining your thought process.
ndimiduk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some nits, but overall the feature looks like it's coming together.
One high-level question though: why are you adding this as a bolt-on" instead of extending the existing ScanMetrics object with the per-region collection?
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractClientScanner.java
Outdated
Show resolved
Hide resolved
| return scanMetrics; | ||
| if (scanMetricsByRegion != null) { | ||
| if (scanMetricsByRegion.isEmpty()) { | ||
| return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: imho, it's easier to read code that exits early, as opposed to deeply indented if/else blocks. Just my style preference.
| } | ||
|
|
||
| @Override | ||
| public List<ScanMetrics> getScanMetricsByRegion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this method at all? Is it not sufficient to return the "overall" ScanMetics object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, it's because you haven't added the per-region scan metrics to the existing object. Why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's because you haven't added the per-region scan metrics to the existing object. Why not?
Actually I didn't consider that. But I like this idea of folding region level metrics into existing ScanMetrics object. Thanks
| } | ||
|
|
||
| @Override | ||
| public List<ScanMetrics> getScanMetricsByRegion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that this method with this name returns a list and not a Map<encodedRegionName -> ScanMetrics>.
| public abstract class AbstractClientScanner implements ResultScanner { | ||
| protected ScanMetrics scanMetrics; | ||
| protected ScanMetrics scanMetrics = null; | ||
| protected List<ScanMetrics> scanMetricsByRegion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over-all question: why do you always add this List<ScanMetrics> scanMetricsByRegion instead of using the existing instance ScanMetrics scanMetrics as an accumulator and fold in the new metrics as they arrive into this instance? You could initialize it once and have less null-checking going on. You do this in all the classes that interact with this new feature.
| @FunctionalInterface | ||
| private interface ScanWithMetrics { | ||
| Pair<List<Result>, ScanMetrics> scan(Scan scan) throws Exception; | ||
| Pair<List<Result>, Pair<ScanMetrics, List<ScanMetrics>>> scan(Scan scan) throws Exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Pair of Pairs? It's time to make a static POJO class or use record type (I guess the former because want to backport to branch-2 lineages).
|
|
||
| @Test | ||
| public void testScanMetrics() throws Exception { | ||
| Pair<List<Result>, ScanMetrics> pair = method.scan(new Scan().setScanMetricsEnabled(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please bust up this method into two variants, one with and one without the per-region accounting enabled? Ideally the latter is a pure super-set of the former, but as things are currently implemented, I think that won't be the case...
| Admin admin = TEST_UTIL.getAdmin(); | ||
| Assert.assertEquals(2, admin.getRegions(tableName).size()); | ||
|
|
||
| // Test scan metrics disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.
| scan.setEnableScanMetricsByRegion(true); | ||
|
|
||
| Configuration copyConf = new Configuration(conf); | ||
| // Test without providing scan metric at prior and scan metrics by region enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.
|
|
||
| Configuration conf = UTIL.getConfiguration(); | ||
|
|
||
| // Scan table snapshot and get combined scan metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: each of these little blocks could be its own test method, and then this test would be a lot nicer to reason about when there are failures.
|
Oh, and please run |
Actually I didn't consider extending existing ScanMetrics. And, I agree by extending existing ScanMetrics many of above comments will also get resolved along with making implementation more cleaner. Will rework on this. Thanks a lot. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@ndimiduk I have modified the implementation to incorporate above suggestion. I have implemented a slight variation of above approach: I now collect per-region values in their own accumulators but while updating per-region values I also update existing scan-level field values. This way existing access patterns remain valid. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@ndimiduk can you please take a look. Thanks |
|
@ndimiduk any update on this PR? |
|
@ndimiduk can you please review the changes. Can we please speed up the review cycles? Thanks |
|
Hi @sanjeet006py . I appreciate that you're frustrated by how this PR is moving slowly. Unfortunately I don't have a lot of time recently for upstream work. Mentioning me over and over doesn't change that fact. There are over 100 other committers on the project, any of them can pick up reviews and merge any PR. No one committer is a bottle-neck, by design. Your best strategy for attracting the attention of other committers is by sending a note to dev@ mailing list or perhaps by starting a thread in the slack channel. I will make an effort to spend more time with this patch later in the week, if it hasn't been merged by then. |
|
Let me spend sometime this week. Good to see the compatibility getting restored, this will allow us to push the changes to active branches. |
virajjasani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor comments, looks good overall with the new changes
| /** | ||
| * constructor | ||
| */ | ||
| public ScanMetrics() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restore the constructor
| /** | ||
| * Creates a new counter with the specified name and stores it in the counters map. | ||
| * @return {@link AtomicLong} instance for the counter with counterName | ||
| */ | ||
| static AtomicLong createCounter(Map<String, AtomicLong> counters, String counterName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of mix of comments and Javadoc, let's convert all method comments into Javadocs
| * {@link ServerSideScanMetrics#collectMetricsByRegion(boolean)}. | ||
| */ | ||
| @InterfaceAudience.Public | ||
| public class ScanMetricsRegionInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are not expecting much changes, until we have gained more confidence after prod use, let's also add @InterfaceStability.Evolving
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
haridsv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have ensured backwards compatibility for external client code that used the public final counter variables by introducing one extra running copy that is specific to current region. While this is modelled exactly like what we discussed offline, one small caveat is that internal HBase code should strictly not reference the public fields, instead they must always go through the setCounter() and addToCounter() methods for this to work correctly. While ensured all the existing code follows this pattern, if new HBase code added doesn't follow it and reviewer doesn't catch it, it will end up with broken metric. In theory, one of the asserts should catch it, but it will be missed if the new metric remains as 0.
What if we track the very first call to moveToNextRegion and do an assert to make sure counters (or aggregateScanMetricsData per my other suggestion) and currentScanMetricsData match? If the correct pattern is not followed, one or more metric tests (especially the one created for the new metric) will definitely fail. As we disable assert's in production code, it won't add any extra overhead, except for an additional boolean.
| private final String encodedRegionName; | ||
| private final ServerName serverName; | ||
|
|
||
| ScanMetricsRegionInfo(String encodedRegionName, ServerName serverName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why non-public constructor in a public class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional as I don't want users to instantiate an instance of this class by themselves from outside HBase code. The public facing part of this class is the getters but not constructor.
| * @return A Map of String -> Long for metrics | ||
| */ | ||
| static Map<String, Long> collectMetrics(Map<String, AtomicLong> counters, boolean reset) { | ||
| Map<String, Long> metricsSnapshot = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Map<String, Long> metricsSnapshot = new HashMap<>(); | |
| Map<String, Long> metricsSnapshot = new HashMap<>(counters.size()); |
| currentRegionScanMetricsData.createCounter(COUNT_OF_ROWS_SCANNED_KEY_METRIC_NAME); | ||
| currentRegionScanMetricsData.createCounter(COUNT_OF_ROWS_FILTERED_KEY_METRIC_NAME); | ||
| currentRegionScanMetricsData.createCounter(BLOCK_BYTES_SCANNED_KEY_METRIC_NAME); | ||
| currentRegionScanMetricsData.createCounter(FS_READ_TIME_METRIC_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplication of initialization is bothering! How about if you change the type of counters to RegionScanMetricsData (I would also rename counters to aggregateScanMetricsData to make the name self explanatory and show the contrast with that of currentRegionScanMetricsData) and then add a new method, say createNew in RegionScanMetricsData that simply clones itself but with 0 values? Some additional notes:
- You would mark
moveToNextRegionas final as you no longer need the overridden method. - Merge
initScanMetricsRegionInfointo an overloadedmoveToNextRegion. The default method will call then new overloaded method withRegionScanMetricsData.EMPTY_SCAN_METRICS_REGION_INFO(it needs to be made public). - Also merge
RegionScanMetricsData.initScanMetricsRegionInfointo the proposedcreateNewmethod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initScanMetricsRegionInfo and moveToNextRegion can't be merged as its deliberately these two methods are different. Once we start scanning a region only after certain time we know the region information so, we need to two calls: one to track we moved to next region and next fill in region information when we have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest of suggestions look good. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 patterns right now,
- A solo invocation of moveToNextRegion()
- An invocation of moveToNextRegion()followed immediately by initScanMetricsRegionInfo()
What I suggested would look like this:
public void moveToNextRegion() {
moveToNextRegion(ScanMetricsRegionInfo.EMPTY_SCAN_METRICS_REGION_INFO);
}
public void moveToNextRegion(String encodedRegionName, ServerName serverName) {
moveToNextRegion(new ScanMetricsRegionInfo(encodedRegionName, serverName));
}
private void moveToNextRegion(ScanMetricsRegionInfo regionInfo) {
currentRegionScanMetricsData = RegionScanMetricsData.createNew(regionInfo);
}
The first pattern won't change, only the second one will call the merged invocation. Does that make sense?
| * Hash to hold the String -> Atomic Long mappings for each metric | ||
| */ | ||
| private final Map<String, AtomicLong> counters = new HashMap<>(); | ||
| private final List<RegionScanMetricsData> regionScanMetricsData = new ArrayList<>(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name is a bit vague and confusing, especially relative to currentRegionScanMetricsData. Even a quick rename to scanMetricsDataList would improve the readability, but if you agree with me, may be you can come up with a better name.
virajjasani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thanks @sanjeet006py, given that branch-2 PR might take some time due to client interface changes, planning to merge this upto branch-3 for now. |
Signed-off-by: Viraj Jasani <[email protected]>
Signed-off-by: Viraj Jasani <[email protected]>
|
Thanks for the contribution @sanjeet006py and to @virajjasani for bringing it home. |
Signed-off-by: Viraj Jasani <[email protected]>
Signed-off-by: Viraj Jasani <[email protected]>
…he#6868) Signed-off-by: Viraj Jasani <[email protected]>
…he#6868) Signed-off-by: Viraj Jasani <[email protected]>
JIRA: HBASE-29233