-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29090: Add server-side load metrics to client results #6623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
krconv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be very useful! Just some nitpicks, feel free to ignore
| } | ||
| optional ReadType readType = 23 [default = DEFAULT]; | ||
| optional bool need_cursor_result = 24 [default = false]; | ||
| optional bool resultMetricsEnabled = 25 [default = false]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; enable_result_metrics matches the existing fields better (and same note for other protobuf changes)
| optional bool resultMetricsEnabled = 25 [default = false]; | |
| optional bool enable_result_metrics = 25 [default = false]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. this file has both camel & snake case littered everywhere I should probably avoid adding additional inconsistencies
| * Statistics about the Result's server-side metrics | ||
| */ | ||
| message ResultMetrics { | ||
| required uint64 blockBytesScanned = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the standard is for HBase, but I know that optional values with a default are more future proof than required fields, and I think that would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't familiar with the move away from required fields, I'm happy to make this optional.
| builder.setStale(result.isStale()); | ||
| builder.setPartial(result.mayHaveMoreCellsInRow()); | ||
|
|
||
| if (result.getMetrics() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the result metrics will get lost for exists calls; thoughts on handling that more explicitly? Maybe throwing an error if someone tries to enable metrics on an Get with checkExistenceOnly == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing an error might be a bit aggressive, though I'd be curious to know what others think. I'm not sure if we're happy throwing an unchecked exception from setCheckExistenceOnly and if we wanted to throw an IOException, we'd have to change the signature.
Maybe it's enough that the metics returned will be null
| this.setFilter(get.getFilter()); | ||
| this.setReplicaId(get.getReplicaId()); | ||
| this.setConsistency(get.getConsistency()); | ||
| this.setResultMetricsEnabled(get.isResultMetricsEnabled()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; Thoughts on the name QueryMetrics? I think that ResultMetrics could be interpreted as "metrics about the result" instead of "metrics about the query operation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query metrics makes sense, I mostly just wanted a way to distinguish between the existing scan metrics, and these new metrics
| boolean mayHaveMoreCellsInRow = scannerContext.mayHaveMoreCellsInRow(); | ||
| Result r = Result.create(values, null, stale, mayHaveMoreCellsInRow); | ||
|
|
||
| if (request.getScan().getResultMetricsEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for not directly setting it on the result for scans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, scans are a bit tricky just due to how the results are constructed on the client side. you can take a look at how the response converter builds these responses. They're created from the cells, so we need to send the metrics separately through the wire in this case and hydrate them client-side.
This comment has been minimized.
This comment has been minimized.
3d96194 to
8680147
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I'm actively working through the unit test failures that come up |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| scan.withStartRow(Bytes.toBytes(123)); | ||
| scan.withStopRow(Bytes.toBytes(456)); | ||
| String expectedOutput = | ||
| "{\n" + " \"startTime\": 1,\n" + " \"processingTime\": 2,\n" + " \"queueTime\": 3,\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like adding a field completely changed the order of the serialized fields. all I did here was add the queryMetricsEnabled field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is dumb any could be replaced by literally any modern library for asserts on JSON blobs.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
ndimiduk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So high-level question: why is this a new QueryMetrics instead of expanding use of the existing ScanMetrics to all query types? That class even includes the comment,
* Some of these metrics are general for any client operation such as put However, there is no need
* for this. So they are defined under scan operation for now.
My main reasoning was that some metrics exclusively make sense for scans. I thought it'd be confusing to include those metrics in Result objects that were fetched via Get or MultiGet. Another reason for keeping them separate was to make it easier to expand one and not the other in the future. I worry about having one large object will make things harder to extend in the future. |
ndimiduk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your changes are clean and precise, I have little to say about the execution. However as far as the object model here, I think that you're sitting between two different implementations.
If the plan is to only include a QueryMetrics for Results produced by Gets, I think that you can reduce the scope of this class. Make it a GetMetrics and have it be only applicable to Get-based queries. Push its accessors down to the Get object, remove references from Query and Scan. This approach might become weird because of how Gets are implemented as Scans internally, or maybe it'll be fine -- that's essentially what you have already done here.
On the other hand, you can keep it as a QueryMetrics and also gather this data for Result instances produced by a Scan. Then Scan users can opt-into this metric so as to obtain fine-grained metrics over the internals of the query. You probably also need to update the object model of ScanMetrics, because if a Scan is a Query, then a ScanMetrics should be a QueryMetrics. You'll also need think about whether enabling QueryMetrics also enables ScanMetrics and vice-versa... I expect that there would be a configuration state that enables only ScanMetrics, mimicking the behavior existing today.
Whichever path you choose, you also need to wire this new metrics thing into other parts of the system. Inspecting where ScanMetrics are used will guide you -- MapReduce and PerformanceEvaluation tool(s) should also expose this new feature.
Personally, I think that the ambition of a QueryMetrics is a good one and it maintains the feature parity for users who prefer schemas built around Scans vs. built around Multi-Gets.
| import org.apache.yetus.audience.InterfaceAudience; | ||
|
|
||
| @InterfaceAudience.Public | ||
| public class QueryMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also decorate this class with IS.Unstable or Evolving. Whichever you choose, also place the same annotation on the public accessor methods on all the IA.Public classes. Let's give ourselves a change to explore this feature and make changes faster than once every 5 years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the existing ServerSideMetric class? It has loose typing for metrics via a map and protobuf logic is already there. The only advantage for having QueryMetrics is that the metrics as fields will be more efficient than a map, but it can be improved by defining predefined metric "schemas" for ServerSideMetric. E.g., you can have "QueryMetrics" as one of the schemas and it would predefine a fixed size counters map with "blockBytesScanned" as the only metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ServerSideMetric currently has a fixed schema of counters, but this can be changed such that a different schema can be specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We at Salesforce have a similar requirement to track scan metrics at a higher granularity of region level. Currently scan metrics for all the regions scanned are combined together as they are available and so we lose the granularity. Thus, we have a proposal to maintain the per region metrics along with capturing region name and RS where the region lies. A new method will be exposed to retrieve the metrics with granularity, but the existing method will behave the same as it combines all the per region metrics into one structure. We are reusing the existing ScanMetrics class due to following reasons:
- Our use case involves doing scans and it already has support for this class, so this is an incremental change on top of it.
- We are not changing the type of metrics, but simply preserving the granularity. So, we are capturing list of ScanMetric objects, one per each region.
- This allows leveraging the existing integration of scan metrics on server and client side, such as protobuf serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the existing ServerSideMetric class? It has loose typing for metrics via a map and protobuf logic is already there.
I'm hesitant to do this. What's the value in obfuscating the actual metrics themselves by using a generic map when we can efficiently and more clearly represent whichever fields are being served to the user? In my opinion, this is also less error-prone, as the fields are clearly stated, and it's harder to remove or modify something unintentionally.
Thus, we have a proposal to maintain the per region metrics along with capturing region name and RS where the region lies
I talked a little bit about why I opted for creating a new class, rather than re-using an existing one here.
I worry about coupling the new metrics and Scans with this shared metrics class which could make it hard to iterate in the future.
It makes sense to re-use the ScanMetrics for more granular scan metrics, but I think it makes sense to create a new metric type for metrics that will back both Get(s) and Scans. Additionally, the granularity is so course, that we'll be adding bloat to the metric by requiring fields, such as countOfRegions which wouldn't apply to this new type of metric I'm proposing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll still need to introduce this new pattern to enable metrics at the granularity that we want (per-result). So I don't think re-using a base class, and implementing an inheritance hierarchy avoids this. I'm not sure I see the benefit of avoiding a new protobuf definition, or avoiding a new converter. Both are pretty trivial to create, and allow us to keep these concepts (result metrics, vs aggregated scan metrics) separate. Perhaps there's some value in a base metrics class that simply stores counters, it requires a bit of refactoring and touches existing code.
For the sake of discussion, I took a stab at what it would look if we did create this inheritance hierarchy. We can look at this commit to compare both implementations. I'm not we get much value from it, but am happy to continue discussing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Agreed with @hgromer. I think these are two separate things, and there's not a lot of value in shoehorning inheritance. Wishy washy inheritance feels like something we'll wish we hadn't done two or three iterations down the road
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll still need to introduce this new pattern to enable metrics at the granularity that we want (per-result).
I am not sure if I am missing the point, but of course, how else will you get instrumentation other than actually using the class?
So I don't think re-using a base class, and implementing an inheritance hierarchy avoids this.
The suggestion was about reusing the existing pattern by making it more generic, instead of introducing a new class. If we keep adding new standalone classes for every new scenario, it can quickly get confusing. Also, the existing counter map based approach is easier to introspect and process. Just my 2¢.
For the sake of discussion, I took a stab at what it would look if we did create this inheritance hierarchy. We can look at this commit to compare both implementations
Looks good, but I would rename ServerSideMetricsCounter as just MetricsCounter (or MetricsBase) as there is nothing server specific in it. Also, it is not the right place to have those public static constants as it is independent of any specific metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that the original implementation is still far cleaner. We need to add a QueryMetric class regardless; the only thing we save on is an additional protobuf, but I don't think that's a huge benefit. I think generic counters make sense when the schema or shape of the data is abstract, or unknown (such as Configuration), but I think it's neater to leverage explicit typings whenever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted back
| scan.withStartRow(Bytes.toBytes(123)); | ||
| scan.withStopRow(Bytes.toBytes(456)); | ||
| String expectedOutput = | ||
| "{\n" + " \"startTime\": 1,\n" + " \"processingTime\": 2,\n" + " \"queueTime\": 3,\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is dumb any could be replaced by literally any modern library for asserts on JSON blobs.
| Result r = | ||
| Result.create(results, get.isCheckExistenceOnly() ? !results.isEmpty() : null, stale); | ||
| if (get.isQueryMetricsEnabled()) { | ||
| long blockBytesScanned = context.getBlockBytesScanned() - blockBytesScannedBefore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be cleaner if we pushed the blockBytesScanner into the RegionScanner. It could provide an accessor that is updated with each call to next(). This maintenance of a subtractive counter seems odd.
That's probably outside of scope of this PR.
|
Requesting @Apache9 as he's spent a lot of time around the client and I think was involved in the improvements that introduced scanner metrics. |
I think it's nice to implement this behavior for all queries, which is what I've done in this PR. I think it'd be weird to have
Just thinking this through a little bit, this makes sense b/c it enables us to add metrics at a result level granularity specific to either scans or gets. The only thing that gets tricky here is that you need to do class casting to get the exact class you care about. So for example: class QueryMetrics {
public long getBlockBytesScanned();
}
class GetMetrics extends QueryMetrics {
public long getGetSpecificValue();
}
class Result {
public QueryMetrics getQueryMetrics();
}Then, to access the metric class MyDriver() {
public static void main(String[] args) {
Get get = new Get(myRk).setQueryMetricsEnabled(true);
Result r = table.get(get);
GetMetrics = (GetMetrics) r.getQueryMetrics();
long getSpecificValue = r.getGetSpecificValue();
}
}but maybe that's the price to pay for the added flexibility
This PR already opts for keeping the default behavior. The finer-grain query metrics are only returned if
Sounds good, will take a deeper look here |
e06a822 to
c57b3c1
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sorry, I've been away for a while.
I was thinking about whether this could be made easier with generics. I think the answer is yes, but we really don't want to make So I thing the down-casting approach is fine. Maybe there's some helper method that would make it a little safer? I.e., push the cast into methods class Result {
QueryMetrics getQueryMetricsInternal() { ... }
<T extends QueryMetrics> Optional<T> getQueryMetrics() {
return Optional.ofNullable((T) getQueryMetricsInternal());
}
}
Result r = table.get(get);
GetMetrics getMetrics = r.<GetMetrics>getQueryMetrics(); |
|
I don't think it's a requirement that we have strong typing of the |
Sounds good, I've added some javadocs! |
| .action((controller, loc, stub) -> RawAsyncTableImpl.mutate(controller, loc, stub, put, | ||
| (rn, p) -> RequestConverter.buildMutateRequest(rn, row, family, qualifier, op, value, | ||
| null, timeRange, p, HConstants.NO_NONCE, HConstants.NO_NONCE), | ||
| null, timeRange, p, HConstants.NO_NONCE, HConstants.NO_NONCE, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A carrot to encourage folks to move on :)
This comment has been minimized.
This comment has been minimized.
|
Heya @hgromer the spotless nit looks legitimate. There have been some upstream breakages lately, so to be sure, please rebase and then run |
Just rebased and ran spotless, thanks for the shout |
|
🎊 +1 overall
This message was automatically generated. |
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
|
Heya @hgromer can you put up a backport for branch-2? We may need some additional unit test to ensure things are plumbed all the way through. Thanks. |
|
🎊 +1 overall
This message was automatically generated. |
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
… results (apache#6623) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
… results (apache#6623) (#176) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Romer <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
… results (apache#6623) (#176) Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Romer <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
No description provided.