-
Notifications
You must be signed in to change notification settings - Fork 751
[GOBBLIN-2209] Emit GaaS Executor Otel Metrics #4118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for emitting OpenTelemetry metrics in Gobblin, including the integration of exponential histogram metrics and long counter metrics via newly introduced classes and activity types.
- Added new activity and helper classes to emit and manage OpenTelemetry metrics (e.g., EmitOTelMetrics, EmitOTelMetricsImpl).
- Updated workflows, workers, and job launchers to incorporate metric emission, and added unit tests for the new metrics functionality.
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
CommitStepWorkflowImpl.java | Added metric emission using EmitOTelMetrics with new attribute assignments. |
WorkFulfillmentWorker.java | Included EmitOTelMetricsImpl in the activity implementation array. |
ExecuteGobblinJobLauncher.java | Integrated metric emission before and after workflow execution, including latency histograms. |
EmitOTelMetricsImpl.java & EmitOTelMetrics.java | New activity interface and implementation for OpenTelemetry metric emission. |
ActivityType.java & GobblinTemporalConfigurationKeys.java | Updated to define new activity type EMIT_OTEL_METRICS and configuration keys. |
Various files under gobblin-metrics | Added support for OpenTelemetry metrics instrumentation, including counters, histograms, and tests. |
ConfigurationKeys.java (gobblin-api) | Updated to include OpenTelemetry-related configuration keys. |
Comments suppressed due to low confidence (1)
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/workflow/impl/CommitStepWorkflowImpl.java:51
- Consider replacing the literal 'currState' with the constant GaaSOpenTelemetryMetricsConstants.DimensionKeys.CURR_STATE for consistency with the rest of the codebase.
attributes.put("currState", "processWUStart");
...mporal/src/main/java/org/apache/gobblin/temporal/ddm/launcher/ExecuteGobblinJobLauncher.java
Outdated
Show resolved
Hide resolved
@Getter | ||
@AllArgsConstructor | ||
public enum GaaSOpenTelemetryMetrics { | ||
GAAS_JOB_STATUS("gaas_job_status", "Gaas job status counter", "1", OpenTelemetryMetricType.LONG_COUNTER), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1
is metric unit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly s
in the below one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 is default unit here , although we should use curly annotation as mentioned in wiki but have kept default for simplicity
s represents second to measure latency metrics
https://opentelemetry.io/docs/specs/semconv/general/metrics/
...-metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryMetricType.java
Outdated
Show resolved
Hide resolved
...blin-metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryHelper.java
Show resolved
Hide resolved
...metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/GaaSOpenTelemetryMetrics.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/gobblin/metrics/opentelemetry/GaaSOpenTelemetryMetricsConstants.java
Outdated
Show resolved
Hide resolved
String openTelemetryClassName = state.getProp(ConfigurationKeys.METRICS_REPORTING_OPENTELEMETRY_CLASSNAME, | ||
ConfigurationKeys.DEFAULT_METRICS_REPORTING_OPENTELEMETRY_CLASSNAME); | ||
Class<?> metricsClass = Class.forName(openTelemetryClassName); | ||
Method getInstanceMethod = metricsClass.getMethod("getInstance", State.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use MethodUtils helper class instead of invoking methods ourself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do but this is the standard pattern being followed across codebase
Line 92 in ddd9468
Class<?> datasetFinderClass = Class.forName(className); |
name, | ||
attrs, | ||
this.meter.histogramBuilder(name) | ||
.setDescription(metric.getMetricDescription()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description
and unit
can be used directly
...ics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryInstrumentation.java
Outdated
Show resolved
Hide resolved
log.info("FINISHED - ExecuteGobblinWorkflow.execute = {}", execGobblinStats); | ||
attributes.put(CURR_STATE, JOB_COMPLETE); | ||
emitOTelMetrics.emitLongCounterMetric(GaaSOpenTelemetryMetrics.GAAS_JOB_STATUS, 1L, attributes, finalProps); | ||
attributes.remove(CURR_STATE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please elaborate on why are we doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing specific, just avoiding creating a new Map object
...oral/src/main/java/org/apache/gobblin/temporal/ddm/workflow/impl/CommitStepWorkflowImpl.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/gobblin/temporal/ddm/workflow/impl/ProcessWorkUnitsWorkflowImpl.java
Show resolved
Hide resolved
...metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/GaaSOpenTelemetryMetrics.java
Outdated
Show resolved
Hide resolved
public static final String STATE = "state"; | ||
public static final String CURR_STATE = "currState"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between state
and currState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State -- GenWU, ProcessWU, CommitStep
CurrState -- GenWUStart, GenWUComplete, ...
...metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/GaaSOpenTelemetryMetrics.java
Outdated
Show resolved
Hide resolved
...blin-metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryHelper.java
Outdated
Show resolved
Hide resolved
...ics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryDoubleHistogram.java
Outdated
Show resolved
Hide resolved
...blin-metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryHelper.java
Show resolved
Hide resolved
public OpenTelemetryMetric getOrCreate(GaaSOpenTelemetryMetrics metric) { | ||
return this.metrics.computeIfAbsent(metric.getMetricName(), name -> createMetric(metric)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the metrics map uses only metricName as key. If two metrics share the same name but differ in type (eg, LONG_COUNTER vs DOUBLE_HISTOGRAM), this can silently cause incorrect caching behavior. It would be better to use a composite key like metricName + "_" + metricType to avoid collisions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two attributes, two metrics, or two events MUST NOT share the same name. Different entities (attribute and metric, metric and event) MAY share the same name.
https://opentelemetry.io/docs/specs/semconv/general/naming/#name-reuse-prohibition
...metrics/src/main/java/org/apache/gobblin/metrics/opentelemetry/OpenTelemetryLongCounter.java
Outdated
Show resolved
Hide resolved
|
||
@Getter | ||
@AllArgsConstructor | ||
public enum GaaSOpenTelemetryMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move the metric creation logic into the GaaSOpenTelemetryMetrics enum itself instead of centralizing it in OpenTelemetryInstrumentation.createMetric.
Currently, the enum acts as a metadata holder, while the actual instantiation logic(via switch-case) is external and hence scattered. This becomes harder to extend if we introduce new metric types(eg gauges, timers, etc). A more extensible approach would be to let each enum constant hold a factory method and expose a createMetric() method
d96262a
to
832a86c
Compare
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Here are some details about my PR, including screenshots (if applicable):
Added change to
OpenTelemetryMetrics
to emit ExponentialHistogramMetricsCreated
package org.apache.gobblin.metrics.opentelemetry;
which contains different classes required for OTelMetrics emissionTests
My PR adds the following unit tests OR does not need testing for this extremely good reason:
OpenTelemetryHelperTest
OpenTelemetryInstrumentationTest
OpenTelemetryMetricTest
Commits