Open
Labels
BeatReceiver:MakeItWork, Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)
Description
The initial implementation for ingesting OTel collector internal telemetry has some major limitations:
- Conversion to Beats-compatible fields for the Agent dashboards is done with a hard-coded JavaScript processor, which is hard to maintain and test.
- Basing the telemetry scrape on Metricbeat's prometheus input means that we can't assume all relevant metrics will be visible in the same event, since this input partitions fields by their label set (and the labels for the relevant fields vary significantly and are not guaranteed to be stable). This rules out support for some important metrics like `output.events.active` that require aggregating data from multiple collector telemetry variables. It also means we have questionable mitigations like mangling the metricbeat id so Elasticsearch doesn't reject events with disjoint variable sets as duplicates just because they have the same timestamp and source metadata.
- Some aspects of the config generation and field conversion only work for the monitoring case. Extending them to support the general case would significantly increase the complexity, and might be entirely infeasible with this approach.
- Less severe but still undesirable: this approach requires fetching the data over an open TCP port, even when running in the same process as the collector.
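To make the partitioning problem concrete, here is a simplified model of what happens when samples are grouped into one event per label set. This is only an illustrative sketch, not Metricbeat's actual code: the `Sample` type, the `partitionByLabels` function, and the metric names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one scraped metric value together with its label set.
// (Hypothetical type, used only for this illustration.)
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// labelKey builds a canonical string for a label set so that samples with
// identical labels map to the same key regardless of map iteration order.
func labelKey(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, fmt.Sprintf("%s=%s", k, v))
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

// partitionByLabels groups samples into one "event" per distinct label set,
// mimicking (in simplified form) the behavior described above.
func partitionByLabels(samples []Sample) map[string]map[string]float64 {
	events := make(map[string]map[string]float64)
	for _, s := range samples {
		key := labelKey(s.Labels)
		if events[key] == nil {
			events[key] = make(map[string]float64)
		}
		events[key][s.Name] = s.Value
	}
	return events
}

func main() {
	// Metric names and labels are made up for the example.
	samples := []Sample{
		{Name: "example_queue_size", Labels: map[string]string{"exporter": "es"}, Value: 12},
		{Name: "example_sent_total", Labels: map[string]string{"exporter": "es", "signal": "logs"}, Value: 340},
	}
	events := partitionByLabels(samples)
	// The two metrics carry different label sets, so they land in separate
	// events and no single event contains both values.
	fmt.Println("events:", len(events))
}
```

Because the two samples differ by one label, any conversion that needs both values cannot be computed from a single event.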
A sustainable solution would have the following attributes:
- Any necessary field conversions can access and logically depend on the full set of Collector telemetry fields (this requires at minimum a custom scraper for the data in Agent and/or Beats instead of adapting the existing Prometheus input).
- Non-trivial field conversions should be written in unit-tested Go code and updated in sync with the Collector version.
- If/when possible, scrape the data directly through in-process mechanisms, or through a private socket rather than a generic TCP port.
  - This requires upstream changes, either to Prometheus (prometheus/prometheus#12024, "Support Unix socket for metrics address") or the collector.
The concrete design should also allow for a viable migration path to pure OTLP metrics, as definitions and dependencies stabilize enough to reliably build dashboards / integration support on that basis.