Cleanup and best practices #4
Conversation
elasticsearch_exporter.go
Outdated
@ewr Are you sure this is a gauge? I couldn't find any documentation about whether this is a gauge or counter. It looks like a counter to me. Same for the filter_cache_evictions below.
I bet you're right. Those do sound like counters.
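For context on the distinction being debated: a counter only ever increases (e.g. total evictions), while a gauge can go up and down (e.g. current memory use). A minimal client_golang sketch of the two types, with hypothetical metric names rather than the exporter's actual ones, might look like this:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Evictions only ever increase, so a counter is the right type.
	// (Metric names here are illustrative, not the exporter's actual names.)
	filterCacheEvictions = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "es",
		Name:      "filter_cache_evictions_total",
		Help:      "Total number of filter cache evictions.",
	})

	// Current cache size can shrink as well as grow, so it stays a gauge.
	filterCacheMemory = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "es",
		Name:      "filter_cache_memory_size_bytes",
		Help:      "Filter cache memory size in bytes.",
	})
)

func main() {
	prometheus.MustRegister(filterCacheEvictions, filterCacheMemory)

	// A counter is only ever added to; a gauge is set to the current value.
	filterCacheEvictions.Add(1)
	filterCacheMemory.Set(1 << 20)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil)) // arbitrary port for this sketch
}
```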
Sounds good to me. Here's my counter-argument for node name, but I'm not set one way or the other: the exporter doesn't have to use an Elasticsearch instance on localhost. That's the default argument value, but not required. In fact, the README reflects my original intent that one exporter doesn't even have to just pull one node... If you hit
For discussion: A while back I had tweaked this exporter to roll up all the time and memory metrics into a summary: https://github.com/ewr/elasticsearch_exporter/compare/SummaryMetrics. It made for nice graphing compared to having to list out all the different bytes and seconds metrics, but if it's considered a bad pattern I'm not tied to it.
And actually #1 is a PR to implement the all nodes functionality I mentioned. Oops.
@ewr In general I would just run one exporter on every ES instance. If you run one exporter per cluster, you somehow need to make that HA, which is more complicated, etc. That being said, it could of course support both, but I wouldn't recommend running it with cluster-wide stats, and it makes the code more complicated. Re: rolling up the metrics, I quote the best practices:
Yeah, re: graphing, I'm not sure I was comprehensively pulling everything, but it did seem to pull together likes with likes.
Re: cluster vs node: my gut is to leave it in. I wouldn't recommend it for production, like you said, but the code complexity is really just one level in the JSON parsing and a different URL. For someone just getting their feet wet, it seems like a nice feature.
@ewr I don't think that adding up all the times, for example, makes sense. It's all time, but it doesn't make that much sense when all added up, right? Same for the memory. It also shows up in the help string you used: 'Roll up of various' kind of shows that it's not a logical thing you can give a name, right? Re: cluster vs node: since it will otherwise duplicate the instance labels (which is ugly and might collide), we really should omit them for the 'one exporter per node' use case. That's what would make the code more complicated.
That's correct, the best way to run this type of exporter is one per node, as that keeps the failure domains aligned. Adding labels such as
Well, there's a finite amount of work time per node, so the higher the cumulative rate goes, the more likely it is that the cluster won't be very responsive. Added up, it's "How much time per second did ES spend working?"
- Use CamelCase for allStats
- Preallocate gauge and counter maps
- Rename Uri to URL
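A rough sketch of what these commit messages describe; the type and field names below are hypothetical and the real elasticsearch_exporter.go may differ:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// allStats would hold the decoded node stats JSON; the fields are omitted
// here and the name is only illustrative of the CamelCase change.
type allStats struct{}

// Exporter collects Elasticsearch stats and exposes them to Prometheus.
type Exporter struct {
	// URL (rather than Uri) follows Go's convention of upper-casing initialisms.
	URL string

	// Preallocating the maps with a size hint avoids repeated growth
	// while the metric set is being built up.
	gauges   map[string]prometheus.Gauge
	counters map[string]prometheus.Counter
}

// NewExporter returns an exporter pointed at the given stats URL.
func NewExporter(url string) *Exporter {
	return &Exporter{
		URL:      url,
		gauges:   make(map[string]prometheus.Gauge, 64),
		counters: make(map[string]prometheus.Counter, 64),
	}
}

func main() {
	e := NewExporter("http://localhost:9200/_nodes/_local/stats")
	fmt.Println(e.URL)
}
```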
For the labels, I get that running the exporter on the same instance and running one exporter per node is preferred, but I'm a little uneasy about removing that support entirely. For cluster, that's something you never want to report incorrectly because your monitoring config is wrong. ES knows its cluster, and it seems like it should be the authority on it. Plus, I don't have a clean way to convey that through in my config (which gets generated from Consul service registrations).
If the time is partitioned over the labels, then this could be okay. Is CPU time spent by the process exposed anywhere? That's http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuTime()
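To illustrate "partitioned over the labels": instead of one metric that has already been summed inside the exporter, a single time counter can carry a label per operation type, leaving any aggregation to the query side. A hypothetical sketch (metric and label names are made up for illustration):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// One metric family, partitioned by an "operation" label, rather than a
// single pre-summed roll-up value.
var workTimeSeconds = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "es",
		Name:      "work_time_seconds_total",
		Help:      "Cumulative time spent working, by operation type.",
	},
	[]string{"operation"},
)

func main() {
	prometheus.MustRegister(workTimeSeconds)

	// Each operation's time is exported separately; a dashboard can still
	// sum the rates across operations if an overall "busyness" view is wanted.
	workTimeSeconds.WithLabelValues("indexing").Add(12.3)
	workTimeSeconds.WithLabelValues("search").Add(4.5)
	workTimeSeconds.WithLabelValues("merge").Add(0.7)

	fmt.Println("registered es_work_time_seconds_total with an operation label")
}
```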
What ES thinks its name is isn't necessarily what its name is within the rest of the system; targets should pretty much never choose what their labels are for monitoring. I have a setup where I have two distinct ES clusters with the same internal name, for example.
Good point, we have the same actually. |
So I'd be happy with this as it is. We can address 'rolling up' the timers / memory metrics later, same as having a global cluster view. Keep in mind this isn't working right now anyway. Current master also only exposes metrics for the local node.
My error, but: this should be "Filter cache memory"
I'm 👍 on all of this except the removal of node and cluster. I understand that they aren't needed in the happy case where you're running everything via best practices, but it complicates or removes two potential modes of operation: scraping the whole cluster and scraping a remote node. Is the driver here just that we don't want extra labels being generated when they aren't necessary? Couldn't you overwrite or drop them via relabeling anyway?
Scraping the cluster is a bit more complicated than adding some labels; you'll also need special configuration on the Prometheus end to deal with the instance label. If you must have the remote node be different from the ES node, target discovery does allow controlling the instance label.
You could, with a feature added earlier in the week; however, that feature is intended only as a stopgap until you can fix the exporter.
In the interest of pushing the ball forward on this, I've re-added node and cluster labels and merged everything else. I'm fine to continue talking about those, but I didn't want the other bits stuck in that conversation.
Hi,
we would like to use the elasticsearch_exporter in our infrastructure. Since I'm also a Prometheus contributor and have written a few exporters, I thought I'd first do some cleanup and make changes to better follow Prometheus and Go best practices.
Besides some minor (but IMO important) changes, mainly to naming, probably the biggest change is the removal of the node label from the metrics.
Since the exporter returns stats for the local node only, it's not necessary to set this as a label, because Prometheus will include the instance label to identify the node. If a custom node name is necessary, it can be specified in the Prometheus target configuration:
http://prometheus.io/docs/operating/configuration/#target-groups-target_group
What do you think?