Description
Version 1.4.0 of this exporter introduced a new metric, elasticsearch_node_shards_total, which can be enabled if required. It was added in #535.
I've enabled it in our Elasticsearch setup because we've built monitoring alerts based on it:
```
--es.uri='http://127.0.0.1:9200' \
--web.listen-address=':9112' \
--es.shards \
--es.indices_settings
```
When a node restarts or crashes (or leaves the cluster for any other reason) and a shard is reallocated or moved, the following Prometheus expression:
```
sum(elasticsearch_node_shards_total{hostname_short=~".*-01"}) by (node)
```
returns something like this:
| metric | value |
| --- | --- |
| {node="elasticsearch-01-01"} | 297 |
| {node="elasticsearch-01-02"} | 291 |
| {node="elasticsearch-01-03"} | 298 |
| {node="elasticsearch-01-04"} | 297 |
| {node="elasticsearch-01-05"} | 297 |
| {node="elasticsearch-01-06"} | 298 |
| {node="elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09"} | 1 |
| {node="elasticsearch-01-07"} | 101 |
| {node="elasticsearch-01-08"} | 99 |
| {node="elasticsearch-01-09"} | 107 |
This dynamically creates a new time series with a unique label value, such as node="elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09".
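The relocating `node` value appears to have the shape `<source node> -> <target IP> <target node ID> <target node name>`. That reading is an assumption based on the example above (it resembles how Elasticsearch's `_cat/shards` API reports a shard that is being moved). A minimal Go sketch that splits such a value; `relocation` and `parseRelocation` are hypothetical names, not part of the exporter:

```go
package main

import (
	"fmt"
	"strings"
)

// relocation holds the parts of a relocating node value. The field meanings
// (target IP, target node ID, target node name) are assumed from the example
// in this issue, not taken from the exporter's code.
type relocation struct {
	SourceNode string
	TargetIP   string
	TargetID   string
	TargetNode string
}

// parseRelocation is a hypothetical helper. It returns ok=false when the value
// is a plain node name rather than a "src -> ip id dst" relocation string.
func parseRelocation(node string) (relocation, bool) {
	src, rest, found := strings.Cut(node, " -> ")
	if !found {
		return relocation{}, false
	}
	fields := strings.Fields(rest)
	if len(fields) != 3 {
		// Unexpected shape: do not guess.
		return relocation{}, false
	}
	return relocation{
		SourceNode: strings.TrimSpace(src),
		TargetIP:   fields[0],
		TargetID:   fields[1],
		TargetNode: fields[2],
	}, true
}

func main() {
	r, ok := parseRelocation("elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09")
	fmt.Println(ok, r.SourceNode, r.TargetNode)

	_, ok = parseRelocation("elasticsearch-01-07")
	fmt.Println(ok)
}
```

Because every relocation produces a different combination of source, target ID, and target name, each one yields a distinct label value and therefore a distinct series.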
If the cluster relocates many shards for any reason, this results in many new, short-lived time series, which could lead to a cardinality (label) explosion in Prometheus.
It would be great if the metrics for relocating shards could be turned off, or had to be explicitly enabled, so that these series are not created at all.
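One possible shape for such an option, sketched in Go under stated assumptions: the per-node count is built from one node value per shard, and a hypothetical `keepRelocating` switch (not an existing exporter flag) decides whether relocation strings are kept as-is. When it is off, a relocating shard is counted under its source node, so no new label value appears. `countShardsPerNode` is likewise a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// countShardsPerNode is a hypothetical aggregation step. Each element of nodes
// is the node value reported for one shard. Unless keepRelocating is true, a
// relocation value "src -> ip id dst" is folded into its source node, so the
// set of node label values stays equal to the set of real node names.
func countShardsPerNode(nodes []string, keepRelocating bool) map[string]int {
	counts := make(map[string]int)
	for _, node := range nodes {
		if !keepRelocating {
			if src, _, found := strings.Cut(node, " -> "); found {
				node = strings.TrimSpace(src)
			}
		}
		counts[node]++
	}
	return counts
}

func main() {
	shards := []string{
		"elasticsearch-01-06",
		"elasticsearch-01-06",
		"elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09",
		"elasticsearch-01-09",
	}
	// Default: relocation folded into the source node, two label values.
	fmt.Println(countShardsPerNode(shards, false))
	// Opt-in: the relocation string becomes a third label value.
	fmt.Println(len(countShardsPerNode(shards, true)))
}
```

Folding into the source node keeps per-node totals stable during a move; simply skipping relocating shards would also avoid the extra series, at the cost of the total dipping by one per relocation.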