elasticsearch_node_shards_total dynamically adds metrics with new labels #663

@Sebbo94BY

Version 1.4.0 of this exporter introduced the optional metric elasticsearch_node_shards_total (added in #535), which can be enabled if required.

I've enabled this in our Elasticsearch setup as we've built some monitoring alerts based on it:

--es.uri='http://127.0.0.1:9200' \
    --web.listen-address=':9112' \
    --es.shards \
    --es.indices_settings

When a node restarts or crashes and a shard is reallocated or moved, the following Prometheus expression...

sum(elasticsearch_node_shards_total{hostname_short=~".*-01"}) by (node)

...shows something like this, for example:

metric | value
------------------
{node="elasticsearch-01-01"} | 297
{node="elasticsearch-01-02"} | 291
{node="elasticsearch-01-03"} | 298
{node="elasticsearch-01-04"} | 297
{node="elasticsearch-01-05"} | 297
{node="elasticsearch-01-06"} | 298
{node="elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09"} | 1
{node="elasticsearch-01-07"} | 101
{node="elasticsearch-01-08"} | 99
{node="elasticsearch-01-09"} | 107

This dynamically creates new time series, each with a unique label value such as node="elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09".
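To make the shape of that label value concrete, here is a minimal Python sketch that splits it into its parts. The layout `SOURCE -> TARGET_IP TARGET_ID TARGET_NODE` is only inferred from the single example above, and `ShardNode` and `parse_node_label` are hypothetical names for illustration, not part of the exporter.

```python
from typing import NamedTuple, Optional


class ShardNode(NamedTuple):
    """Parts of a `node` label value (hypothetical helper type)."""
    node: str                   # node that currently holds the shard
    target_ip: Optional[str]    # relocation target IP, if relocating
    target_id: Optional[str]    # relocation target node ID, if relocating
    target_node: Optional[str]  # relocation target node name, if relocating


def parse_node_label(value: str) -> ShardNode:
    """Split a `node` label value into source node and relocation target.

    Assumes the "SOURCE -> IP ID NAME" layout seen in the example above;
    anything else is treated as a plain node name.
    """
    source, sep, target = value.partition(" -> ")
    if not sep:
        # Plain node name, no relocation in progress.
        return ShardNode(value.strip(), None, None, None)
    parts = target.split()
    if len(parts) != 3:
        # Unexpected target layout: keep only the source node.
        return ShardNode(source.strip(), None, None, None)
    ip, node_id, name = parts
    return ShardNode(source.strip(), ip, node_id, name)


label = "elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09"
parsed = parse_node_label(label)
print(parsed.node, "->", parsed.target_node)
```

Because the target ID and IP are part of the label value, every distinct source/target pair yields a label value of its own.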

If the cluster reallocates many shards at once, for whatever reason, this produces many new (temporary) time series, which could lead to a label cardinality explosion in Prometheus.

It would be great if these metrics for relocating shards could be turned off, or had to be enabled explicitly, so that they are not exported at all by default.
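The requested behaviour could look roughly like the following Python sketch. It is not the exporter's code: `shards_per_node` and its `include_relocating` parameter are hypothetical, and the " -> " separator is assumed from the example output above. Relocating entries are either skipped or counted against their source node, so the set of `node` label values stays limited to real node names.

```python
from collections import Counter
from typing import Dict, Iterable


def shards_per_node(node_values: Iterable[str],
                    include_relocating: bool = False) -> Dict[str, int]:
    """Count shards per node without emitting relocation-specific labels.

    node_values: one `node` string per shard, possibly in the relocating
    form "SOURCE -> IP ID TARGET" (layout assumed from the issue).
    include_relocating: if True, a relocating shard is counted against its
    source node; if False, it is left out of the counts entirely.
    """
    counts: Counter = Counter()
    for value in node_values:
        source, sep, _target = value.partition(" -> ")
        if sep and not include_relocating:
            continue  # relocating shard: do not export it at all
        counts[source.strip()] += 1
    return dict(counts)


values = [
    "elasticsearch-01-06",
    "elasticsearch-01-06",
    "elasticsearch-01-06 -> 192.168.2.13 WGCtl2PHSTG-NVXziiUETQ elasticsearch-01-09",
    "elasticsearch-01-09",
]
print(shards_per_node(values))
print(shards_per_node(values, include_relocating=True))
```

Either mode keeps the label set bounded by the number of nodes; which one is preferable depends on whether alerts should count shards that are still in flight.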
