elasticsearch_cluster_health_timed_out is a gauge metric.
I was under the impression that its value would toggle between 0 and 1, depending on whether the exporter's query to the cluster health API times out.
So when the Elasticsearch service goes down, I expected the metric to flip to 1 — instead it simply disappears from the scrape.
I could write alert rules using something like absent(elasticsearch_cluster_health_timed_out), but that doesn't feel like the right way to do this.
Even the official Prometheus docs recommend avoiding missing metrics: https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics
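For what it's worth, the workaround I'm experimenting with is alerting on the scrape-level `up` metric instead of the exporter's own gauges. `up` is synthesized by Prometheus for every configured target, so it never goes missing the way exporter metrics do. A sketch of the rule, assuming the exporter is scraped under a job named `elasticsearch` (the job name and thresholds are just my setup, not anything official):

```yaml
groups:
  - name: elasticsearch
    rules:
      - alert: ElasticsearchExporterDown
        # `up` is generated by Prometheus itself for each target,
        # so it is always present: 1 if the last scrape succeeded,
        # 0 if the exporter could not be scraped.
        expr: up{job="elasticsearch"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "elasticsearch_exporter target {{ $labels.instance }} is down"
```

The caveat is that `up` only tells me whether the exporter endpoint is scrapeable, not whether Elasticsearch itself is healthy, so it doesn't fully replace alerting on the cluster health metrics.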