
@paul1r paul1r commented Nov 11, 2025

What this PR does / why we need it:
When starting up, partition ingesters currently stay "not ready" from the perspective of Kubernetes health checks until they have caught up on reading all of their data from Kafka.

This PR decouples Kubernetes health from partition readiness: the pod is marked "healthy" right away, but the partition ingester does not join the ring until it has worked through its consumption lag. While joining the ring, the ingester is briefly marked "not ready" as tokens are distributed to it, and it becomes ready very quickly after that.

Care was also taken to ensure that the lifecycle of classic ingesters is not impacted.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@paul1r paul1r requested a review from a team as a code owner November 11, 2025 16:43
@paul1r paul1r marked this pull request as draft November 11, 2025 20:58
@pull-request-size pull-request-size bot added size/L and removed size/M labels Nov 12, 2025
return fmt.Errorf("failed to process consumer lag at startup: %w", err)
}

level.Info(s.logger).Log("msg", "simulating a lot of lag 9")

This is just test code that won't be in the final PR. It simulates a case where the partition ingester's startup is delayed by consumption lag.
