-
Notifications
You must be signed in to change notification settings - Fork 332
snapshots: disable peer selection when no snapshot sources are selected #6594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7c47c6f
to
972fa5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we need this config option. It seems like we can just use the combination of
- entrypoints disabled
- all gossip peers disabled
- http servers empty/disabled
to mean the same thing. We should support this in production anyway (the operator is saying they will provide their own snapshot on disk)
f58c238
to
0420e31
Compare
223405d
to
0c3c629
Compare
0c3c629
to
28a5ecc
Compare
FD_LOG_ERR(( "Local snapshot `%s` is too old and downloading new snapshots is disabled. " | ||
"Please enable downloading via [snapshots.download] and restart.", ctx->local_in.full_snapshot_path ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any existing precedent for LOG_ERR'ing here? This is a situation where the validator isn't expecting to make progress / start up correctly, but it's not really a "crashable" edge case...
So is there some precedent for "alert loudly and keep running" or do we prefer to crash? Presumably most operators have their validators running under systemd or whatnot which will auto restart if it crashes/stops, which won't give them any chance to correct the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately not sure how I feel about asking the operator to enable downloading. Maybe they just need to get a newer snapshot some other way external to our code. We should also probably log the local slot, cluster slot, and the max slot age so they can see those things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with logging the local slot, cluster slot, and the max slot age. Not sure if we have precedent for alert loudly and keep going, or crashing. We added this download config option because agave had something similar: no_snapshot_fetch
, which disabled downloading snapshots and only loaded from local snapshots if present.
No description provided.