
cali-jumptrading
Contributor

@cali-jumptrading commented Oct 2, 2025

No description provided.

@cali-jumptrading force-pushed the cali/snapshot-tiles-http-resolver-2 branch 2 times, most recently from 7c47c6f to 972fa5d on October 2, 2025 19:30
@cali-jumptrading marked this pull request as ready for review on October 2, 2025 19:30
Contributor

@amass-jump left a comment


I'm not convinced we need this config option. It seems like we can just use the combination of

  • entrypoints disabled
  • all gossip peers disabled
  • http servers empty/disabled

to mean the same thing. We should support this in production anyway (the operator is saying they will provide their own snapshot on disk).
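A minimal sketch of the proposed rule: instead of a dedicated option, "no snapshot sources" is derived from the three conditions listed above. The struct and its fields (`entrypoints_cnt`, `gossip_peers_enabled`, `http_servers_cnt`) are hypothetical stand-ins, not Firedancer's actual config schema.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the snapshot-source configuration. Field names
   are illustrative only; they are not Firedancer's config schema. */
typedef struct {
  size_t entrypoints_cnt;      /* number of configured entrypoints              */
  bool   gossip_peers_enabled; /* whether any gossip peer may serve snapshots   */
  size_t http_servers_cnt;     /* number of enabled HTTP snapshot servers       */
} snapshot_sources_t;

/* Returns true when no snapshot source is selected, i.e. the operator
   is expected to supply the snapshot on disk and peer selection can be
   skipped entirely. */
static bool
snapshot_sources_none( snapshot_sources_t const * src ) {
  assert( src );
  return src->entrypoints_cnt==0UL &&
         !src->gossip_peers_enabled &&
         src->http_servers_cnt==0UL;
}
```

Deriving the behavior this way means there is no separate flag that can disagree with the source lists.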

@cali-jumptrading force-pushed the cali/snapshot-tiles-http-resolver-2 branch 4 times, most recently from f58c238 to 0420e31 on October 7, 2025 17:16
@cali-jumptrading changed the title from "snapshots: proper development option to disable peer selection" to "snapshots: disable peer selection when no snapshot sources are selected" on Oct 7, 2025
@cali-jumptrading force-pushed the cali/snapshot-tiles-http-resolver-2 branch 3 times, most recently from 223405d to 0c3c629 on October 7, 2025 17:43
@cali-jumptrading force-pushed the cali/snapshot-tiles-http-resolver-2 branch from 0c3c629 to 28a5ecc on October 7, 2025 17:52
Comment on lines +749 to +750
FD_LOG_ERR(( "Local snapshot `%s` is too old and downloading new snapshots is disabled. "
"Please enable downloading via [snapshots.download] and restart.", ctx->local_in.full_snapshot_path ) );
Contributor


Do we have any existing precedent for FD_LOG_ERR-ing here? In this situation the validator can't make progress or start up correctly, but it's not really a "crashable" edge case.

So is there precedent for "alert loudly and keep running", or do we prefer to crash? Presumably most operators run their validators under systemd or similar, which will auto-restart the process if it crashes or stops. That restart loop won't give them any chance to correct the config.
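One way to sketch "alert loudly and keep running": the tile stays up but idle and re-emits the warning at a bounded rate, so a supervisor such as systemd with `Restart=on-failure` never enters a crash loop. The `warn_limiter_t` type and its fields are hypothetical and not part of Firedancer; this only illustrates the policy, not a decision on which one to adopt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate limiter for an "alert loudly and keep running"
   policy. Instead of exiting (a supervisor would restart the process
   straight back into the same error), the caller idles and asks this
   helper whether it is time to log the warning again. */
typedef struct {
  int64_t last_warn_ns; /* time of the previous warning, or -1 if none yet */
  int64_t interval_ns;  /* minimum spacing between two warnings            */
} warn_limiter_t;

/* Returns true (and records now_ns) if a warning should be logged now,
   keeping the message visible without flooding the logs. */
static bool
warn_limiter_should_warn( warn_limiter_t * lim,
                          int64_t          now_ns ) {
  assert( lim );
  if( lim->last_warn_ns>=0 && now_ns-lim->last_warn_ns<lim->interval_ns ) return false;
  lim->last_warn_ns = now_ns;
  return true;
}
```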

Contributor


Separately, I'm not sure how I feel about asking the operator to enable downloading. Maybe they just need to get a newer snapshot some other way, external to our code. We should probably also log the local slot, cluster slot, and max slot age so they can see those values.
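A sketch of the staleness check with the suggested diagnostics, assuming "too old" means the cluster slot is more than `max_slot_age` slots ahead of the local snapshot slot. The function name and parameters are hypothetical; the message points at both remedies rather than only at [snapshots.download].

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical staleness check. Returns true if the local snapshot is
   too old and, in that case, writes a diagnostic into msg that includes
   the local slot, cluster slot, computed age, and max slot age. */
static bool
local_snapshot_too_old( char const * path,
                        uint64_t     local_slot,
                        uint64_t     cluster_slot,
                        uint64_t     max_slot_age,
                        char *       msg,
                        size_t       msg_sz ) {
  assert( path && msg && msg_sz );
  /* A snapshot at or ahead of the cluster slot is never stale; this
     early return also keeps the subtraction below from underflowing. */
  if( local_slot>=cluster_slot ) return false;
  uint64_t age = cluster_slot - local_slot;
  if( age<=max_slot_age ) return false;
  snprintf( msg, msg_sz,
            "Local snapshot `%s` is too old: local slot %" PRIu64 ", cluster slot %" PRIu64
            ", age %" PRIu64 " slots exceeds max slot age %" PRIu64 ". Provide a newer "
            "snapshot on disk or enable downloading via [snapshots.download].",
            path, local_slot, cluster_slot, age, max_slot_age );
  return true;
}
```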

Contributor Author


Agree with logging the local slot, cluster slot, and max slot age. Not sure whether we have precedent for alerting loudly and continuing versus crashing. We added this download config option because Agave had something similar: no_snapshot_fetch, which disabled downloading snapshots and only loaded from a local snapshot if one was present.
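The boot-time decision being discussed can be sketched as a small table. This is a hypothetical illustration, not Agave's or Firedancer's code. It assumes a fresh local snapshot is preferred even when downloading is enabled. The two "no fetch" outcomes are left for the caller to handle, either by crashing or by warning and waiting, since the thread hasn't settled on a policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of snapshot-source selection at boot. */
typedef enum {
  SNAPSHOT_LOAD_LOCAL,      /* use the snapshot already on disk               */
  SNAPSHOT_DOWNLOAD,        /* run peer selection and download a new snapshot */
  SNAPSHOT_STALE_NO_FETCH,  /* local snapshot too old, downloading disabled   */
  SNAPSHOT_MISSING_NO_FETCH /* nothing on disk, downloading disabled          */
} snapshot_action_t;

static snapshot_action_t
snapshot_choose_action( bool download_enabled,
                        bool local_present,
                        bool local_too_old ) {
  /* Staleness is only meaningful when a local snapshot exists. */
  assert( local_present || !local_too_old );
  if( local_present && !local_too_old ) return SNAPSHOT_LOAD_LOCAL;
  if( download_enabled )                return SNAPSHOT_DOWNLOAD;
  /* Downloading is disabled (a no_snapshot_fetch-like mode): there is
     no remote source to fall back on, so report why startup cannot
     proceed and let the caller choose between crashing and warning. */
  return local_present ? SNAPSHOT_STALE_NO_FETCH : SNAPSHOT_MISSING_NO_FETCH;
}
```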
