Skip to content

Conversation

@karalabe
Copy link
Member

@karalabe karalabe commented Apr 15, 2021

Snap sync has a run loop. All events are passed though this, which ensures race free consistency. This run loop can be stopped and later restarted if for example the pivot block moves.

Unfortunately, the run loop used persistent channels across sync cycles. Whilst this seemed like a good and clean idea, this means extra care needs to be taken to not leak "events" from one cycle into the next. The dirty-snapshot PR surfaced an interesting data race where a very quick stop/restart cycle ended up delivering stale events to the new sync. Since events assume certain invariants, such stale deliveries crashed.

This PR replaces the persistent channels with ephemeral ones, new for every sync cycle. This way even if a great deal of events are piled up, nobody will consume them upon a sync restart and they will just get dropped on the floor via the cancel channel.

An alternative approach would be some multi-stage shutdown where we first disable accepting new packets and then wait for all active threads to terminate. That seems overly complicated though and a lot more things can be messed up with the complex sync logic.

@karalabe karalabe added this to the 1.10.3 milestone Apr 15, 2021
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct, although it would be neat if we could somehow use some nicer abstraction to reduce the LOC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants