
p2p deadlock when disconnecting a peer twice in quick succession (at least the second time with peer.Disconnect) #593

@zelig

As seen in the dump at https://gist.github.com/zelig/003203cd282e191a3476, this happens when two addBlock calls each fail with invalid PoW, so the peer is disconnected twice.
@fjl comments:

  • Peer.run has already received some error or disconnect and is waiting for Peer.readLoop to exit.
  • The read loop is waiting for the protocol to read a message.
  • The protocol is waiting for Peer.run to accept the disconnect.
  • But the second Disconnect call won't return, because Peer.run is already past the point where it waits for a message on Peer.disc.

The general issue is that it's not safe to wait for the protocol without a timeout. The code in peer.go currently assumes that the protocol will always accept messages promptly.
This will be less of a problem once we have concurrent message dispatch (RLPx chunked messages).

```go
default:
	// it's a subprotocol message
	proto, err := p.getProto(msg.Code)
	if err != nil {
		return fmt.Errorf("msg code out of range: %v", msg.Code)
	}
	// ======= this should be a select and exit the loop if the protocol doesn't
	// accept after 5 seconds.
	proto.in <- msg
}
```

Disconnecting by returning from the protocol loop, instead of calling Disconnect, will fix the specific issue of the blockpool stalling. @zelig
