Closed
Labels
bug: Something isn't working
r&d:polykey:core activity 3: Peer to Peer Federated Hierarchy
Description
Describe the bug
We have several asynchronous background queues, some of which establish gRPC connections (node connections) with remote agents. However, if the remote agent we're contacting is destroyed during connection establishment, the resulting error shuts down our own agent.
Example:
{"type":"ErrorAgentClientDestroyed","data":{"message":"","timestamp":"2022-07-15T05:59:49.016Z","data":{},"stack":"ErrorAgentClientDestroyed\n at /home/emma/Projects/js-polykey/src/nodes/NodeConnectionManager.ts:567:39\n at /home/emma/Projects/js-polykey/src/nodes/NodeConnectionManager.ts:205:22\n at withF (/home/emma/Projects/js-polykey/node_modules/@matrixai/resources/src/utils.ts:24:18)\n at async constructor_.withConnF (/home/emma/Projects/js-polykey/src/nodes/NodeConnectionManager.ts:197:12)\n at async constructor_.getClosestGlobalNodes (/home/emma/Projects/js-polykey/src/nodes/NodeConnectionManager.ts:495:28)\n at async constructor_.findNode (/home/emma/Projects/js-polykey/src/nodes/NodeConnectionManager.ts:412:8)\n at async constructor_.refreshBucket (/home/emma/Projects/js-polykey/src/nodes/NodeManager.ts:580:5)\n at async constructor_.startRefreshBucketQueue (/home/emma/Projects/js-polykey/src/nodes/NodeManager.ts:711:9)","description":"Agent Client is destroyed","exitCode":64}}
To Reproduce
The timing is quite finicky, but you just need two agents running, and you need to kill one of them while the other is doing something like:
INFO:NodeConnectionManager:Getting connection to vutea98s5hv7qcde3elv4vc9qpqsv2oph374ql8i0ogiu106nia2g
INFO:NodeConnectionManager:existing entry found for vutea98s5hv7qcde3elv4vc9qpqsv2oph374ql8i0ogiu106nia2g
INFO:NodeConnectionManager:withConnF calling function with connection to vutea98s5hv7qcde3elv4vc9qpqsv2oph374ql8i0ogiu106nia2g
Expected behavior
Operations that occur in the background, potentially without the user even being aware of them, should not cause the agent to shut down. While this behaviour makes sense for an operation the user chose to initiate, background tasks that the user has no control over should not be able to kill the agent.
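As a rough illustration of that expectation, the sketch below drains a queue of background tasks and treats a connection failure as a logged, non-fatal event. Every name in it (`ConnectionError`, `isConnectionError`, `drainBackgroundQueue`) is hypothetical and stands in for whatever the real code uses; it is not taken from js-polykey.

```typescript
// Hypothetical stand-in for connection failures such as
// ErrorAgentClientDestroyed; the real error hierarchy is an assumption.
class ConnectionError extends Error {
  readonly isConnectionError = true;
}

// Type guard that checks a tag field rather than relying on subclass identity.
function isConnectionError(e: unknown): e is ConnectionError {
  return (
    e instanceof Error &&
    (e as { isConnectionError?: boolean }).isConnectionError === true
  );
}

type Task = () => Promise<void>;

// Runs background tasks one by one. A connection failure is logged and
// counted, then the loop moves on; any other error still propagates.
async function drainBackgroundQueue(
  queue: Task[],
  log: (message: string) => void,
): Promise<number> {
  let failures = 0;
  for (const task of queue) {
    try {
      await task();
    } catch (e) {
      if (!isConnectionError(e)) throw e;
      failures += 1;
      log(`background task failed: ${e.message}`);
    }
  }
  return failures;
}
```

Errors that are not connection failures are deliberately rethrown here, so genuine bugs in a background task would still surface.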
Tasks
- 1. Review that findNode, pingNode, getClosestGlobalNodes, getRemoteNodeClosestNodes and syncNodeGraph can't throw an error due to a connection error. 1 hour
- ~Create tests for PolykeyAgent stability during random agent-agent interactions Polykey-CLI#8~
- Any gRPC calls made downstream of gRPC handlers don't need this treatment; any errors will bubble up to the connection and not crash the agent.
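One possible shape for the review in task 1 is a small per-call wrapper that turns a connection error into an "unavailable" result, so that node lookups started from a background queue cannot throw for that reason. This is only a sketch: the helper names are hypothetical, and matching on the error's `name` (using the `ErrorAgentClientDestroyed` type from the log above) is an assumption about how such errors could be identified.

```typescript
// Assumption: connection failures can be recognised by error name. Only
// ErrorAgentClientDestroyed appears in the report; the list is illustrative.
const connectionErrorNames: string[] = ["ErrorAgentClientDestroyed"];

function hasConnectionErrorName(e: unknown): boolean {
  return e instanceof Error && connectionErrorNames.indexOf(e.name) !== -1;
}

// Hypothetical wrapper: resolves to undefined when the call fails with a
// connection error and rethrows everything else unchanged.
async function orUndefinedOnConnectionError<T>(
  call: () => Promise<T>,
): Promise<T | undefined> {
  try {
    return await call();
  } catch (e) {
    if (hasConnectionErrorName(e)) return undefined;
    throw e;
  }
}
```

A caller would then handle `undefined` as "node currently unreachable" and carry on with the next node or bucket.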