Closed
Labels
- development (Standard development)
- r&d:polykey:core activity 3 (Peer to Peer Federated Hierarchy)
- r&d:polykey:core activity 4 (End to End Networking behind Consumer NAT Devices)
Description
Specification
The seed node cluster behind mainnet.polykey.io and testnet.polykey.io requires some auto-configuration so that the seed nodes gain knowledge of each other and can share their DHT workload, which includes signalling and relaying.
Currently seed nodes are launched without knowledge of any other seed nodes. This makes sense, since they are the first seed nodes. However, as we scale the number of seed nodes, it would make sense for seed nodes to automatically discover each other and establish connections. This would make it easier to launch clusters of seed nodes.
There are several challenges here and questions we must work out:
- Does it mean it is possible to run multiple seed nodes with the same NodeID?
- If we need to have multiple seed nodes, we must pregenerate their root keys and preserve their recovery codes (see "Testnet securely maintain a pool of recovery codes" #285)
- If multiple seed nodes have different NodeIDs, are their root keys connected to each other in a trust relationship (either hierarchically via PKI, or as a loose mesh via the gestalt graph + root chain)?
- How does this impact how this trust information is propagated eventually-consistently across the network?
- How does this deal with attacks/impersonation/DHT poisoning/sybil...?
- How does this deal with revocation?
- What does this mean for our default seed node list that is configured in the PK software distribution?
- If seed nodes are scaled up and down, how do they acquire their recovery keys securely and without conflict?
- When seed nodes need to discover each other automatically, we have to use one of the auto-configuration networking technologies:
  - Multicast DNS (see "Local Network Traversal - Multicast Discovery" js-mdns#1)
  - AWS service discovery
  - Should support IPv6 (see "IPv6 Support" #400)
  - https://en.wikipedia.org/wiki/Zero-configuration_networking
- If the seed cluster is entirely behind 1 IP address/hostname (like our NLB), this means we have to consider the following cases:
  - Multiple node ids - multiple host names - multiple IP addresses
  - Multiple node ids to 1 IP address
  - Multiple node ids to 1 host name
  - 1 hostname can resolve to multiple IP addresses (randomly too)
  - The same node id on multiple IP addresses and multiple host names
  - Testnet Deployment via CI/CD #396 (comment) - discussion on the multi-level complexity of AWS
- Using a network load balancer means we need to preserve stickiness for "flows". We must ensure that this doesn't break our network connections mid-flight and mid-conversation.
  - AWS sets a 120s timeout for a UDP flow; this is not configurable.
  - AWS load balances according to origin IP address, and maintains the stickiness for the lifetime of a flow.
  - The stickiness must be preserved from the NLB to multiple listeners, from listener to multiple target groups, and from target group to multiple targets.
  - Testnet Deployment via CI/CD #396 (comment) - discussion about how stickiness works on NLB
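To make the stickiness concern concrete, here is a minimal sketch of a flow table with an idle timeout. It is a simplified model and not how AWS implements the NLB: flows are keyed on source IP only (as described above), and `FlowTable`, `route`, `setTargets`, and the `seed-*` target names are all hypothetical.

```typescript
// Hypothetical, simplified model of load balancer flow stickiness.
// Flows are keyed on source IP only; this is an assumption for illustration.
type Flow = { target: string; lastSeen: number };

// Small deterministic string hash (illustrative, not what AWS uses).
function hash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

class FlowTable {
  private flows = new Map<string, Flow>();
  private targets: string[];
  private idleTimeoutMs: number;

  // 120_000 ms mirrors the 120s UDP flow timeout mentioned in this issue.
  constructor(targets: string[], idleTimeoutMs: number = 120_000) {
    this.targets = targets;
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Simulates the target set changing, e.g. during scaling or a deployment.
  setTargets(targets: string[]): void {
    this.targets = targets;
  }

  // Routes a packet from `sourceIp` that arrives at time `now` (in ms).
  route(sourceIp: string, now: number): string {
    const flow = this.flows.get(sourceIp);
    if (flow !== undefined && now - flow.lastSeen < this.idleTimeoutMs) {
      flow.lastSeen = now; // Traffic keeps the flow alive
      return flow.target;
    }
    // New or expired flow: choose a target from the current set
    const target = this.targets[hash(sourceIp) % this.targets.length];
    if (target === undefined) throw new Error('No targets available');
    this.flows.set(sourceIp, { target, lastSeen: now });
    return target;
  }
}
```

In this model, a client that stays silent for longer than the idle timeout can be routed to a different seed node on its next packet, which would interrupt a conversation in progress. Under these assumptions, any keepalive interval would need to stay comfortably below 120s.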

- Load balancing introduces network proxies. These network proxies must preserve the client IP address, otherwise NAT-busting signalling will not work.
  - We've enabled this option for NLB.
  - There's a special protocol for UDP/TCP for preserving client IPs in case it cannot be done at the IP layer and must instead be done at the UDP/TCP layer.
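The protocol is not named above. One widely used protocol of this kind is HAProxy's PROXY protocol, so the sketch below assumes that is what is meant. It parses a version 2 (binary) header carrying IPv4 addresses and recovers the original client address. The function name `parseProxyV2Ipv4` and the `ProxyInfo` type are hypothetical, and IPv6, the LOCAL command, and TLV extensions are deliberately left out.

```typescript
// Sketch of parsing a PROXY protocol v2 header (IPv4 only).
// Assumption: the "special protocol" in this issue is HAProxy's PROXY protocol.
const SIGNATURE = [
  0x0d, 0x0a, 0x0d, 0x0a, 0x00, 0x0d, 0x0a, 0x51, 0x55, 0x49, 0x54, 0x0a,
];

type ProxyInfo = {
  transport: 'tcp' | 'udp';
  sourceIp: string;
  sourcePort: number;
  destIp: string;
  destPort: number;
};

function parseProxyV2Ipv4(data: Uint8Array): ProxyInfo | undefined {
  // 16-byte fixed header plus a 12-byte IPv4 address block
  if (data.length < 28) return undefined;
  for (let i = 0; i < SIGNATURE.length; i++) {
    if (data[i] !== SIGNATURE[i]) return undefined;
  }
  // High nibble: version 2; low nibble: PROXY command
  if (data[12] !== 0x21) return undefined;
  // 0x11 is IPv4 + stream (TCP), 0x12 is IPv4 + datagram (UDP)
  const familyAndTransport = data[13];
  if (familyAndTransport !== 0x11 && familyAndTransport !== 0x12) {
    return undefined;
  }
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  // Length of the address block and any extensions, in network byte order
  const length = view.getUint16(14);
  if (length < 12) return undefined;
  const ipAt = (offset: number): string =>
    Array.from(data.subarray(offset, offset + 4)).join('.');
  return {
    transport: familyAndTransport === 0x11 ? 'tcp' : 'udp',
    sourceIp: ipAt(16),
    destIp: ipAt(20),
    sourcePort: view.getUint16(24),
    destPort: view.getUint16(26),
  };
}
```

A seed node behind such a proxy would use `sourceIp` and `sourcePort` from the header, rather than the socket's remote address, as the address to signal for NAT hole punching.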
Additional context
- Testnet Deployment via CI/CD #396 - initial automation of the testnet.polykey.io discovered these challenges in deploying in AWS
- https://gitlab.com/MatrixAI/Engineering/Polykey/Polykey-Infrastructure/-/issues/6 - recovery code injection from secret managers
- Testnet securely maintain a pool of recovery codes #285 - maintaining recovery keys for the testnet
- https://adam-p.ca/blog/2022/03/x-forwarded-for/ - getting the real IP on layer 7 (note that we are preserving the client IP by default right now, but not all systems do this)
- Cloudflare is increasingly used as the gateway to all Polykey services. It's interesting to see that they are becoming that API gateway and then adding services on top. They skipped VMs and containers and went straight to serverless with WASM. WASM with WASI is the new unikernel system.
Tasks
- Research DNS load balancing as an alternative
- Work out how distributed PK with multiple nodes sharing the same IP address will work
- Answer every question above
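As a starting point for the task about multiple nodes sharing the same IP address, the address cases listed in the specification can be modelled as a many-to-many mapping between node IDs and addresses. This is only a sketch: `SeedRegistry`, its methods, and the `host:port` string format are hypothetical and not part of Polykey.

```typescript
// Hypothetical many-to-many registry between node IDs and network addresses.
type NodeId = string;
type Address = string; // Assumed "host:port" format, for illustration only

// Adds `value` to the set stored under `key`, creating the set if needed.
function link<K, V>(index: Map<K, Set<V>>, key: K, value: V): void {
  let set = index.get(key);
  if (set === undefined) {
    set = new Set<V>();
    index.set(key, set);
  }
  set.add(value);
}

class SeedRegistry {
  private byNode = new Map<NodeId, Set<Address>>();
  private byAddress = new Map<Address, Set<NodeId>>();

  // Records that `nodeId` is reachable at `address`.
  add(nodeId: NodeId, address: Address): void {
    link(this.byNode, nodeId, address);
    link(this.byAddress, address, nodeId);
  }

  // All addresses where the same node ID may be reached.
  addressesOf(nodeId: NodeId): Address[] {
    return Array.from(this.byNode.get(nodeId) ?? []);
  }

  // All node IDs that may answer at one address, e.g. behind a load balancer.
  nodesAt(address: Address): NodeId[] {
    return Array.from(this.byAddress.get(address) ?? []);
  }

  // True when an address alone does not identify a single node.
  isShared(address: Address): boolean {
    return this.nodesAt(address).length > 1;
  }
}
```

When `isShared` is true for an address, a client cannot know in advance which node will answer, so in this model the node ID has to be verified after connecting rather than inferred from the address.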