Description
So far, most of the RPC session handling logic is in C++. However, we still use frontend languages (Python, Java) to do the basic handling of incoming requests.
This issue explores whether we can move even more logic into C++, to make it possible to create a bare-metal C++ version of the RPC server, or to reuse more logic across language bindings.
General Logic of RPC Server
The current RPC is based on TCP sockets. The simplest RPC server runs the following loop:
- Listen on a port
- Accept incoming connection
- Trap into RPCServerLoop
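The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the actual server code: `rpc_server_loop` is a hypothetical echo stand-in for `RPCServerLoop`, and the `max_sessions` and `on_ready` parameters are assumptions added only so the sketch can terminate and report its port.

```python
import socket


def rpc_server_loop(conn):
    """Hypothetical stand-in for RPCServerLoop: echo bytes until the peer closes."""
    while True:
        data = conn.recv(4096)
        if not data:
            break
        conn.sendall(data)


def serve(host="127.0.0.1", port=0, max_sessions=None, on_ready=None):
    """Listen on a port, accept connections, and run one session per connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(1)
        if on_ready is not None:
            # Report the port actually bound (useful when port=0).
            on_ready(listener.getsockname()[1])
        served = 0
        while max_sessions is None or served < max_sessions:
            conn, _addr = listener.accept()
            with conn:
                rpc_server_loop(conn)  # blocks until the session ends
            served += 1
```

Sessions are handled one at a time here, which matches the "accept, then trap into the loop" description; nothing in this version protects the server from a session that hangs or crashes.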
Fault tolerant version of RPC Server
Because the RPC server can be used for tuning, it needs to support timeouts and some fault tolerance when running bad programs. There are two ways to support fault tolerance so far:
- Fork-based server (used in Python)
  - The master process listens on a port and accepts incoming connections
  - The master process forks a child worker process
  - The child worker inherits the socket and traps into RPCServerLoop
  - Because the child worker is isolated from the master process, the master can kill the child on timeout and can detect when the child crashes
- Watchdog-based server (used on Android)
  - Worker process: the main thread listens on a port and accepts incoming connections
  - Worker process: the main thread wakes up a separate watchdog thread
  - Worker process: the watchdog thread sleeps until the timeout expires, then calls exit(0) to quit the current process
  - Monitor process: its only job is to restart the worker process
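The supervision part of the fork-based variant can be sketched as follows. This is a minimal sketch assuming a Unix-like system where `os.fork` is available; `run_with_timeout` and its return values are hypothetical names for illustration, and the socket handling is omitted (in a real server, `work` would run the session loop on the inherited connection).

```python
import os
import signal
import time


def run_with_timeout(work, timeout):
    """Run work() in a forked child; return 'ok', 'crashed', or 'timeout'."""
    pid = os.fork()
    if pid == 0:
        # Child process: isolated from the master, so a crash here
        # cannot take the master down with it.
        try:
            work()
        except BaseException:
            os._exit(1)
        os._exit(0)

    # Master process: poll the child until it exits or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
                return "ok"
            return "crashed"
        time.sleep(0.01)

    # Timeout: kill the child and reap it so no zombie is left behind.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return "timeout"
```

For example, `run_with_timeout(lambda: time.sleep(10), 0.2)` kills the child after roughly 0.2 seconds and returns `"timeout"`, while the master keeps running.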
The watchdog-based server is useful when the system does not support fork.
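A rough sketch of the watchdog idea, using a timer thread in place of a hand-written sleeping thread. The names `start_watchdog` and `monitor`, the `exit_code` parameter, and the fixed restart count are assumptions made for illustration; a real monitor would presumably restart the worker indefinitely.

```python
import os
import subprocess
import threading


def start_watchdog(timeout, exit_code=0):
    """Worker side: end the whole process if `timeout` seconds elapse."""
    # os._exit terminates the process immediately, even if the main
    # thread is stuck inside a bad program.
    timer = threading.Timer(timeout, os._exit, args=(exit_code,))
    timer.daemon = True
    timer.start()
    return timer  # call timer.cancel() if the session finishes in time


def monitor(worker_cmd, restarts):
    """Monitor side: start the worker again each time it exits."""
    exit_codes = []
    for _ in range(restarts):
        exit_codes.append(subprocess.call(worker_cmd))
    return exit_codes
```

The worker would call `start_watchdog(timeout)` right after accepting a connection. Since the watchdog takes down the entire worker process, the separate monitor process is what brings the server back.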
Tracker compatible version of RPC Server
To run automated optimization with a pool of devices, we want the server to support reporting to the tracker. The general steps are:
- The RPC server chooses a random magic number
- The RPC server reports the current resource to the tracker
- The RPC server listens on the port (normal RPC process)
This way the tracker is aware of the resource and can be used to coordinate things when necessary.
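The reporting steps might look like the sketch below. The message format is purely an assumption: the issue does not specify the tracker protocol, so the newline-terminated JSON record, the `key` field, and the function name `start_tracked_server` are all hypothetical.

```python
import json
import random
import socket


def start_tracked_server(tracker_sock, key, host="127.0.0.1"):
    """Bind, report to the tracker, then listen. Returns (listener, magic)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # bind first so the port is known
    port = listener.getsockname()[1]

    # Step 1: choose a random magic number.
    magic = random.getrandbits(32)

    # Step 2: report the resource to the tracker (hypothetical format).
    record = {"key": key, "port": port, "magic": magic}
    tracker_sock.sendall(json.dumps(record).encode() + b"\n")

    # Step 3: start listening; from here on it is the normal RPC process.
    listener.listen(1)
    return listener, magic
```

The caller would then accept connections on `listener` exactly as in the plain server loop.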
Possible Actionable Items
To make things more portable, I recommend that we implement the following logic in C++:
- Tracker reporting logic
- Watchdog based fault tolerance logic
Just like the current RPC server, we do not have to build a complete CLI version. Instead, we can expose PackedFuncs that make things easier to do in the CLI. Proposals and code contributions are welcome.
Please reply if you are interested in working on this.