[RUNTIME] C++ RPC Server  #1496

@tqchen

So far, most of the RPC session handling logic is in C++. However, we use the frontend languages (Python, Java) to do the basic handling of incoming requests.

This issue explores whether we can move even more logic to C++, to make it possible to create a bare-metal C++ version of RPC, or to reuse more logic across language bindings.

General Logic of RPC Server

The current RPC is based on TCP sockets. The simplest RPC server can run the following loop:

  • Listen on a port
  • Accept incoming connection
  • Trap into RPCServerLoop
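
The loop above can be sketched with plain POSIX sockets. This is only an illustration, not the existing implementation: `ListenOn` and `ServeConnections` are hypothetical names, the `serve_loop` callback stands in for trapping into RPCServerLoop, and the `max_conns` bound exists only so the sketch terminates.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <functional>

// Opens a TCP listening socket on `port` (0 lets the OS pick a free port).
// Returns the listening file descriptor, or -1 on failure.
int ListenOn(int port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int yes = 1;
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(static_cast<std::uint16_t>(port));
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, 1) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Accepts connections one at a time and hands each one to `serve_loop`
// (the stand-in for RPCServerLoop). Returns the number of connections served.
// A real server would loop without the `max_conns` bound.
int ServeConnections(int listen_fd, const std::function<void(int)>& serve_loop,
                     int max_conns) {
  assert(max_conns >= 0);
  int served = 0;
  while (served < max_conns) {
    int conn = accept(listen_fd, nullptr, nullptr);
    if (conn < 0) break;   // accept failed; stop serving
    serve_loop(conn);      // blocks until the session ends
    close(conn);
    ++served;
  }
  return served;
}
```

Note that this version has no protection against a session that hangs or crashes, which is what the fault-tolerant variants address.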

Fault-tolerant version of RPC Server

Because the RPC server can be used for tuning, it needs to support timeouts and tolerate faults when running bad programs. There are two ways to support fault tolerance so far:

  • Fork-based server (used in Python)
    • The master process listens on a port and accepts an incoming connection
    • The master process forks a child worker process
    • The child worker inherits the socket and traps into RPCServerLoop
    • Because the child worker is isolated from the master process, the master process can kill the child on timeout and can detect if the child crashes
  • Watchdog-based server (used on Android)
    • Worker process: the main thread listens on a port and accepts incoming connections
    • Worker process: the main thread wakes up a separate watchdog thread
    • Worker process: the watchdog sleeps until the timeout and calls exit(0) to quit the current process if the timeout is reached
    • Monitor process: the only job of the monitor process is to restart the worker process
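
As a rough sketch of the fork-based variant, the master can fork, let the child run the session, and poll with `waitpid` until the child exits or a deadline passes. `RunInChild` and `WorkerResult` are hypothetical names introduced here, and `work` stands in for the child trapping into RPCServerLoop on the inherited socket; the 5 ms polling interval is an arbitrary choice.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <thread>

enum class WorkerResult { kExitedCleanly, kCrashed, kTimedOut, kForkFailed };

// Runs `work` in a forked child and supervises it from the parent.
// The parent kills the child if it is still running after `timeout_ms`.
WorkerResult RunInChild(const std::function<void()>& work, int timeout_ms) {
  assert(timeout_ms >= 0);
  pid_t pid = fork();
  if (pid < 0) return WorkerResult::kForkFailed;
  if (pid == 0) {
    work();    // child: would trap into RPCServerLoop here
    _exit(0);  // leave without running the parent's atexit handlers
  }
  const auto deadline = std::chrono::steady_clock::now() +
                        std::chrono::milliseconds(timeout_ms);
  int status = 0;
  while (true) {
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) break;                           // child has terminated
    if (r < 0) return WorkerResult::kCrashed;      // lost track of the child
    if (std::chrono::steady_clock::now() >= deadline) {
      kill(pid, SIGKILL);                          // timeout: kill the worker
      waitpid(pid, &status, 0);                    // reap it to avoid a zombie
      return WorkerResult::kTimedOut;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
  }
  if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
    return WorkerResult::kExitedCleanly;
  }
  return WorkerResult::kCrashed;  // non-zero exit or killed by a signal
}
```

The master never runs the session itself, so a crash or hang in the worker leaves the listening process intact.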

The watchdog-based server is useful when the system does not support fork.
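
A watchdog thread along these lines could be written with the C++ standard library alone. This is a minimal sketch under assumptions: the `Watchdog` class and its `Cancel` method are hypothetical, and the timeout action is a callback that defaults to `std::exit(0)` so that the behaviour can be swapped out when experimenting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Sleeps on a background thread until `timeout` elapses, then runs
// `on_timeout`. Calling Cancel() (or destroying the object) before the
// timeout stops the watchdog without running the callback.
class Watchdog {
 public:
  explicit Watchdog(std::chrono::milliseconds timeout,
                    std::function<void()> on_timeout = [] { std::exit(0); })
      : on_timeout_(std::move(on_timeout)),
        thread_([this, timeout] { Run(timeout); }) {
    assert(timeout.count() >= 0);
  }

  ~Watchdog() { Cancel(); }

  void Cancel() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      cancelled_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    // wait_for returns the predicate value: true if cancelled, false on timeout.
    bool cancelled = cv_.wait_for(lock, timeout, [this] { return cancelled_; });
    lock.unlock();
    if (!cancelled) on_timeout_();
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool cancelled_ = false;
  std::function<void()> on_timeout_;
  std::thread thread_;  // declared last so the members above exist before it starts
};
```

The worker would create a `Watchdog` when a session starts and cancel it when the session finishes in time. The separate monitor process that restarts the worker is not shown here.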

Tracker-compatible version of RPC Server

To run automated optimization with a pool of devices, we want the RPC server to support reporting to the tracker. The general steps are:

  • The RPC server chooses a random magic number
  • The RPC server reports its current resource to the tracker
  • The RPC server listens on the port (normal RPC process)

This way the tracker is aware of the resource and can be used for coordination when necessary.
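
The server side of these steps might look roughly as follows. Everything here is an assumption made for illustration: the `TrackerReport` struct, its fields, and the space-separated text line are hypothetical, since the steps above do not specify how the magic number is used or what the wire format is.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical record of what a server tells the tracker about itself.
struct TrackerReport {
  std::string device_key;  // hypothetical label for the kind of resource
  int port;                // port the server will listen on afterwards
  std::uint64_t magic;     // random number chosen by this server
};

// Step 1: choose a random magic number, seeded from the system entropy source.
std::uint64_t ChooseMagic() {
  std::random_device rd;
  std::mt19937_64 gen(rd());
  return gen();
}

// Step 2 (preparation): assemble the report the server would send.
TrackerReport MakeReport(const std::string& device_key, int port) {
  return TrackerReport{device_key, port, ChooseMagic()};
}

// Placeholder serialization; the real tracker protocol may look nothing like this.
std::string ToLine(const TrackerReport& r) {
  return r.device_key + " " + std::to_string(r.port) + " " +
         std::to_string(r.magic);
}
```

After sending the report, the server would fall through to the normal listen and accept loop.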

Possible Actionable Items

To make things more portable, I recommend implementing the following logic in C++:

  • Tracker reporting logic
  • Watchdog based fault tolerance logic

Just like the current RPC server, we do not have to build a complete CLI version. Instead, we can expose PackedFuncs that make it easier to build a CLI. Proposals and code contributions are welcome.

Please reply if you are interested in working on this.
