|
| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | +
|
| 3 | +===================== |
| 4 | +Multipath TCP (MPTCP) |
| 5 | +===================== |
| 6 | + |
| 7 | +Introduction |
| 8 | +============ |
| 9 | + |
| 10 | +Multipath TCP or MPTCP is an extension to the standard TCP and is described in |
| 11 | +`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a |
| 12 | +device to make use of multiple interfaces at once to send and receive TCP |
| 13 | +packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of |
| 14 | +multiple interfaces or prefer the one with the lowest latency. It also allows a |
| 15 | +fail-over if one path is down: the traffic is then seamlessly reinjected on |
| 16 | +other paths. |
| 17 | + |
| 18 | +For more details about Multipath TCP in the Linux kernel, please see the |
| 19 | +official website: `mptcp.dev <https://www.mptcp.dev>`_. |
| 20 | + |
| 21 | + |
| 22 | +Use cases |
| 23 | +========= |
| 24 | + |
| 25 | +Being able to use multiple paths in parallel brings new use-cases compared to |
| 26 | +TCP: |
| 27 | + |
| 28 | +- Seamless handovers: switching from one path to another while preserving |
| 29 | + established connections, e.g. to be used in mobility use-cases, like on |
| 30 | + smartphones. |
| 31 | +- Best network selection: using the "best" available path depending on some |
| 32 | + conditions, e.g. latency, losses, cost, bandwidth, etc. |
| 33 | +- Network aggregation: using multiple paths at the same time to have a higher |
| 34 | + throughput, e.g. to combine fixed and mobile networks to send files faster. |
| 35 | + |
| 36 | + |
| 37 | +Concepts |
| 38 | +======== |
| 39 | + |
| 40 | +Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol |
| 41 | +(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of |
| 42 | +a regular TCP connection that is used to transmit data through one interface. |
| 43 | +Additional *subflows* can be negotiated later between the hosts. For the remote |
| 44 | +host to be able to detect the use of MPTCP, a new field is added to the TCP |
| 45 | +*option* field of the underlying TCP *subflow*. This field contains, amongst |
| 46 | +other things, an ``MP_CAPABLE`` option that tells the other host to use MPTCP if |
| 47 | +it is supported. If the remote host or any middlebox in between does not support |
| 48 | +it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP |
| 49 | +*option* field. In that case, the connection will be "downgraded" to plain TCP, |
| 50 | +and it will continue with a single path. |
| 51 | + |
| 52 | +This behavior is made possible by two internal components: the path manager, and |
| 53 | +the packet scheduler. |
| 54 | + |
| 55 | +Path Manager |
| 56 | +------------ |
| 57 | + |
| 58 | +The Path Manager is in charge of *subflows*, from creation to deletion, and also |
| 59 | +address announcements. Typically, it is the client side that initiates subflows, |
| 60 | +and the server side that announces additional addresses via the ``ADD_ADDR`` and |
| 61 | +``REMOVE_ADDR`` options. |
| 62 | + |
| 63 | +Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see |
| 64 | +mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``), where the |
| 65 | +same rules are applied for all the connections (see: ``ip mptcp``); and the |
| 66 | +userspace one (type ``1``), controlled by a userspace daemon (e.g. `mptcpd |
| 67 | +<https://mptcpd.mptcp.dev/>`_), where different rules can be applied for each |
| 68 | +connection. The path managers can be controlled via a Netlink API; see |
| 69 | +netlink_spec/mptcp_pm.rst. |
| 70 | + |
| 71 | +To be able to use multiple IP addresses on a host to create multiple *subflows* |
| 72 | +(paths), the default in-kernel MPTCP path-manager needs to know which IP |
| 73 | +addresses can be used. This can be configured with ``ip mptcp endpoint`` for |
| 74 | +example. |
| 75 | + |
| 76 | +Packet Scheduler |
| 77 | +---------------- |
| 78 | + |
| 79 | +The Packet Scheduler is in charge of selecting which available *subflow(s)* to |
| 80 | +use to send the next data packet. It can decide to maximize the use of the |
| 81 | +available bandwidth, to pick only the path with the lowest latency, or to apply |
| 82 | +any other policy depending on the configuration. |
| 83 | + |
| 84 | +Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob -- |
| 85 | +see mptcp-sysctl.rst. |
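
As a rough illustration, the two sysctl knobs mentioned above can be read from
a program through procfs. This sketch assumes the usual procfs layout, where
``net.mptcp.<knob>`` is exposed as ``/proc/sys/net/mptcp/<knob>``;
``read_mptcp_sysctl()`` is a hypothetical helper name, not a kernel or libc
API.

.. code-block:: C

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: copy the value of net.mptcp.<knob> into buf.
     * Returns 0 on success, -1 if the knob cannot be read (e.g. MPTCP is
     * not compiled in, or the knob does not exist on this kernel). */
    static int read_mptcp_sysctl(const char *knob, char *buf, size_t len)
    {
        char path[128];
        FILE *f;
        int n;

        if (len == 0)
            return -1;

        /* Assumption: "net.mptcp.X" maps to /proc/sys/net/mptcp/X */
        n = snprintf(path, sizeof(path), "/proc/sys/net/mptcp/%s", knob);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;

        f = fopen(path, "r");
        if (!f)
            return -1;

        if (!fgets(buf, (int)len, f)) {
            fclose(f);
            return -1;
        }
        fclose(f);

        /* Strip the trailing newline, if any */
        buf[strcspn(buf, "\n")] = '\0';
        return 0;
    }

Calling ``read_mptcp_sysctl("scheduler", buf, sizeof(buf))`` would then fill
``buf`` with the name of the packet scheduler currently selected, and
``"pm_type"`` would give the path manager type.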
| 86 | + |
| 87 | + |
| 88 | +Sockets API |
| 89 | +=========== |
| 90 | + |
| 91 | +Creating MPTCP sockets |
| 92 | +---------------------- |
| 93 | + |
| 94 | +On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the |
| 95 | +``socket``: |
| 96 | + |
| 97 | +.. code-block:: C |
| 98 | +
|
| 99 | + int sd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); /* or AF_INET6 */ |
| 100 | +
|
| 101 | +Note that ``IPPROTO_MPTCP`` is defined as ``262``. |
| 102 | + |
| 103 | +If MPTCP is not supported, ``errno`` will be set to: |
| 104 | + |
| 105 | +- ``EINVAL`` (*Invalid argument*): MPTCP is not available, on kernels < v5.6. |
| 106 | +- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled |
| 107 | + in, on kernels >= v5.6. |
| 108 | +- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using the |
| 109 | + ``net.mptcp.enabled`` sysctl knob, see mptcp-sysctl.rst. |
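
The error codes above can be used to fall back to plain TCP when MPTCP is not
available. The following is only a minimal sketch of that idea:
``socket_mptcp_or_tcp()`` is a hypothetical helper name, and the fallback
definition of ``IPPROTO_MPTCP`` is only there for libc headers that do not
provide it yet.

.. code-block:: C

    #include <errno.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Older libc headers may not define it; 262 is the value given above */
    #ifndef IPPROTO_MPTCP
    #define IPPROTO_MPTCP 262
    #endif

    /* Hypothetical helper: try MPTCP first, then fall back to plain TCP if
     * the kernel reports one of the "MPTCP not supported" errors listed
     * above. Returns a socket descriptor, or -1 with errno set. */
    static int socket_mptcp_or_tcp(int family)
    {
        int sd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

        if (sd < 0 && (errno == EINVAL || errno == EPROTONOSUPPORT ||
                       errno == ENOPROTOOPT))
            sd = socket(family, SOCK_STREAM, IPPROTO_TCP);

        return sd;
    }

An application could then call ``socket_mptcp_or_tcp(AF_INET)`` or
``socket_mptcp_or_tcp(AF_INET6)`` where it previously called ``socket()``
directly. Any other error is returned unchanged to the caller.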
| 110 | + |
| 111 | +MPTCP is then opt-in: applications need to explicitly request it. Note that |
| 112 | +applications can be forced to use MPTCP with different techniques, e.g. |
| 113 | +``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTap, |
| 114 | +``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc. |
| 115 | + |
| 116 | +Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as |
| 117 | +transparent as possible for the userspace applications. |
| 118 | + |
| 119 | +Socket options |
| 120 | +-------------- |
| 121 | + |
| 122 | +MPTCP supports most socket options handled by TCP. It is possible some less |
| 123 | +common options are not supported, but contributions are welcome. |
| 124 | + |
| 125 | +Generally, the same value is propagated to all subflows, including the ones |
| 126 | +created after the calls to ``setsockopt()``. eBPF can be used to set different |
| 127 | +values per subflow. |
| 128 | + |
| 129 | +There are some MPTCP-specific socket options at the ``SOL_MPTCP`` (284) level to |
| 130 | +retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system |
| 131 | +call: |
| 132 | + |
| 133 | +- ``MPTCP_INFO``: Uses ``struct mptcp_info``. |
| 134 | +- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of |
| 135 | + ``struct tcp_info``. |
| 136 | +- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an |
| 137 | + array of ``struct mptcp_subflow_addrs``. |
| 138 | +- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an |
| 139 | + array of ``struct mptcp_subflow_info`` (including the |
| 140 | + ``struct mptcp_subflow_addrs``), and one pointer to an array of |
| 141 | + ``struct tcp_info``, followed by the content of ``struct mptcp_info``. |
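
As an illustration of the simplest option in the list above, ``MPTCP_INFO``
could be queried as sketched below. ``mptcp_get_info()`` is a hypothetical
helper name. The fallback values are assumptions for older headers: ``284`` for
``SOL_MPTCP`` comes from the text above, and ``MPTCP_INFO`` is assumed to be
``1`` as in the kernel's ``linux/mptcp.h`` header.

.. code-block:: C

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/mptcp.h>    /* struct mptcp_info */

    /* Fallbacks for headers that do not define these yet (assumed values) */
    #ifndef SOL_MPTCP
    #define SOL_MPTCP 284
    #endif
    #ifndef MPTCP_INFO
    #define MPTCP_INFO 1
    #endif

    /* Hypothetical helper: fill *info with the MPTCP-level info of sd.
     * Returns 0 on success, -1 with errno set otherwise, e.g. when sd is
     * not an MPTCP socket or the kernel does not support this option. */
    static int mptcp_get_info(int sd, struct mptcp_info *info)
    {
        socklen_t len = sizeof(*info);

        memset(info, 0, sizeof(*info));
        return getsockopt(sd, SOL_MPTCP, MPTCP_INFO, info, &len);
    }

On success, fields such as ``info->mptcpi_subflows`` can be inspected. The
other options follow the same ``getsockopt()`` pattern, but need the caller to
prepare ``struct mptcp_subflow_data`` or ``struct mptcp_full_info`` first.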
| 142 | + |
| 143 | +Note that at the TCP level, the ``TCP_IS_MPTCP`` socket option can be used to |
| 144 | +know if MPTCP is currently being used: the value will be set to 1 if it is. |
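
A check based on ``TCP_IS_MPTCP`` might look like the sketch below.
``socket_is_mptcp()`` is a hypothetical helper name, and the fallback value
``43`` is an assumption taken from the kernel's ``linux/tcp.h`` header, for
libc headers that do not define the option yet.

.. code-block:: C

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Assumed value, for libc headers that do not define it yet */
    #ifndef TCP_IS_MPTCP
    #define TCP_IS_MPTCP 43
    #endif

    /* Hypothetical helper: returns 1 if MPTCP is currently used on sd,
     * 0 if it is not, and -1 with errno set if this cannot be determined
     * (e.g. the kernel does not know this socket option). */
    static int socket_is_mptcp(int sd)
    {
        int val = 0;
        socklen_t len = sizeof(val);

        if (getsockopt(sd, IPPROTO_TCP, TCP_IS_MPTCP, &val, &len) < 0)
            return -1;

        return val ? 1 : 0;
    }

This can be useful after ``connect()`` or ``accept()``, to detect whether the
connection has been "downgraded" to plain TCP.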
| 145 | + |
| 146 | + |
| 147 | +Design choices |
| 148 | +============== |
| 149 | + |
| 150 | +A new socket type has been added for MPTCP for the userspace-facing socket. The |
| 151 | +kernel is in charge of creating subflow sockets: they are TCP sockets where the |
| 152 | +behavior is modified using TCP-ULP. |
| 153 | + |
| 154 | +MPTCP listen sockets will create "plain" *accepted* TCP sockets if the |
| 155 | +connection request from the client didn't ask for MPTCP, making the performance |
| 156 | +impact minimal when MPTCP is enabled by default. |