Long-term support for WASIp1 in the toolchain #595

@loganek

I apologize in advance if this is not the right repository for this issue; I'm happy to move it somewhere else.
I added an item to the agenda for the next WASI meeting to discuss this topic: WebAssembly/meetings#1543

Context

There are teams, including mine, that support millions of devices running WebAssembly today. The software on these devices is only partially updatable. The native host code, including the WASM runtime, is baked into the device's firmware and either cannot be updated or can be updated only rarely. The other part is WebAssembly code running on that runtime, which can be updated frequently. Currently, all of our devices, as well as those of other teams, run the WebAssembly Micro Runtime (WAMR) with only WASIp1 support, and we'll need to support them for an extended period (likely 5+ years). This means that the WASM binary (which we can and want to update frequently) must use only WASIp1 interfaces, since the runtime won't support WASIp2 or subsequent releases.

We aim to leverage the most recent version of the toolchain, not only for bug fixes or potential performance improvements but also for new features (primarily memory64, which should have complete support in WAMR very soon, and exception handling). However, as the community is pushing towards WASIp2 and the Component Model, which are not backward compatible with WASIp1, finding a solution to this problem is not straightforward for our team and others in a similar situation.

To address this issue, I am currently considering several options. Some of these options involve moving away from WASI, but they are beyond the scope of our current discussion. The relevant options that enable us to continue using the standard tooling (mainly WASI libc/WASI SDK, but also Rust compiler and others) while addressing the compatibility concerns are as follows:

  1. Support both WASIp1 and WASIp2 in WASI-libc and other tools
  2. Provide WASIp2 → WASIp1 adapter

Support both WASIp1 and WASIp2 in WASI-libc and other tools

The approach has been briefly described in https://github.com/WebAssembly/wasi-libc/pull/476/files, but it was defined as a solution for a "transition" period to enable teams to smoothly migrate from WASIp1 to WASIp2. However, our team does not have a path to transition to WASIp2 at all (at least not in the next few years). Therefore, the proposed "temporary" approach could potentially be used as a permanent solution for our case.

A major disadvantage of this approach is the increased code complexity, which might affect the development of further WASI versions. Looking at the code written for WASIp2 in WASI libc so far, the changes for different WASI versions could be moved to separate files to avoid conflicts and allow for almost independent development of WASIp1 and WASIp2. While having preprocessor directives (#ifdefs) for different versions of WASI may be unavoidable, their impact can be minimized by moving most of the version-specific code to separate files and enabling them conditionally in the build script.
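
A minimal sketch of that layout, shown in one listing for brevity. The macro SKETCH_USE_WASIP2 and all sketch_* names are hypothetical and are not taken from WASI libc. In a real tree the two variants would sit in separate files and the build script would compile only one of them:

```c
#include <stdio.h>

/* Hypothetical switch: the build script would pass -DSKETCH_USE_WASIP2
 * when targeting WASIp2 and leave it undefined when targeting WASIp1. */

#ifndef SKETCH_USE_WASIP2
/* Would live in a WASIp1-only source file. */
static const char *sketch_backend_name(void) { return "wasip1"; }
#else
/* Would live in a WASIp2-only source file. */
static const char *sketch_backend_name(void) { return "wasip2"; }
#endif

/* Shared, version-independent code only calls the common entry point,
 * so it needs no #ifdef of its own. */
static int sketch_describe(char *out, size_t cap) {
    return snprintf(out, cap, "backend=%s", sketch_backend_name());
}
```

With this split, the only conditional left is the one that selects which file is built; the shared code stays free of version checks.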

To address these concerns, we propose setting up a continuous integration (CI) system and running a set of tests as part of it to ensure the functionality of WASIp1. We can create a formal group of contributors interested in maintaining WASIp1 and agree on service-level agreements (SLAs) to fix any blocking issues. This group can also support developers focused on WASIp2+ development with any changes that are affected by the existence of WASIp1. By establishing a dedicated CI system and a formal group of contributors focused on WASIp1 maintenance, we can ensure the continued support and stability of WASIp1 while allowing the community to move forward with the development of newer WASI versions.

There has also been pushback from the community about adding new features to WASIp1. As mentioned earlier, we would like to have memory64 support soon, as well as exception handling and potentially other features in the future. I understand the feedback that WASIp1 should be frozen and no longer extended. At the same time, we also need to consider the existing systems already running in production and the business problems that teams must solve today, and come up with reasonable trade-offs. It's important to note that we are not proposing to update the WASI standard itself (although I know @woodsmc had ideas to extend WASIp1 further, but that's a topic for a separate discussion), only the tooling around it.

Implement WASIp2 → WASIp1 adapter

Another alternative we are considering is to build an adapter that translates (a subset of) the WASIp2 ABI to WASIp1. With this approach, we will use the "wasip2" target in the toolchain but use "wasm-ld" as the linker (instead of the default "wasm-component-ld") so that the output binary is a WASM core module with the WASIp2 core ABI. The adapter will be implemented as a tiny WASM library that will be linked to the WASIp2 binary and implement undefined symbols.

This adapter approach would potentially allow us to completely remove WASIp1 from our systems. It also helps address any potential backward incompatibilities between different WASI releases (as long as it's possible to convert calls from one version to another).
I wrote a small prototype for the adapter here: https://github.com/loganek/wasi-snapshot-preview2-to-preview1-adapter/tree/main/wsp2_to_wsp1_adapter
However, there are a few concerns with this approach:

Potential performance impact

C stdlib functions rarely return pointers to memory allocated by the function itself. The common pattern is to let the caller provide the buffer and its size, e.g.:

size_t fread(void *buffer, size_t size, size_t count, FILE *stream);

Here buffer is a caller-provided, already allocated buffer, and size and count define its capacity.
A common practice in WIT (and what was already defined for the read function in the WASI-IO proposal) is to return the buffer, e.g.:

read: func(
  /// The maximum number of bytes to read
  len: u64
) -> result<list<u8>, stream-error>;

This means the native implementation of the adapter (or runtime) must allocate memory (through the WASM allocator, which is exposed as the cabi_realloc WASM export) and return a pointer to it from the function. This adds overhead because:

  1. New memory must be allocated (even though the user of the libc interface already provided a buffer)
  2. The data must be copied from the adapter-allocated buffer to the caller-provided buffer.
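
The two costs can be made concrete with a small sketch. Everything here is a stand-in: sketch_source plays the role of the data behind the stream, and the sketch_* functions are hypothetical, not WASI libc or adapter code:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in data source for the sketch; not a real WASI call. */
static const char sketch_source[] = "hello";

/* libc style: the callee writes straight into the caller's buffer. */
static size_t sketch_read_into(void *buf, size_t cap) {
    size_t n = sizeof sketch_source - 1;
    if (n > cap) n = cap;
    memcpy(buf, sketch_source, n);
    return n;
}

/* WIT style (list<u8> result): the callee allocates and returns the buffer. */
static void *sketch_read_list(size_t max_len, size_t *out_len) {
    size_t n = sizeof sketch_source - 1;
    if (n > max_len) n = max_len;
    void *list = malloc(n ? n : 1);        /* cost 1: extra allocation */
    if (list == NULL) { *out_len = 0; return NULL; }
    memcpy(list, sketch_source, n);
    *out_len = n;
    return list;
}

/* An fread-like wrapper built on the WIT-style call pays both costs. */
static size_t sketch_read_via_list(void *buf, size_t cap) {
    size_t n = 0;
    void *list = sketch_read_list(cap, &n);
    if (list == NULL) return 0;
    memcpy(buf, list, n);                  /* cost 2: extra copy */
    free(list);
    return n;
}
```

Both paths deliver the same bytes to the caller; the second one simply does an allocation, a copy, and a free that the first one avoids.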

To remove this inefficiency we could provide a custom implementation of the allocator exported to the adapter (cabi_realloc). The allocator would keep thread-local state that lets libc register a custom buffer (or a list of buffers); a pointer to that buffer would then be returned by the next cabi_realloc call. That way we’ll pass the caller-provided buffer all the way down to the runtime.

Because the code of the allocator is rather small, it could likely be inlined to avoid extra calls that would otherwise affect performance.

A prototype of the allocator is here: loganek/wasi-libc@015fa0a. We also have an example usage of that in the adapter’s prototype: https://github.com/loganek/wasi-snapshot-preview2-to-preview1-adapter/blob/main/wsp2_to_wsp1_adapter/wamr/socket.c#L144
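
Independently of the linked prototype, the idea can be sketched roughly as follows. This is an illustration under assumptions, not the prototype's code: the sketch_* names are hypothetical, the function is deliberately not named cabi_realloc so it cannot clash with a real export, and it handles a single pending buffer rather than a list:

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-thread slot holding a caller-provided buffer for the next allocation. */
static _Thread_local void  *sketch_next_buf;
static _Thread_local size_t sketch_next_cap;

/* Hypothetical hook: libc would call this right before invoking the
 * imported function that is expected to allocate its result. */
static void sketch_set_next_buffer(void *buf, size_t cap) {
    sketch_next_buf = buf;
    sketch_next_cap = cap;
}

/* Takes the same four parameters as cabi_realloc:
 * old pointer, old size, alignment, new size. */
static void *sketch_cabi_realloc(void *old_ptr, size_t old_size,
                                 size_t align, size_t new_size) {
    (void)old_size;
    (void)align; /* assumption: the registered buffer is suitably aligned */
    if (old_ptr == NULL && sketch_next_buf != NULL &&
        new_size <= sketch_next_cap) {
        void *buf = sketch_next_buf;
        sketch_next_buf = NULL;            /* one-shot: consume the slot */
        sketch_next_cap = 0;
        return buf;                        /* no allocation, no later copy */
    }
    return realloc(old_ptr, new_size);     /* ordinary fallback path */
}
```

A real implementation would also have to make sure a registered buffer is never passed to free or grown through realloc, since it does not belong to the heap; the sketch leaves that bookkeeping out.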

Identify accidental use of WASIp2 functions

One major concern with the adapter approach is that WASIp1 covers only a subset of the WASIp2 interfaces. Even after linking to the adapter, the resulting binary may still contain references to WASIp2 interfaces because of the way certain functions are extended in WASI libc for WASIp2. For example, the close() function now calls a WASIp2-specific function (the [resource-drop]udp-socket import from wasi:sockets), even if UDP is not used, as the compiler cannot determine file descriptor types at compile time.

Additionally, since the code will be compiled using the WASIp2 toolchain, developers targeting WASIp1 may accidentally use features not available in WASIp1. While code reviews and testing on WASIp1 runtimes can mitigate this risk, they are not as effective as checking for the introduction of new imports in the binary.
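
As a rough illustration of such an import check, the sketch below walks the import section of a core WASM module and counts imports whose module name is not the expected one. The function names are hypothetical, and the parser is intentionally narrow: it only understands the basic func, table, memory, and global import encodings and reports anything else as an error:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads an unsigned LEB128 value; returns 0 on success, -1 on bad input. */
static int sketch_leb(const uint8_t *p, size_t len, size_t *pos, uint64_t *out) {
    uint64_t v = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (*pos >= len) return -1;
        uint8_t b = p[(*pos)++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) { *out = v; return 0; }
    }
    return -1;
}

/* Skips a limits record: flags, minimum, and an optional maximum. */
static int sketch_skip_limits(const uint8_t *p, size_t len, size_t *pos) {
    uint64_t flags, tmp;
    if (sketch_leb(p, len, pos, &flags) || sketch_leb(p, len, pos, &tmp)) return -1;
    return (flags & 1) ? sketch_leb(p, len, pos, &tmp) : 0;
}

/* Returns the number of imports whose module name differs from `allowed`,
 * 0 if there is no import section, -1 on malformed or unsupported input. */
static int sketch_count_foreign_imports(const uint8_t *w, size_t len,
                                        const char *allowed) {
    static const uint8_t header[8] = {0x00, 'a', 's', 'm', 1, 0, 0, 0};
    if (len < 8 || memcmp(w, header, 8) != 0) return -1;
    size_t pos = 8, alen = strlen(allowed);
    while (pos < len) {
        uint8_t id = w[pos++];
        uint64_t size, count, n;
        if (sketch_leb(w, len, &pos, &size) || size > len - pos) return -1;
        if (id != 2) { pos += (size_t)size; continue; }  /* not the import section */
        size_t end = pos + (size_t)size;
        int foreign = 0;
        if (sketch_leb(w, end, &pos, &count)) return -1;
        for (uint64_t i = 0; i < count; i++) {
            /* Module name: this is what gets compared to the allowlist. */
            if (sketch_leb(w, end, &pos, &n) || n > end - pos) return -1;
            if (n != alen || memcmp(w + pos, allowed, alen) != 0) foreign++;
            pos += (size_t)n;
            /* Field name: skipped. */
            if (sketch_leb(w, end, &pos, &n) || n > end - pos) return -1;
            pos += (size_t)n;
            if (pos >= end) return -1;
            switch (w[pos++]) {                          /* import kind */
            case 0: if (sketch_leb(w, end, &pos, &n)) return -1; break; /* func */
            case 1: if (pos >= end) return -1;
                    pos++;                               /* table: reference type */
                    if (sketch_skip_limits(w, end, &pos)) return -1; break;
            case 2: if (sketch_skip_limits(w, end, &pos)) return -1; break; /* memory */
            case 3: if (end - pos < 2) return -1; pos += 2; break;      /* global */
            default: return -1;
            }
        }
        return foreign;
    }
    return 0;
}
```

Run in CI with "wasi_snapshot_preview1" as the allowed module, a non-zero result would flag a binary that picked up an import the WASIp1 runtime cannot satisfy. A production check would more likely rely on an existing WASM inspection tool than on a hand-written parser.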

A related problem arises when a WASIp2 function that can’t be emulated with WASIp1 is on the call path to another function that can be emulated. For example, we might have a function foo() in WASI libc which is implemented as:

void foo() {
  x(); // WASIp2 function that can't be emulated using WASIp1 interfaces
  y(); // WASIp2 function that can be emulated using WASIp1 interfaces
}

While this could be worked around by providing a dummy implementation of x() in the adapter, the feasibility depends on the semantics of foo(), x(), and y(). However, I’m not sure if such patterns will be observed in the actual implementation of WASI libc.
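
One case where the dummy works can be sketched as follows. The signatures and error handling are assumptions made for illustration (the foo() above returns nothing), and the sketch_ prefix marks every name as hypothetical:

```c
#include <errno.h>

/* Adapter-side dummy for x(): WASIp1 has no equivalent, so it does no
 * work and reports "not supported". */
static int sketch_x(void) {
    errno = ENOTSUP;
    return -1;
}

/* Stand-in for y(), which can be mapped onto WASIp1; always succeeds here. */
static int sketch_y(void) {
    return 0;
}

/* foo() stays usable only because it treats ENOTSUP from x() as non-fatal.
 * If foo() depended on x()'s result, the dummy would not be enough. */
static int sketch_foo(void) {
    if (sketch_x() != 0 && errno != ENOTSUP)
        return -1;
    return sketch_y();
}
```

Whether a real foo() could tolerate the failure this way is exactly the semantic question raised above.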

Summary

While the adapter approach reduces maintenance overhead and potentially opens up opportunities to use standard tooling with non-standard runtime extensions (e.g., sockets in WAMR), the risk of accidentally using WASIp2 functions is a challenge that needs to be addressed. At the moment, I don't see a strong mitigation for this issue (the code reviews and automated tests mentioned above help, but each team would have to write its own set of test cases for its use case), but I remain open to exploring potential solutions through further discussion and collaboration with the community. Until a suitable solution emerges, my preference is to maintain WASIp1 support in the tooling and extend it with new features for as long as there are contributors willing to maintain it and to help with any disruptions it causes for WASIp2+ development. This approach keeps existing systems compatible and supported while allowing for the gradual adoption of newer WASI versions.

I'm very interested in hearing the community's thoughts on this matter. I'm open to comments or other ideas that could potentially address the problem. I'm also keen to understand the perspectives of the tooling maintainers, as their insights will be valuable in shaping the way forward.
