Alternative to thread-local globals and functions

Here's an alternative proposal that lets shared wasm implement a form of TLS and access unshared state from shared contexts.

This is an evolution of ideas that Luke has mentioned, with some tweaks and additions.

There are two parts:
  1. `context locals` for wasm/user accessible mutable storage during the lifetime of a wasm stack. This provides a space for values like the shadow stack pointer used by linear memory languages.
  2. Tweaking the rules of `shared` to allow access to `unshared` values through params, locals, etc.

## Context locals

The idea here is that we can make available some amount of space in the VM-managed context for wasm code to use as mutable storage. The lifetime of values stored there is scoped to an execution stack.

1. Introduce a `context` type of `[vt*]`
  - e.g. `(type (context (field $ssp i32) ...))`
  - This describes a mutable set of value types available within an execution stack of wasm functions, called `context locals`
  - We would define some small implementation limit (e.g. 6 types + no SIMD) so that engines can statically allocate the maximum needed space in an easily accessible VM context data structure.
  - To start, all value types must be defaultable (no non-nullable values)

2. Parameterize function types by a `context` type
  - e.g. `(type (func (context $s) (param i32)))`
  - This defines what kind of context a function must be called on, and what context local variables are available
  - Functions may only call other functions with exactly the same context type

3. Introduce instructions for accessing the context mutable storage
  - `cx_local.get $i`
  - `cx_local.set $i`
  - These instructions use the context type from the function they are in to validate

4. Exported wasm functions called from JS will have their context locals default-initialized

5. Calling an imported JS function will save the context locals and restore them when JS returns. This allows the imported JS function to call into wasm again without context interference.

6. Introduce a block instruction for switching the `context` type for a nested set of instructions
  - `cx.switch $to : [$to-vars...] -> []`
  - This allows crossing module boundaries that don't agree on the context type by switching it for a scope.
  - Runtime semantics:
    1. Switches the context to a new type with `$to-vars` as initial context locals
    2. Execute the block body under the new context type
    3. Upon exit, switch back to the previous context type and initialize its values by popping the results of the inner block (this avoids a hidden separate stack of values)
    4. TBD about unwinding due to exceptions
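
The runtime behaviour of steps 3 to 6 can be modelled in a few lines of Python. This is only a sketch under assumptions, not an engine design: `VMContext`, `enter_from_host`, `call_import`, and the `DEFAULTS` table are hypothetical names, the body of `cx_switch` is a callable that returns the values for the outer context, and exception unwinding is left out because it is still TBD.

```python
# Hypothetical model: a context type is a list of value-type names.
DEFAULTS = {"i32": 0, "i64": 0, "f32": 0.0, "f64": 0.0, "externref": None}
MAX_CONTEXT_FIELDS = 6  # the example implementation limit from step 1


class VMContext:
    """Models the VM-managed slots behind cx_local.get / cx_local.set."""

    def __init__(self):
        self.ctx_type = []
        self.slots = []

    def enter_from_host(self, ctx_type):
        # Step 4: an exported function called from JS starts from defaults.
        assert len(ctx_type) <= MAX_CONTEXT_FIELDS
        self.ctx_type = list(ctx_type)
        self.slots = [DEFAULTS[t] for t in ctx_type]

    def cx_local_get(self, i):
        return self.slots[i]

    def cx_local_set(self, i, value):
        self.slots[i] = value

    def call_import(self, host_fn, *args):
        # Step 5: save the context locals, run the host function, restore.
        saved = (list(self.ctx_type), list(self.slots))
        try:
            return host_fn(*args)
        finally:
            self.ctx_type, self.slots = saved

    def cx_switch(self, to_type, to_vars, body):
        # Step 6: run `body` under a new context type. Its results
        # re-initialize the outer context on exit, so no hidden stack
        # of saved values is needed.
        outer_type = self.ctx_type
        self.ctx_type, self.slots = list(to_type), list(to_vars)
        results = body(self)
        assert len(results) == len(outer_type)
        self.ctx_type, self.slots = outer_type, list(results)
```

In this model, code that wants to keep its context across a `cx_switch` has to read the values first and hand them back as the block results, which mirrors the "no hidden separate stack" design choice.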

Altogether, this extension would allow wasm code fast access to a small set of variables of its choice without threading them all the way through params.

This should be sufficient for the use-case of shadow stack pointers in linear memory languages, and is useful in the next step below.
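
As a rough illustration of the shadow stack use case, the sketch below models linear memory as a `bytearray` and context local 0 as `$ssp`. The 16-byte frame size and the function names are assumptions made up for the example; the comments mark where `cx_local.get` and `cx_local.set` would appear in wasm.

```python
import struct

memory = bytearray(64 * 1024)  # one page of linear memory
context = [len(memory)]        # context local 0 plays the role of $ssp
SSP = 0
FRAME_SIZE = 16                # hypothetical frame size for this function


def read_through_pointer(addr):
    # A callee that receives an address into the caller's shadow frame.
    return struct.unpack_from("<i", memory, addr)[0] + 1


def spill_and_call(value):
    saved = context[SSP]                 # cx_local.get $ssp
    context[SSP] = saved - FRAME_SIZE    # prologue: cx_local.set $ssp
    addr = context[SSP]
    struct.pack_into("<i", memory, addr, value)  # address-taken local
    result = read_through_pointer(addr)
    context[SSP] = saved                 # epilogue: cx_local.set $ssp
    return result
```

No parameter carries the stack pointer here: every function on the same execution stack reads and writes the same context slot.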

## Accessing unshared from shared functions

The first step here is to break `shared` back into two separate concepts for functions, depending on the behavior around stack suspension.

  1. Plain `shared` funcs can only access `shared` module state, but they can have `params`, `results`, and `locals` that are `unshared`. The resulting funcref can be shared and can be captured in an unshared suspended stack, but cannot be captured in a shared suspended stack. This is safe because the called func's state (locals, etc.) is stuck on the same thread, so it can contain unshared things.
  2. `sharey-shared` (term thanks to Conrad) funcs are the same as above, but can only have `shared` params, results, locals. This allows them to be captured in a shared suspended stack that can then be passed between threads.

`shared` funcs can then access any unshared state they need through `params` or the `context`. You could imagine a `context` that holds a GC struct that acts as the `unshared imports` for the module. Any entry point into the module will need to set up the context correctly, but from that point on there is no extra cost.
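
A small Python sketch of the "unshared imports" idea, under stated assumptions: `UnsharedImports`, `Context`, `shared_work`, and `entry_point` are hypothetical names, and the context is passed explicitly as `cx` only because Python has no implicit equivalent; in wasm it would be ambient.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UnsharedImports:
    """Hypothetical GC struct holding one thread's unshared host functions."""
    log: Callable[[str], None]


@dataclass
class Context:
    imports: Optional[UnsharedImports] = None  # defaultable (nullable) field


def shared_work(cx: Context, n: int) -> int:
    # A `shared` func: it touches no unshared module state, only the context.
    cx.imports.log(f"working on {n}")
    return n * 2


def entry_point(thread_imports: UnsharedImports, n: int) -> int:
    # The entry point sets the context up once; callees pay nothing extra.
    cx = Context()               # default-initialized on entry
    cx.imports = thread_imports
    return shared_work(cx, n)
```

Each thread would call `entry_point` with its own `UnsharedImports` value, so the same shared code reaches a different unshared `log` per thread.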

For `sharey-shared` funcs, the details will depend on the nature of a future stack proposal, but here's a rough outline of how it could work. None of this would need to be done until there is a consensus stack proposal.

  The first step is to introduce a `shared-barrier` block instruction which prevents any suspension as a shared stack (suspending as an unshared stack is permitted) while within the block body. This switches the validation of the block body from `sharey-shared` to just `shared`.
  
  The second step is to still allow a `sharey-shared` func to have a `context` type that includes unshared state. However, it may not be accessed by the func unless it performs the above `shared-barrier`.

  The final step is to allow a `sharey-shared` func with an unshared `context` to be suspended into a shared stack, but we define this process to default-initialize any unshared variables within the context. Then we allow the `context` of a shared stack to be initialized by whatever code is resuming that stack. It can then set up any unshared state to the appropriate value for the thread it is executing on.

  With the above, a `sharey-shared` func can access unshared state by entering a `shared-barrier` and then accessing the `context`. It can still be captured in a shared stack and sent to another thread, as long as the other thread knows how to reconstruct the unshared state (which is the assumption for TLS as well).
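
The suspend and resume flow can be sketched with a Python generator standing in for a shared suspended stack. This is a loose analogy under assumptions: no real threads are involved, and `Context`, `suspend`, and `run_across_two_threads` are hypothetical names. The point it shows is that the unshared context field is reset at suspension and rebuilt by whoever resumes the stack.

```python
class Context:
    def __init__(self):
        self.unshared = None  # default value of the unshared context field


def sharey_shared_func(cx):
    # Inside a `shared-barrier`, the unshared context may be read.
    first = cx.unshared["thread_name"]
    # Suspension point, outside any barrier: the stack may change threads.
    yield
    # After resumption, enter a barrier again and re-read the context.
    second = cx.unshared["thread_name"]
    return (first, second)


def suspend(cx):
    # Suspending as a shared stack default-initializes unshared context state.
    cx.unshared = None


def run_across_two_threads():
    cx = Context()
    cx.unshared = {"thread_name": "A"}   # set up by the first thread
    stack = sharey_shared_func(cx)
    next(stack)                          # run up to the suspension point
    suspend(cx)
    # The resuming code re-initializes the context for its own thread.
    cx.unshared = {"thread_name": "B"}
    try:
        next(stack)
    except StopIteration as done:
        return done.value
```

The function never holds an unshared value across the suspension point; it re-reads the context after resuming, which is what makes the hand-off between threads safe in this model.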

## Comparison with TLS

I believe the above is sufficient to implement the main use cases for TLS I know of:
  1. Maintaining the shadow stack pointer
  2. Maintaining a block of thread-local data (either GC or linear memory)
  3. Calling known unshared functions (like console.log) from shared code

It does this without introducing any `shared -> unshared` edges, which is the subject of #34.

It is a lower-level feature, so toolchains will be responsible for initializing the context on module entry points and when resuming shared stacks. But I believe the toolchain can be much smarter about this than engines can.

With engine-implemented wasm TLS, engines need to instrument all indirect calls (which are very hot) to observe module crossings and (1) perform a concurrent hash lookup and then (2) possibly lazily create TLS blocks for the instance and all imported TLS. This is needed not just when crossing into an instance, but also when returning from that instance, as the `sharey-shared` func may have suspended and resumed on a different thread.

SpiderMonkey (SM) today already has two call paths for same-instance and cross-instance calls, but this would make the performance gap significantly worse.
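
To make the cost argument concrete, here is a toy Python model of the extra work on the cross-instance path. It is a sketch under assumptions, not how any engine implements TLS: `EngineTLS`, `block_for`, and `call_indirect` are hypothetical, and a lock-protected dict stands in for the concurrent hash table.

```python
import threading


class EngineTLS:
    """Hypothetical engine-side table of lazily created TLS blocks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = {}   # (thread id, instance id) -> TLS block
        self.lookups = 0    # number of hash lookups performed

    def block_for(self, instance_id):
        key = (threading.get_ident(), instance_id)
        with self._lock:                    # (1) concurrent hash lookup
            self.lookups += 1
            if key not in self._blocks:     # (2) lazy creation
                self._blocks[key] = {}
            return self._blocks[key]


def call_indirect(tls, caller_id, callee_id, fn):
    if caller_id == callee_id:
        return fn()                         # same-instance fast path
    tls.block_for(callee_id)                # crossing into the callee
    try:
        return fn()
    finally:
        tls.block_for(caller_id)            # returning, maybe on a new thread
```

In this model a same-instance call does no lookups, while every cross-instance call does two, which is the gap the paragraph above is concerned about.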

## Staging

None of the above would preclude adding engine supported wasm TLS in the future.

`context locals` seem generally useful and possibly faster storage for certain programs than importing a global. Starting with `shared` instead of `sharey-shared` seems reasonable if it gives us an easy way to access unshared state from shared wasm, and we don't even have a consensus stack proposal yet.

So I'd propose we start with the above, and revisit engine supported wasm TLS in the future if it proves necessary and engine architectures have evolved.

Alternative to thread-local globals and functions #42
