Here's an alternative proposal that provides the ability for shared wasm to implement a form of TLS along with access to unshared state from shared contexts.
This is an evolution of ideas that Luke has mentioned, with some tweaks and additions.
There are two parts:
- `context locals` for wasm/user-accessible mutable storage during the lifetime of a wasm stack. This provides a space for values like the shadow stack pointer used by linear memory languages.
- Tweaking the rules of `shared` to allow access to `unshared` through params/locals/etc.
Context locals
The idea here is that we can make available some amount of space in the VM-managed context for wasm code to use as mutable storage. The lifetime of values stored there is scoped to an execution stack.
- Introduce a `context` type of `[vt*]`
  - e.g. `(type (context (field $ssp i32) ...))`
  - This describes a mutable set of value types available within an execution stack of wasm functions, called `context locals`
  - We would define some small implementation limit (e.g. 6 types + no SIMD) so that engines can statically allocate the maximum needed space in an easily accessible VM context data structure.
  - To start, all value types must be defaultable (no non-nullable values)
- Parameterize function types by a `context` type
  - e.g. `(type (func (context $s) (param i32)))`
  - This defines what kind of context a function must be called on, and what context local variables are available
  - Functions may only call other functions with exactly the same context type
- Introduce instructions for accessing the context mutable storage
  - `cx_local.get $i`
  - `cx_local.set $i`
  - These instructions use the context type from the function they are in to validate
- Exported wasm functions called from JS will have context locals default initialized
- Calling an imported JS function will save the context locals and restore them upon JS returning. This allows the imported JS function to call into wasm again and not have context interference.
- Introduce a block instruction for switching the `context` type for a nested set of instructions
  - `cx.switch $to : [$to-vars...] -> []`
  - This allows crossing module boundaries that don't agree on the context type by switching it for a scope.
  - Runtime semantics:
    - Switches the context to a new type with `$to-vars` as initial context locals
    - Execute the block body under the new context type
    - Upon exit, switch back to the previous context type and initialize its values by popping the results of the inner block (this avoids a hidden separate stack of values)
    - TBD about unwinding due to exceptions
 
Altogether, this extension would allow wasm code fast access to a small set of variables of its choice without threading them all the way through params.
This should be sufficient for the use-case of shadow stack pointers in linear memory languages, and is useful in the next step below.
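As a rough sketch of the runtime semantics described in this section, here is a toy Python model. It is not engine code: `Engine`, `call_export`, `call_import` and `cx_switch` are hypothetical names, every field is assumed to be an `i32` that defaults to 0, and the values for the outer context are carried across the switch in a plain local.

```python
class Engine:
    """Toy model of the VM-managed context slot; every name here is hypothetical."""

    def __init__(self):
        self.cx = {}  # current context locals, keyed by field name

    def call_export(self, fields, func):
        # Exported wasm functions called from JS see default-initialized locals
        # (assumption: all fields are i32, so the default is 0).
        self.cx = {name: 0 for name in fields}
        return func(self)

    def call_import(self, js_func):
        # Save the context locals, run JS (which may re-enter wasm), then restore.
        saved = dict(self.cx)
        try:
            return js_func()
        finally:
            self.cx = saved

    def cx_switch(self, to_vars, body):
        # Switch to a new context type with `to_vars` as the initial locals,
        # run the body, then re-initialize the outer context from its results.
        outer_fields = list(self.cx)
        self.cx = dict(to_vars)
        results = body(self)
        assert len(results) == len(outer_fields)
        self.cx = dict(zip(outer_fields, results))


def inner_entry(engine):
    engine.cx["ssp"] = 7  # only affects the re-entrant activation's locals
    return engine.cx["ssp"]


def outer_entry(engine):
    engine.cx["ssp"] = 1024
    # The imported JS function calls straight back into wasm.
    engine.call_import(lambda: engine.call_export(["ssp"], inner_entry))
    after_import = engine.cx["ssp"]

    ssp = engine.cx["ssp"]  # keep the outer value in a plain local

    def body(e):
        e.cx["tls_base"] += 8  # only the new context type is visible here
        return [ssp - 16]      # block results become the outer context locals

    engine.cx_switch({"tls_base": 4096}, body)
    return after_import, engine.cx["ssp"]


result = Engine().call_export(["ssp"], outer_entry)
print(result)  # (1024, 1008)
```

The re-entrant call does not disturb the outer `ssp`, and the `cx_switch` body decides what the outer context looks like on exit by returning its values, so there is no hidden stack of saved contexts.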
Accessing unshared from shared functions
The first step here is to break shared back into two separate concepts for functions, depending on the behavior around stack suspension.
- Plain `shared` funcs can only access `shared` module state, but they can have `params`, `results`, `locals` that are `unshared`. The resulting funcref can be shared, can be captured in an unshared suspended stack, but cannot be captured in a shared suspended stack. This is safe because the called func state (locals/etc) is stuck on the same thread, so it can contain unshared things.
- `sharey-shared` (term thanks to Conrad) funcs are the same as above, but can only have `shared` params, results, locals. This allows them to be captured in a shared suspended stack that can then be passed between threads.
`shared` funcs can then access any unshared state that they need through params or the context. You could imagine a context that holds a GC struct that acts as the unshared imports for the module. Any entry point into the module will need to set up the context correctly, but from that point on there is no extra cost.
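To make the "context as unshared imports" idea concrete, here is a small Python sketch. `UnsharedImports`, `shared_helper` and `entry_point` are hypothetical names, and the context is passed as an explicit argument only because Python has no implicit context slot; in the proposal it would be reached through `cx_local.get`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UnsharedImports:
    """Stand-in for a GC struct of thread-local imports (hypothetical)."""
    log: Callable[[str], None]


def shared_helper(cx, value):
    # Models a `shared` func: it touches no unshared module state,
    # only what it can reach through its context.
    cx["imports"].log(f"value={value}")
    return value + 1


def entry_point(value, imports):
    # Models a toolchain-generated entry point: set up the context once,
    # after which calls within the module pay no extra cost.
    cx = {"imports": imports}
    return shared_helper(cx, shared_helper(cx, value))


lines = []
out = entry_point(1, UnsharedImports(log=lines.append))
print(out, lines)  # 3 ['value=1', 'value=2']
```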
For sharey-shared funcs then, the details will depend on the nature of a future stack proposal, but here's a rough outline of how it could work. None of this would need to be done until there is a consensus stack proposal.
The first step is to introduce a `shared-barrier` block instruction, which prevents any suspension as a shared stack (suspending as an unshared stack is permitted) while within the block body. This then switches the validation of the function from sharey-shared to just shared.
The second step is to still allow a sharey-shared func to have a context type that includes unshared state. However, it may not be accessed by the func unless it performs the above shared-barrier.
The final step is to allow a sharey-shared func with unshared context to be suspended into a shared stack, but we define this process to default-initialize any unshared variables within the context. Then we allow the context of a shared stack to be initialized by whatever code is resuming that stack. It can then set up any unshared state to the appropriate value for the thread it is executing on.
With the above, a sharey-shared func can access unshared state by entering a shared-barrier and then accessing the context. It can still be captured in a shared stack and sent to another thread, as long as the other thread knows how to reconstruct the unshared state (which is the assumption for TLS as well).
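The three steps can be sketched as a toy Python model. This is only an illustration under assumptions: `SharedStack`, its methods and the `ssp`/`tls` fields are hypothetical, and the barrier rule is modelled as a runtime check even though the text describes it as a validation rule.

```python
from contextlib import contextmanager

DEFAULTS = {"ssp": 0, "tls": None}  # hypothetical unshared context fields


class SharedStack:
    def __init__(self, unshared):
        self.unshared = dict(unshared)
        self.in_barrier = False

    @contextmanager
    def shared_barrier(self):
        # Inside the barrier the func behaves like a plain `shared` func,
        # so it may read the unshared part of its context.
        self.in_barrier = True
        try:
            yield self.unshared
        finally:
            self.in_barrier = False

    def suspend(self):
        # Suspending as a shared stack is not allowed inside a barrier.
        if self.in_barrier:
            raise RuntimeError("cannot suspend as a shared stack inside shared-barrier")
        self.unshared = dict(DEFAULTS)  # unshared locals are default-initialized
        return self

    def resume(self, thread_state):
        # The resuming code rebuilds the unshared state for its own thread.
        self.unshared = dict(thread_state)
        return self


stack = SharedStack({"ssp": 1024, "tls": "thread-A data"})
with stack.shared_barrier() as cx:
    seen_on_a = cx["tls"]

suspended = stack.suspend()
after_suspend = dict(suspended.unshared)

suspended.resume({"ssp": 2048, "tls": "thread-B data"})
with suspended.shared_barrier() as cx:
    seen_on_b = cx["tls"]
```

Nothing unshared from thread A survives the suspension, so no shared -> unshared edge is ever stored in the suspended stack.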
Comparison with TLS
I believe the above is sufficient to implement the main use cases for TLS I know of:
- Maintaining the shadow stack pointer
 - Maintaining a block of thread-local data (either GC or linear memory)
 - Calling known unshared functions (like console.log) from shared code
 
It does this without introducing any shared -> unshared edges, which is the subject of #34.
It is a lower-level feature, so toolchains will be responsible for initializing the context on module entry points and when resuming shared stacks. But I believe the toolchain can be much smarter about this than engines can.
With engine-implemented wasm TLS, engines need to instrument all indirect calls (which are very hot) to observe module crossings and (1) perform a concurrent hash lookup and then (2) possibly lazily create TLS blocks for the instance and all imported TLS. This is needed not just on crossing into an instance, but also when returning from that instance, as the sharey-shared func may have suspended and resumed on a different thread.
SM today already has two call paths for same-instance and cross-instance, but this would make the performance gap significantly worse.
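For a feel of the extra work on the cross-instance path, here is a rough Python sketch of what engine-implemented TLS would have to do around an indirect call. It is not how any engine does it today: `tls_for` and `indirect_call` are hypothetical, and a dict guarded by a lock stands in for a concurrent hash map.

```python
import threading

tls_blocks = {}              # (thread id, instance id) -> TLS block
tls_lock = threading.Lock()  # stands in for a concurrent hash map


def tls_for(instance_id):
    key = (threading.get_ident(), instance_id)
    with tls_lock:                 # (1) concurrent hash lookup
        block = tls_blocks.get(key)
        if block is None:          # (2) lazy creation on first use
            block = tls_blocks[key] = {"ssp": 0}
        return block


def indirect_call(caller_instance, callee_instance, func, *args):
    crossing = callee_instance != caller_instance
    if crossing:
        tls_for(callee_instance)   # entering another instance
    result = func(*args)
    if crossing:
        tls_for(caller_instance)   # may have resumed on a different thread
    return result


same = indirect_call(1, 1, lambda x: x + 1, 1)
blocks_after_same = len(tls_blocks)
cross = indirect_call(1, 2, lambda x: x + 1, 1)
blocks_after_cross = len(tls_blocks)
```

The same-instance call touches no TLS state at all, while the cross-instance call pays for a lookup on the way in and another on the way out.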
Staging
None of the above would preclude adding engine supported wasm TLS in the future.
context locals seem generally useful and possibly faster storage for certain programs than importing a global. Starting with shared instead of sharey-shared seems reasonable if it gives us an easy way to access unshared state from shared wasm, and we don't even have a consensus stack proposal yet.
So I'd propose we start with the above, and revisit engine supported wasm TLS in the future if it proves necessary and engine architectures have evolved.