Atomic record fields #39

gasche · 2024-05-31T14:58:01Z

A proposed design for atomic record fields.

In summary:

(* in the stdlib *)
module Atomic : sig
  ...
  module Field : sig
    type ('r, 'a) t
    val get : 'r -> ('r, 'a) t -> 'a
    val compare_and_set : 'r -> ('r, 'a) t -> 'a -> 'a -> bool
  end
end

(* user code example *)
type t = {
  id : int;
  mutable state : st [@atomic];
}

let rec evolve v =
  let cur = v.state in (* <- strong atomic read *)
  let next = next_state cur in
  if not Atomic.Field.compare_and_set v [%atomic.field state] cur next
  then evolve v

OlivierNicole

I strongly support this, it looks like a good solution for a problem that seemed intractable to me at first glance.

OlivierNicole · 2024-06-03T09:08:08Z

rfcs/atomic_record_fields.md

+For example, if we have `type 'a t = { atomic mutable state : 'a st }`,
+then `[%atomic.field state]` is a polymorphic constant of type
+`('a t, 'a st) Atomic.Field.t`.


I wonder how hard it would be to have a third type parameter, which would be instantiated to a different (opaque) type for every field of a record. This way, when two fields have the same type, it would be impossible to mistakenly access the wrong one.

OlivierNicole · 2024-06-03T09:12:13Z

rfcs/atomic_record_fields.md

+We can expose atomic arrays with the same mechanism: a builtin type
+`'a Atomic.Array.t`, with a primitive function to build an index from
+an integer:


I like the idea of having a uniform interface for atomic fields and arrays. It would also make it natural to use the same runtime primitives for both, reducing the number of primitives to create.

OlivierNicole · 2024-06-04T09:44:22Z

rfcs/atomic_record_fields.md

+We can reuse the `Field.t` type and its atomic-operation API, but note
+that the index here may be outside the bounds of the array, and will
+have to be checked on access. (The index is not bound-checked by the
+`index` function, as it may be called on different arrays later.) For
+`Field.t` values at record type, no such bound checking is necessary.


I’m reminded (by @polytypic) that non-bound-checked accesses would be desirable as well for performance-critical code.

Thinking more about this, I notice that there is a weakness in my proposal related to arrays. When using Atomic.Field.t for records, we do not want any bound-checking, as the indices are in-bounds by construction. So the Atomic.Field.t primitives will not perform any bound-checking. We can reuse the same primitives to implement array access, but the specific proposal I made in the PR leads to non-bound-checked array accesses by default, not to bound-checked array accesses. I think this is wrong: I'm happy to offer both, but bound-checked accesses should be the default.

I see two approaches to solve this:

We could of course expose different primitives in Atomic.Array.t, which would perform bound-checking first and then call the same runtime primitives as those of the Atomic.Field.t API. (More precisely: for each access function, offer a checked and an unchecked version in Atomic.Array.t.) I think that this would be perfectly reasonable, and/but it duplicates the API surface which I thought we could avoid.

We could change our index function to have index : 'a t -> int -> ('a Array.t, 'a) Field.t, and also val unsafe_index : int -> ('a Array.t, 'a) Field.t, wich index in charge of doing bound-checking against a specific array. This is a bit wonky: the index you get has been checked against one array and it is in fact unsafe/unchecked to use it against another array. So only the fragment Atomic.Field.<op> arr (Atomic.Array.index arr i) is safe, where you use the index for indexing right away, on the same array. This avoids the API duplication but the API is a bit weird. This would be acceptable for an advanced, expert-users-seeking-performance-only internal API, but I am not fond of the extra complexity (it is more work to explain and to use), so I would rather support (1).

My gut feeling is thus that the best move would be to remove this paragraph on supporting arrays, which was tentative and does not pan out as I had hoped, and simply bite the bullet and offer a Atomic.Array submodule without any of the field stuff getting involved. (... and we could also have Atomic.ContendedArray if we wanted to, for a fourth copy of the list of primitives.)

I’m reminded (by @polytypic) that non-bound-checked accesses would be desirable as well for performance-critical code.

See Saturn's spsc_queue for example. All accesses are made with an index modulo the size of the array, so there is no need for a bound check.

gasche · 2024-06-04T15:45:47Z

@bclement-ocp has an alternate proposal, where we use an
extension point to denote a (value, offset) pair, instead of
just the offset. This would look like this -- adapting the
previous example:

(* in library code *)
module Atomic : sig
  ...
  module Loc : sig
    type 'a t
    val get : 'a t -> 'a
    val compare_and_set : 'a t -> 'a -> 'a -> bool
  end
end

(* in user code *)
type t = {
  id : int;
  mutable state : st [@atomic];
}

let rec evolve v =
  let cur = v.state in (* <- strong atomic read *)
  let next = next_state cur in
  if not Atomic.Loc.compare_and_set [%atomic.field v.state] cur next
  then evolve v

Upsides of this proposal:

the phantom-typed value has a simpler type: 'a t instead of ('r, 'a) t
in particular, it is possible to expose a single set of primitive for
all of atomic values, atomic record fields
[%atomic.field v.state], and atomic arrays [%atomic.index a.(i)],
[%atomic.unsafe_index a.(i)]

Downside: internally this is represented as a (value, offset)
tuple, so it is inefficient if the tuple ends up being built, in
particular for code of the form
let loc = if ... then [%atomic.index foo] else ....
On the other hand, direct applications to a primitive should be
easy to compile to optimize the tuple construction away.

gasche · 2024-06-13T09:54:14Z

I thought more about the suggestion of @bclement-ocp, and I see four
different designs that are worth discussing.

Running example:

type t = {
  id : int;
  mutable state : st [@atomic];
}

The designs

1. A new two-parameter type `('r, 'v) Atomic.Field.t`

With the current proposal, [%atomic.field state] has type (t, st) Atomic.Field.t. This type ('r, 'v) Field.t represents values of
type 'v inside a larger structure or type 'r.

2. A new one-parameter type `'v Atomic.Loc.t`

This was my initial understanding of the idea of @bclement-ocp: if
foo has type t, then [%atomic.field foo.state] has type int Atomic.Loc.t.

This is an improvement over the two-parameter type because it is
simpler. It avoids some subtleties coming from the first-class-field
nature of the two-parameter proposal, for example any worries related
to type-directed disambiguation is gone.

Internally 'v Atomic.Loc.t is represented as a pair of a value and
an offset, something like Obj.t * int. The construction
[%atomic.field foo.state] desugars to (Obj.obj foo, 1), but the
idea is that when directly applied to a primitive
Atomic.Loc.fetch_and_add (Obj.obj foo, 1) 42 the compiler will
unpack the pair and avoid any allocation. One way to think of the
optimization is that the code gets rewritten into
Atomic.Field.fetch_and_add (Obj.obj foo) 1 42, where Field is now
a secret module, not exposed to users, which includes two-arguments
atomic primitives.

3. Full extension of `'v Atomic.t` to include pointer+offset pairs

In fact we could get rid of this new type 'v Atomic.Loc.t and just
extend 'v Atomic.t for this purpose under the hood:

module Atomic = struct
  type t =
    | Single of { mutable v : 'a }
    | Pair of Obj.t * int
end

The case Single corresponds to the memory representation of atomics
today (the same as ref cells), and the new Pair constructor adds
pointer+offset pairs to it. (This is not type-safe so this definition
would be hidden/private.) With this design, we can give
[%atomic.field foo.state] the type Atomic.t, and we do not need to
introduce a second battery of atomic functions (fetch_and_add
and stuff). The compiler can still optimize Atomic.fetch_and_add [%atomic.field foo.state] to avoid materializing the pair.

I was very happy with this proposal until I noticed a fatal flaw with
it: if you do this, then any call on an atomic that is not
statically known to be a Pair, in particular all atomics that are in
use today, requires a dynamic check to tell if they are in the
Single or in the Pair case. This is a big no-no. (In practice the
test would be cheap, but people who use atomics want precise control
over the code, not dynamic tests added for convenience.)

4. Static extension of `'v Atomic.t`

A variant of proposal (3) is to do as if the type Atomic.t was
extended with a Pair constructor, but statically reject all programs
where Pair would have to be materialized / cannot be optimized away
by the compiler. So either [%atomic.field foo.state] gets passed to
a compiler-optimized Atomic.foo primitive, gets rewritten (morally)
into a two-argument Atomic.Field.foo primitive, or the compiler
rejects the program.

If you do this, then you know that all Atomic.t values reaching the
runtime will in fact use the Single constructor (just like today),
and you do not need to add new dynamic checks.

In other words, with this proposal [%atomic.field foo.state]
pretends to be of type int Atomic.t, but this is a magic
Atomic.t value that will never exist at runtime: it must be passed
to an Atomic.foo primitive directly, which will separate the pointer
from the offset.

My current thinking

I think that (3, full extension) is not acceptable for our users in
practice. I prefer (2, new one-parameter type) and (4, static extension)
to (1, new two-parameter type).

I find it a bit hard to compare (2, new one-parameter type) and (4,
static extension).

The new one-parameter type proposal (2) leads to more duplication in
the API, for reasons that will not be apparent to our users. It
relies less on magic: you want to reason about whether the pair gets
optimized away when writing code, but the code works as expected
(and is in fact decently fast) in any case. Being able to
materialize the pointer+offset pair can also be convenient for some
use-cases, it is a desirable feature to offer to our users.
The proposal (4, static extension) is much less invasive in terms of
API surface, we add [%atomic.field foo.state] and also
{,unsafe_}index : 'a array -> int -> 'a Atomic.t primitives and
that's it. I like the minimality.

I think that we can do a good job of clear error messages in the
cases where the value+offset pair cannot be optimized away, so
I don't think that it will feel hard to use in practice. On the
other hand, it remains more complex (more things to explain), and we
lose the expressivity of being able to dynamically create
pointer+offset pairs and pass them around conveniently.

lpw25 · 2024-06-14T19:13:55Z

Even if you do 2 you might as well define the types like:

type ('r, 'v) field
type 'v loc = Loc : 'r * ('r, 'v) field -> 'v loc

Then people can still get 1 if they like.

OlivierNicole · 2024-07-05T12:20:33Z

I like how (2) fits simply into the compiler, using only things that already exist (extension points, types, compiler primitives). In comparison, (4) seems to add a lot of complexity for a gain that is not so big to me, namely no API duplication. I think we can easily explain to our users that Atomic.t is the simple path that will suffice for many applications, and atomic record fields and atomic accesses in arrays are slightly more advanced and are operated by a distinct set of functions in order to fit nicely into the language.

gasche · 2024-07-05T12:56:59Z

Thanks, this is useful feedback. I agree that (2) is simpler. It's also probably not too hard to implement (4) on top of (2) later if we decide to do it, so in a sense (2) is a minimum viable proposal.

OlivierNicole · 2024-07-05T15:46:24Z

For the record, I disagree with the memory-model label on this RFC, as the proposal is inside the bounds of the current memory model, as long as we make sure that atomic locations remain separate from non-atomic ones.

Speaking of this, I like the fact that atomic arrays can use the same access functions as atomic record fields; but to satisfy the separation of atomic and non-atomic locations, we need to allow atomic locations to be created only from a 'a Atomic.Array.t values, and not from regular arrays. So, as much as it pains me, it seems that there will be a bit of API duplication between Array and Atomic.Array.

Edit.: I just realized that it’s already the case in your proposal.

kayceesrk · 2024-07-05T17:53:25Z

For the record, I disagree with the memory-model label on this RFC, as the proposal is inside the bounds of the current memory model, as long as we make sure that atomic locations remain separate from non-atomic ones.

In my mind, anything to do with atomics, relaxed memory semantics and their compilation fall under the broad category of memory model. Recall that atomics in OCaml were introduced in the PLDI 2017 “memory model” paper. I’m using the tag only to find memory model related RFCs easily.

In trunk, all atomic functions exposed in the runtime are also exposed as language primitives in our intermediate representations (lambda, clambda). But except for `Patomic_load`, which benefits from dedicated code generation, they are all transformed into C calls on all backends. The present PR simplifies the code noticeably by removing the intermediate primitives, by producing C calls directly in lambda/translprim.ml. This reduces the amount of boilerplate to modify to implement atomic record fields (ocaml/RFCs#39). Co-authored-by: Clément Allain <[email protected]>

Uses of existing atomic primitives %atomic_foo, which act on single-field references, are now translated into %atomic_foo_field, which act on a pointer and an offset -- passed as separate arguments. In particular, note that the arity of the internal Lambda primitive Patomic_load increases by one with this patchset. (Initially we renamed it into Patomic_load_field but this creates a lot of churn for no clear benefits.) We also support primitives of the form %atomic_foo_loc, which expects a pair of a pointer and an offset (as a single argument), as we proposed in the RFC on atomic fields ocaml/RFCs#39 (but there is no language-level support for atomic record fields yet) Co-authored-by: Clément Allain <[email protected]>

This type will be used for ['a Atomic.Loc.t], as proposed in the RFC ocaml/RFCs#39 We implement this here to be able to use it in the stdlib later, after a bootstrap.