Skip to content

Conversation

@gasche
Copy link
Member

@gasche gasche commented Jun 18, 2024

Rendered.

This RFC is more formal than is usual, as it proposes an extension of the current Multicore OCaml memory model to support mixed-atomicity (doing both atomic and non-atomic accesses on the same notation).

The question came up whenever we mentioned Atomic.get_relaxed operations or similar: do we understand what it means? My intuition is that the answer is "meh, that is not much harder than the current model", and this document is an attempt to instantiate this intuition into something more precise.

Being a non-expert on memory models I would warmly welcome expert feedback.

The introduction of the RFC:

The current OCaml memory model is defined in

Bounding Data Races in Space and Time
Stephen Dolan, KC Sivaramakrishnan, Anil Madhavapeddy, 2018
https://kcsrk.info/papers/pldi18-memory.pdf

The model is described by an operational semantics (with non-deterministic memory operations) and an equivalent axiomatic semantics.

The model distinguishes atomic locations and non-atomic locations. Read/write operations on atomic locations are atomic, they induce a synchronization between a reader and the corresponding writer. Read/write operations on non-atomic locations are non-atomic, they induce no synchronization. This separation corresponds to the current OCaml programming model, where 'a Atomic.t values are atomic locations and all other mutable data is non-atomic.

@polytypic expressed interest in adding "fenceless" operations to the Atomic.t type, and his Multicore-magic library exports fenceless_{get,set} operations that are just non-atomic reads and writes. There are many uses of fenceless_get in particular in the kcas library; a common pattern is to use a fenceless_get in read-compute-cas retry loops: we don't need an atomic read as the following compare-and-set will synchronize with past writers anyway, and reading a stale value does not endanger correctness but just provoke a retry.

One question that came up in discussions with @OlivierNicole, @fabbing, @fpottier and @clef-men is whether we know how to describe the resulting model, where both atomic and non-atomic operations may be performed on a given location.

The current RFC, prompted by a discussion with @clef-men, proposes a formal description of such a mixed-atomicity memory model.

(Warning: we are not memory-model experts, so we are just making stuff up.
cc @stedolan, @kayceesrk, @avsm, the authors of the initial model,
and @gmevel, @jhjourdan and @fpottier as the authors of the Cosmo separation logic)

(Note: apparently github supports some LaTeX, but I chose to use weird unicode in verbatim blocks instead, because I think that it will be easier for people to write and edit if they want to, discuss/quote over email, etc.)

@gasche gasche force-pushed the mixed-atomicity-memory-model branch from 4b9eb8c to 02568bb Compare June 18, 2024 14:29
@OlivierNicole
Copy link

We (@fabbing and I) are reading this, slowly.

Copy link

@OlivierNicole OlivierNicole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reading we found a few typos/strange syntax.


e ~<l:ϕ>→ e' S(l); F —<l:ϕ>→ C'; F'
———————————————————————————————————————————————— Step
<S, P || (F, e)]> —→ <S[l ↦ C'], P || (F', e')>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to understand what the double pipe means here. Did you mean P[i ↦ (F, e)]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I strayed slightly from the paper here. In my mind P || Q is standard syntax for parallel composition of processes/threads, which avoids talking about the irrelevant thread index.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought that the stray ] may be a left-over from the previous notation, isn't it?

@stedolan
Copy link
Contributor

"meh, that is not much harder than the current model"

I see the point for Atomic.get_relaxed. Atomic.set_relaxed, on the other hand, is profoundly weird, and there would need to be a better reason than symmetry to add it.

Atomic.set_relaxed breaks coherence of atomic reads

Atomic reads in OCaml satisfy coherence: if another thread writes an atomic and you read it repeatedly, then you will observe only one old->new transition (once you see the new value, further reads will agreee). This is not true of the proposed semantics: if an atomic and nonatomic write race on the same location, then the nonatomic write can choose a higher timestamp then the atomic one without updating the frontier, and atomic readers will then nondeterministically choose between the two, flickering between then arbitrarily many times.

(Underlying hardware is coherent so you might think this saves you from observing such flickering. It does not: see Example 2 in the linked PLDI paper, where this sort of flickering appears due to interactions between compiler and hardware optimisations).

Atomic.set_relaxed breaks the implementation

This assumption from the RFC is false as written:

We assume that the current compilation strategy is enough to realize this memory model, by simply compiling all accesses as we are currently doing.

The OCaml memory model has a property that the proposed semantics preserves, which is that frontiers in threads and locations grow monotonically - information about writes is never forgotten. A consequence of this is that for two atomic writes to the same location, one of them happens before the other, a property that does not hold in most memory models.

This property is not naturally true of hardware models, and OCaml does some work to ensure it. (Roughly, by compiling atomic stores more like atomic exchanges).

The proposed implementation would break this property, as nonatomic writes naively implemented would rewind the frontier of a location, because of weakness of coherence ordering in hardware models. That is, a semantics that is somewhat closer to what the implementation would do is one where Write-NA updates the location's frontier to the minimum of the previous and the thread's frontiers. (This would break the monotonicity and total-ordering-of-atomic-writes properties)

@gasche
Copy link
Member Author

gasche commented Jun 19, 2024

Thanks! It looks like a reasonable next step would be to propose get_relaxed only and forget about set_relaxed for now, at least until we get clear evidence of usefulness/importance of the primitive.

I am planning to close the current PR and open a new one with a revised/simplified proposal, so that we can easily compare the two, disentangle the two discussions, and restart here if we decide to someday.

Co-authored-by: Olivier Nicole <[email protected]>
Co-authored-by: Fabrice Buoro <[email protected]>
@gasche
Copy link
Member Author

gasche commented Jun 19, 2024

In #41 I proposed a restricted version that only allows relaxed reads (not relaxed writes) on memory locations. I am closing here as planned -- but feel free to still comment on the full mixed-atomicity model in the present thread if there is something more to be said.

(Thanks @OlivierNicole and @fabbing for the proof-reading / typos!)

@gasche gasche closed this Jun 19, 2024
@polytypic
Copy link

Thanks! It looks like a reasonable next step would be to propose get_relaxed only and forget about set_relaxed for now, at least until we get clear evidence of usefulness/importance of the primitive.

Yes, I believe get_relaxed is far more important that set_relaxed.

One use case where I have used the equivalent of set_relaxed is in "initialization". In Kcas, for example, a transaction builds a tree-like data structure, step-by-step, that then needs to be "closed" into a cycle (the cyclic data structure is essential for the lock-free algorithm to update multiple locations atomically). All of this happens before multiple threads can see the data structure. It is entirely safe to close/create the cycle by writing to an atomic location in a relaxed manner. The data structure is then made visible to other threads via a sequentially consistent atomic write (which means that the relaxed write cannot be reordered to happen after the sequentially consistent write).

@kayceesrk kayceesrk added the memory-model Memory model related discussions label Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

memory-model Memory model related discussions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants