1 |  | -# Data structure representation  | 
 | 1 | +# Data structure representation and validity requirements  | 
2 | 2 | 
 
  | 
3 |  | -In general, Rust makes few guarantees about memory layout, unless you  | 
4 |  | -define your structs as `#[repr(rust)]`. But there are some things that  | 
5 |  | -we do guarantee. Let's write about them.  | 
 | 3 | +## Introduction  | 
6 | 4 | 
 
  | 
7 |  | -TODO:  | 
 | 5 | +This discussion is meant to focus on the following things:  | 
8 | 6 | 
 
  | 
9 |  | -- Find and link to the various RFCs  | 
10 |  | -- Enumerate things that we *might* in fact guarantee, even for non-C types:  | 
11 |  | -  - e.g., `&T` and `Option<&T>` are both pointer sized  | 
12 |  | -  - size of `extern fn` etc (at least on some platforms)?  | 
13 |  | -  - For which `T` is `None` represented as a "null pointer" etc?  | 
14 |  | -    - (Which "niche" optimizations can we rely on)  | 
 | 7 | +- What guarantees does Rust make regarding the layout of data structures?  | 
 | 8 | +- What guarantees does Rust make regarding ABI compatibility?  | 
 | 9 | + | 
 | 10 | +NB. Oftentimes, choices of layout will only be possible if we can  | 
 | 11 | +guarantee various invariants -- this is particularly true when  | 
 | 12 | +optimizing the layout of `Option` or other enums. However, designing  | 
 | 13 | +those invariants is left for a future discussion -- here, we should  | 
 | 14 | +document/describe what we currently do and/or aim to support.  | 
 | 15 | + | 
 | 16 | +### Layout of data structures  | 
 | 17 | + | 
 | 18 | +In general, Rust makes few guarantees about the memory layout of your  | 
 | 19 | +structures. For example, by default, the compiler has the freedom to  | 
 | 20 | +rearrange the field order of your structures for more efficiency (as  | 
 | 21 | +of this writing, we try to minimize the overall size of your  | 
 | 22 | +structure, but this is the sort of detail that can easily change). For  | 
 | 23 | +safe code, of course, any rearrangements "just work" transparently.  | 
 | 24 | + | 
 | 25 | +If, however, you need to write unsafe code, you may wish to have a  | 
 | 26 | +fixed data structure layout. In that case, there are ways to specify  | 
 | 27 | +and control how an individual struct will be laid out -- notably with  | 
 | 28 | +`#[repr]` annotations. One purpose of this section, then, is to lay out  | 
 | 29 | +what sorts of guarantees we offer when it comes to layout, and also  | 
 | 30 | +what effect the various `#[repr]` annotations have.  | 
 | 31 | + | 
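As a rough illustration of the difference between the default representation and a fixed one, the sketch below compares two structs with identical fields. The type names `DefaultRepr` and `CRepr` are hypothetical and exist only for this example. Only the `#[repr(C)]` numbers are asserted; the size of the default-representation struct is printed, since it is exactly the sort of detail that may change.

```rust
use std::mem::{align_of, size_of};

// Default representation: the compiler may reorder these fields.
#[allow(dead_code)]
struct DefaultRepr {
    a: u8,
    b: u32,
    c: u8,
}

// `#[repr(C)]`: fields stay in declaration order, with C-style padding.
#[allow(dead_code)]
#[repr(C)]
struct CRepr {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Declaration order is kept, so (assuming `u32` is 4-byte aligned, as on
    // mainstream targets) the layout is:
    // 1 byte `a` + 3 padding + 4 bytes `b` + 1 byte `c` + 3 padding.
    assert_eq!(size_of::<CRepr>(), 12);
    assert_eq!(align_of::<CRepr>(), 4);

    // Not guaranteed: the compiler is free to pick any field order here,
    // so this value is only printed, never relied upon.
    println!("default repr: {} bytes", size_of::<DefaultRepr>());
}
```

Unsafe code that depends on field offsets would use the `#[repr(C)]` form; safe code can use either without noticing the difference.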
 | 32 | +### ABI compatibility  | 
 | 33 | + | 
 | 34 | +When one either calls a foreign function or is called by one, extra  | 
 | 35 | +care is needed to ensure that all the ABI details line up. ABI compatibility  | 
 | 36 | +is related to data structure layout but -- in some cases -- can add another  | 
 | 37 | +layer of complexity. For example, consider a struct with one field, like this one:  | 
 | 38 | + | 
 | 39 | +```rust  | 
 | 40 | +#[repr(C)]  | 
 | 41 | +struct Foo { field: u32 }  | 
 | 42 | +```  | 
 | 43 | + | 
 | 44 | +The memory layout of `Foo` is identical to a `u32`. But in many ABIs,  | 
 | 45 | +the struct type `Foo` is treated differently at the point of a  | 
 | 46 | +function call than a `u32` would be. Eliminating these gaps is the  | 
 | 47 | +goal of the `#[repr(transparent)]` annotation introduced in [RFC  | 
 | 48 | +1758]. For built-in types, such as `&T` and so forth, it is important  | 
 | 49 | +for us to specify how they are treated at the point of a function  | 
 | 50 | +call.  | 
 | 51 | + | 
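The gap between layout and calling convention can be sketched as follows. `Bar`, `takes_u32`, and `call_through_transparent` are hypothetical names introduced for this example, and `takes_u32` is defined in Rust only to keep the sketch self-contained; in practice it would be a foreign function. The function-pointer transmute relies on the ABI-compatibility rule for `#[repr(transparent)]` types described in the standard library's documentation of `fn` pointers.

```rust
use std::mem::{align_of, size_of, transmute};

// Same memory layout as `u32`, but possibly passed differently in calls.
#[allow(dead_code)]
#[repr(C)]
struct Foo {
    field: u32,
}

// Hypothetical newtype: same layout AND same call ABI as `u32`.
#[repr(transparent)]
struct Bar {
    field: u32,
}

// Stand-in for a foreign function that takes a plain `u32`.
extern "C" fn takes_u32(x: u32) -> u32 {
    x.wrapping_add(1)
}

// Calls `takes_u32` through a pointer whose signature mentions `Bar`.
fn call_through_transparent(value: u32) -> u32 {
    let f = takes_u32 as extern "C" fn(u32) -> u32;
    // SAFETY: `Bar` is `#[repr(transparent)]` over `u32`, so the two
    // signatures are ABI-compatible. The same trick with `Foo` would not
    // be justified by its layout alone.
    let g: extern "C" fn(Bar) -> u32 = unsafe { transmute(f) };
    g(Bar { field: value })
}

fn main() {
    // Layout agrees for both wrappers.
    assert_eq!(size_of::<Foo>(), size_of::<u32>());
    assert_eq!(align_of::<Foo>(), align_of::<u32>());
    assert_eq!(size_of::<Bar>(), size_of::<u32>());

    assert_eq!(call_through_transparent(41), 42);
}
```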
 | 52 | +## Goals  | 
 | 53 | + | 
 | 54 | +- Document current behavior of compiler.  | 
 | 55 | +  - Indicate which behavior is "permitted" for compiler and which  | 
 | 56 | +    aspects are things that unsafe code can rely upon.  | 
 | 57 | +  - Include the effect of `#[repr]` annotations.  | 
 | 58 | +- Uncover the sorts of layout optimizations we may wish to do in the  | 
 | 59 | +  future.  | 
 | 60 | + | 
 | 61 | +## Some interesting examples and questions  | 
 | 62 | + | 
 | 63 | +- `&T` where `T: Sized`  | 
 | 64 | +  - This is **guaranteed** to be a non-null pointer  | 
 | 65 | +- `Option<&T>` where `T: Sized`  | 
 | 66 | +  - This is **guaranteed** to be a nullable pointer  | 
 | 67 | +- `Option<extern "C" fn()>`  | 
 | 68 | +  - Can this be assumed to be a non-null pointer?  | 
 | 69 | +- `usize`  | 
 | 70 | +  - Platform dependent size, but guaranteed to be able to store a pointer?  | 
 | 71 | +  - Also an array length?  | 
 | 72 | +- Uninitialized bits -- for which types are uninitialized bits valid?  | 
 | 73 | +- If you have `struct A { .. }` and `struct B { .. }` with no  | 
 | 74 | +  `#[repr]` annotations, and they have the same field types, can we  | 
 | 75 | +  say that they will have the same layout?  | 
 | 76 | +  - or do we have the freedom to rearrange the fields of `A` but not  | 
 | 77 | +    `B`, e.g. based on PGO results  | 
 | 78 | +  - What about different instantiations of the same struct? (`Vec<A>`  | 
 | 79 | +    vs `Vec<B>`)  | 
 | 80 | +- Rust currently says that no single value may be larger than `isize::MAX` bytes  | 
 | 81 | +  - is this good? can it be changed? does it matter *here* anyway?  | 
 | 82 | + | 
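Some of the pointer-related items above can be checked directly against what the compiler does today. The sketch below is an observation of current behavior, not a statement of what ought to be guaranteed; the helper `option_ref_bits` is a hypothetical name used only here, and the `usize` check is an assumption that holds on current mainstream targets.

```rust
use std::mem::{size_of, transmute};

// Reinterprets an `Option<&u32>` as a pointer-sized integer.
fn option_ref_bits(r: Option<&u32>) -> usize {
    // SAFETY: `Option<&u32>` and `usize` are both pointer-sized, and any
    // pointer bit pattern is a valid `usize`.
    unsafe { transmute::<Option<&u32>, usize>(r) }
}

fn main() {
    // `Option<&T>` (with `T: Sized`) is no larger than `&T`.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());

    // The same holds for optional function pointers.
    assert_eq!(
        size_of::<Option<extern "C" fn()>>(),
        size_of::<extern "C" fn()>()
    );

    // `None` occupies the null "niche" ...
    assert_eq!(option_ref_bits(None), 0);

    // ... and `Some(r)` is just the address of the referent.
    let x = 7u32;
    assert_eq!(option_ref_bits(Some(&x)), &x as *const u32 as usize);

    // Assumption: true on current mainstream targets.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
}
```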
 | 83 | +## Active threads  | 
 | 84 | + | 
 | 85 | +To start, we will create threads for each major category of types  | 
 | 86 | +(with a few suggested focus points):  | 
 | 87 | + | 
 | 88 | +- Integers and floating points  | 
 | 89 | +    - What about signaling NaN etc? ([Seems like a  | 
 | 90 | +      non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381),  | 
 | 91 | +      but it'd be good to resummarize the details).  | 
 | 92 | +    - is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)?  | 
 | 93 | +      what are edge cases here?  | 
 | 94 | +    - Rust currently states that the maximum size of any single value must fit in an `isize`  | 
 | 95 | +      - Can we say a bit more about why? (e.g., [ensuring that "pointer diff" is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192))  | 
 | 96 | +- Booleans  | 
 | 97 | +    - Prior discussions ([#46156][], [#46176][]) documented bool as a single  | 
 | 98 | +      byte that is either 0 or 1.  | 
 | 99 | +- Enums  | 
 | 100 | +    - See dedicated thread about "niches" and `Option`-style layout optimization  | 
 | 101 | +      below.  | 
 | 102 | +    - Define: C-like enum  | 
 | 103 | +    - Can a C-like enum ever have an invalid discriminant? (Presumably not)  | 
 | 104 | +    - Empty enums and the `!` type  | 
 | 105 | +    - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads.  | 
 | 106 | +    - [RFC 2363][] offers a proposal to permit specifying discriminants.  | 
 | 107 | +- Structs  | 
 | 108 | +    - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out  | 
 | 109 | +      (and/or treated by the ABI)?  | 
 | 110 | +      - e.g., what about different structs with same definition  | 
 | 111 | +      - across executions of the same program?  | 
 | 112 | +    - For example, [rkruppe  | 
 | 113 | +      writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247)  | 
 | 114 | +      that we might "want to guarantee (some subset of) newtype  | 
 | 115 | +      unpacking and relegate `#[repr(transparent)]` to being the way  | 
 | 116 | +      to guarantee to other crates that a type with private fields is  | 
 | 117 | +      and will remain a newtype?"  | 
 | 118 | +- Tuples  | 
 | 119 | +    - Are these effectively anonymous structs?   | 
 | 120 | +- Unions  | 
 | 121 | +    - Can we ever say anything about the initialized contents of a union?  | 
 | 122 | +    - Is `#[repr(C)]` meaningful on a union?  | 
 | 123 | +    - When (if ever) do we guarantee that all fields have the same address?  | 
 | 124 | +- Fn pointers (`fn()`, `extern "C" fn()`)  | 
 | 125 | +    - When is transmuting from one `fn` type to another allowed?  | 
 | 126 | +    - Can you transmute from a `fn` to `usize` or raw pointer?  | 
 | 127 | +      - In theory this is platform dependent, and C certainly draws a  | 
 | 128 | +        distinction between `void*` and a function pointer, but are  | 
 | 129 | +        there any modern and/or realistic platforms where it is an  | 
 | 130 | +        issue?  | 
 | 131 | +    - Is `Option<extern "C" fn()>` guaranteed to be a pointer (possibly null)?  | 
 | 132 | +- References `&T` and `&mut T`  | 
 | 133 | +    - Out of scope: aliasing rules  | 
 | 134 | +    - Always aligned, non-null  | 
 | 135 | +    - When using the C ABI, these map to the C pointer types, presumably  | 
 | 136 | +- Raw pointers  | 
 | 137 | +    - Effectively same as integers?  | 
 | 138 | +    - Is `ptr::null` etc guaranteed to be equal in representation to `0_usize`?  | 
 | 139 | +      - C does guarantee that `0` when cast to a pointer is NULL  | 
 | 140 | +- Representation knobs:  | 
 | 141 | +    - Custom alignment ([RFC 1358])  | 
 | 142 | +    - Packed ([RFC 1240] talks about some safety issues)  | 
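
To make the two representation knobs concrete, here is a small sketch of their effect on size and alignment. `Aligned` and `Packed` are hypothetical types for this example. The last lines show the usual way to read a packed field without creating an unaligned reference, which is the safety issue the packed RFC discusses.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

// Custom alignment: the size is rounded up to a multiple of the alignment.
#[allow(dead_code)]
#[repr(align(16))]
struct Aligned(u8);

// Packed: no padding between fields, alignment drops to 1.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn main() {
    assert_eq!(align_of::<Aligned>(), 16);
    assert_eq!(size_of::<Aligned>(), 16);

    assert_eq!(align_of::<Packed>(), 1);
    assert_eq!(size_of::<Packed>(), 5);

    let p = Packed { tag: 1, value: 0xDEAD_BEEF };

    // Copying the field out by value is fine; taking `&p.value` is not,
    // because the field may be misaligned.
    let copied = p.value;
    assert_eq!(copied, 0xDEAD_BEEF);

    // Alternatively, go through a raw pointer and an unaligned read.
    // SAFETY: the pointer is valid for reads; `read_unaligned` does not
    // require alignment.
    let read = unsafe { ptr::addr_of!(p.value).read_unaligned() };
    assert_eq!(read, 0xDEAD_BEEF);
}
```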
 | 143 | + | 
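Several of the questions in the list above can be probed with a few lines of code. The sketch below records what current compilers do for booleans, a C-like enum, null raw pointers, and a `#[repr(C)]` union; it is meant as a starting point for the threads, not as an answer to them. `Color` and `IntOrFloat` are hypothetical types introduced for this example.

```rust
use std::mem::{size_of, transmute};
use std::ptr;

// A C-like enum: no variant carries a payload.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

// With `#[repr(C)]`, every field of a union starts at offset 0.
#[repr(C)]
union IntOrFloat {
    int: u32,
    float: f32,
}

fn main() {
    // Booleans: a single byte holding 0 or 1.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(false as u8, 0);
    // SAFETY: `bool` and `u8` have the same size; every `bool` is a valid `u8`.
    let byte: u8 = unsafe { transmute(true) };
    assert_eq!(byte, 1);

    // C-like enum with an explicit integer representation.
    assert_eq!(size_of::<Color>(), 1);
    assert_eq!(Color::Blue as u8, 2);

    // The null raw pointer has address 0.
    assert_eq!(ptr::null::<u8>() as usize, 0);

    let u = IntOrFloat { float: 1.0 };
    // SAFETY: every bit pattern is a valid `u32`, so reading `int` after
    // writing `float` is fine.
    unsafe {
        // IEEE 754 single-precision encoding of 1.0.
        assert_eq!(u.int, 0x3f80_0000);
        assert_eq!(
            ptr::addr_of!(u.int) as usize,
            ptr::addr_of!(u.float) as usize
        );
    }
}
```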
 | 144 | +[#46156]: https://github.com/rust-lang/rust/pull/46156  | 
 | 145 | +[#46176]: https://github.com/rust-lang/rust/pull/46176  | 
 | 146 | +[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363  | 
 | 147 | +[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html  | 
 | 148 | +[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html  | 
 | 149 | +[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html  | 
 | 150 | +[RFC 1758]: https://rust-lang.github.io/rfcs/1758-repr-transparent.html  | 