Skip to content

Conversation

@firefighterduck
Copy link
Contributor

@firefighterduck firefighterduck commented Aug 8, 2025

Description

To take another step to having full layout information for as many types as possible, this PR introduces a new LayoutComputer (name up for discussion), which can be used to compute memoized layouts (or partial layouts I call LayoutHints) for all kinds of types.

Goal

This is meant to cater to the need of tools like my own (now in the process of being rewritten in Rust and using charon_lib) which want to know layout information about many types, not just those with a type declaration.

Status

The PR contains a lot of assumptions about when we can get which information and many of these should be discussed in greater detail. Right now it tries to supports the following cases:

  • ADTs with a known layout in the type decl
  • ADTs without a known layout in the type decl by trying to instantiate its generic type parameters
  • Tuples (precise if all elements are of the same size, otherwise only a hint)
  • Builtins (Box as a pointer type, Slice and Str as unsized types)
  • Literals
  • References and raw pointers (precise if we know whether they need to be wide, otherwise only a hint)
  • DynTrait as unsized

All other types are given no layout or hint. This is especially unfortunate for opaque ADTs from std for which we could do a lot (e.g. Result<NonZero<u32>,()> doesn't get a layout because NonZero is opaque). We could probably try to hardcode some layout schema for some of these types.

@firefighterduck
Copy link
Contributor Author

One thing that should be discussed:
I've implemented the LayoutComputer in such a way that it leaks freshly computed layouts to allow for memoization, i.e. because every call to get_layout_of could potentially change the internal state of LayoutComputer, we cannot simply store the fresh layouts in the LayoutComputer and return references to them. I've thought about writing a custom Drop implementation that would take the leaked references, turn them back into boxes, and drop them manually, but since the LayoutComputer should live as long as the crate data, and thus probably as long as the tool using it, that might not make a difference. Alternative implementations could potentially be based on some Cell type or maybe ManuallyDrop, but I haven't looked into these options.

@Nadrieril
Copy link
Member

This is encoding a huge amount of logic which in the end doesn't compute the same layout as rustc, which would be observable from e.g. the output of offset_of!. What problem are you trying to solve that isn't solved by the new --monomorphize option?

If what you need are generic layouts, the inevitable discrepancy with e.g. offset_of is a problem for me. We can't be ouputting a generic layout if it doesn't end up matching data that could be observed from running the program.

@firefighterduck
Copy link
Contributor Author

firefighterduck commented Aug 11, 2025

It should compute the same layout as rustc, up to my explicitly states assumptions. And I can't tell yet what specific problem I need this for, since I'm still in the middle of working on it (for reference: https://codeberg.org/firefighterduck/Brush_rs). However, I can say that I need as much and as precise information as possible, since biabduction is known to work best with precise information and to my knowledge symbolic layouts have never used with it before. I have not tried the new --monomorphize option yet, but since I need to be able to work with completely generic types and library code, I absolutely need as much information as I can get, e.g. even just knowing whether a generic type can be a ZST can make a huge difference for my work. If you check the new test output file, you can see that even for some very basic types such as tuples or Result, for which one should be able to get a precise layout in many cases, it is only possible to get this information by recomputing it (see also the type layout computation code in https://github.com/soteria-tools/soteria/blob/main/soteria-rust/lib/layout.ml which have used in part in my proof-of-concept and which was the inspiration for this PR). If the new monomorphization can fix this problem, that's a great start, but I doubt it will suffice. I can just move all of this into my own tool, but would prefer to have as much of it in Charon, since most users should profit from better layout information.

@ssyram
Copy link
Contributor

ssyram commented Aug 12, 2025

So basically this PR wants to compute the generic layout? I.e., the expression of the layout info? E.g., a type T<U> to be with size of "4 + sizeof(U)"?

Btw, just out of curiosity, how will you represent DST layout? Will you compute the "prefix length" corresponding to the fields before the final (potential) DST field?

@firefighterduck
Copy link
Contributor Author

So basically this PR wants to compute the generic layout? I.e., the expression of the layout info?

No, the facility is meant to collect more available precise layout information that Charon currently can't use. E.g. for types such as u32, (u32,u32), or &str Charon currently can't give you a layout even though much about them is obvious to us. For other cases such as Box<T>, it at least computes a layout hint, i.e. the minimal known information such as a lower bound on the size (a box can either be a wide or a thin pointer, thus the size of a pointer).

@ssyram
Copy link
Contributor

ssyram commented Aug 12, 2025

OK, thanks for the explanation. So you mean to fix two things for Charon: 1. Charon currently provides layout for only the TypeDecl but you want to cover all possible types including the Builtins & tuples; 2. For generic, you want to compute some bounds? Well, I guess longer discussion can be in Zulip, though.

@firefighterduck
Copy link
Contributor Author

I just remembered an important feature that's still missing: the layout computer should be able to check whether a type has representation annotations, e.g. #[repr(C)]. Especially the C representation is not only important for my use case but it should also make computing the layout very straightforward. I assume that all types that can have such an attribute must necessarily be either Opaque or have a type declaration with this information.

Comment on lines 55 to 58
/// Compute the layout of a tuple as precise as possible.
///
/// If the elements have different sizes and thus might need padding,
/// it returns a hint, otherwise the precise layout.
Copy link
Member

@Nadrieril Nadrieril Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust gives no guarantees about tuple layout, not even that the fields are in the expected order. Also size isn't the only thing at play, types can have the same size and different alignment (which is likely to cause reorderings). I guess for a tuple with all same size and alignment we can reasonably expect that the fields are in order, but I'd rather ask rustc directly if possible.

Comment on lines +190 to +177
/// Computes the layout based on the simple algorithm for C types.
fn compute_c_layout_from_fields(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C layout doesn't mean "all fields in order in the obvious way". There are weird alignment shenanigans on windows for example, hence this recent RFC: rust-lang/rfcs#3845. We must therefore only provide layout hints for repr(C).

Copy link
Member

@Nadrieril Nadrieril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your layout hints! Happy to integrate that. Otherwise I'd prefer to never construct a Layout ourselves except for pointers and literals (where the layout is guaranteed and trivial), because everything about layouts is subtle and soundness-critical.

let mut max_align = 1;
let mut uninhabited_part = false;

// If it is a new-type struct, we should be able to get a layout.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no guarantee that a newtype struct has the same layout as its contents (unless repr(transparent)). In fact it's easy to change that:

#[repr(align(64))]
struct S(u8);

Comment on lines 183 to 171
min_size: total_min_size,
min_alignment: max_align,
possibly_uninhabited: uninhabited_part,
Copy link
Member

@Nadrieril Nadrieril Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enum variants, if there's an uninhabited field the whole variant may have size zero (rust-lang/unsafe-code-guidelines#443). Hence min_size should be zero if possibly_uninhabited is true and we're in an enum variant. I don't know the rule for alignment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if repr(packed) then the alignment can be wrong so min_alignment should be 1 I think.

@firefighterduck
Copy link
Contributor Author

I like your layout hints! Happy to integrate that. Otherwise I'd prefer to never construct a Layout ourselves except for pointers and literals (where the layout is guaranteed and trivial), because everything about layouts is subtle and soundness-critical.

Let me give an updated overview over how different types are handled (not yet pushed). The following types will get a precise layout:

  • Types with a TypeDecl and a known layout
  • Unit
  • Literal types
  • Thin pointer types (&, &mut, *const, *mut, Box)
  • Unsized types (an unsized layout, size/alignment is None, no fields)
  • Arrays, if the number of elements is known, the size is not 0, the element type's layout is precise
  • Transparent new-type structs, if the inner type has a precise layout

For the following types, only a hint is returned:

  • Wide pointers (since the order of the address and metadata is not known).
  • Tuples
  • repr(C) ADTs
  • enums
  • Other ADTs, potentially based on their instantiated type parameters
  • Zero-sized arrays

How does this sound?

@Nadrieril
Copy link
Member

Nadrieril commented Sep 16, 2025

How does this sound?

Sounds very good!

For Box, beware that it has a second type parameter, it's not just a pointer if A != GlobalAlloc. So I'm afraid we have to treat it like a normal struct in the general case.

@Nadrieril
Copy link
Member

Not having repr(C) is unfortunate, I have to admit. Something I'm wondering: it seems that rustc can figure out that size_of::<&T>() is 8 if T is sized (look at the optimized MIR here); we could get generic layouts to work by asking rustc to evaluate size_of and offset_of constants and catching errors appropriately. That's probably a lot more work that this PR, but it's a possibility to keep in mind I guess.

@firefighterduck
Copy link
Contributor Author

For Box, beware that it has a second type parameter, it's not just a pointer if A != GlobalAlloc. So I'm afraid we have to treat it like a normal struct in the general case.

Good point, I kinda had in mind, that the allocator API was only usable in rustc yet, and therefore didn't matter for user code.

@firefighterduck
Copy link
Contributor Author

it seems that rustc can figure out that size_of::<&T>() is 8 if T is sized

I've added a sizedness check already, so that we can also get this information in charon.

we could get generic layouts to work by asking rustc to evaluate size_of and offset_of constants and catching errors appropriately.

That's a good point. In my PR, I've tried to add facilities to charon to get layout information after the translation to (U)LLBC, thus invoking rustc isnt't really an option. However, we could also think about adding something similar to the translation to also be able to query rustc.

@Nadrieril
Copy link
Member

I kinda had in mind, that the allocator API was only usable in rustc yet, and therefore didn't matter for user code.

Unfortunately it's available on nightly so users can use it. Wait actually charon has a --hide-allocator option; when it's on there'll be an error if anyone tries to use another allocator so in that case it's ok to say that Box has the same layout as NonNull.

Comment on lines +1049 to +1119
fn substitute_frees(self, generics: &GenericArgs) -> Self {
self.substitute_with_self_frees(generics, &TraitRefKind::SelfId)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that reminds me I should make Free variables optional, they were added to ease the transition to De Bruijn indices. Could you document this function to explain what it does?

@firefighterduck
Copy link
Contributor Author

So I've been thinking that maybe I could try to move the layout computation from happening as part of the charon-lib after parsing a (U)LLBC file to happening at the time of translation from hax to allow us to query rustc for more information. This would require some mechanism to translate from the hax/charon types back to rustc types, but that seems doable (although doing that would probably be kinda hacky?). On the other hand, such a translation mechanism could also allow us to have better tests (checking whether the computed layouts/hints are consistent with what rustc computes). If we could compute more layouts at translation time, we could also store them in the (U)LLBC thus making the available to users of both the Rust and the OCaml APIs. However this would also bloat the representations a lot further.

@Nadrieril
Copy link
Member

Nadrieril commented Sep 29, 2025

It would be rather easy to add layout info to hax::FullDef indeed, and then we could ask rustc for things like size and alignment; I'd support that. That was already the plan since we normally let hax do all the queries to rustc.

Is that what you're asking for though? You seem to say you'd want the layout of a given Ty?

@firefighterduck
Copy link
Contributor Author

firefighterduck commented Sep 30, 2025

You seem to say you'd want the layout of a given Ty?

Exactly, since that is what tools actually need (some more, some less). They need to have information about all types that occur in the program, not just the ones available from definitions. And it's not just/primarily the layout, oftentimes some related properties are of great interest, e.g. whether a type is/can be a ZST, a DST, whether it's FFI-safe, whether it's uninhabited, whether two types can be transmuted into each other, etc. Some of these properties can be approximated from the type alone, but it would be more convenient to have all such information in one place such that all downstream tools can easily access it.

@Nadrieril
Copy link
Member

Setting aside what some other tools may want, what do you personally need? In particular, do you need this info for non-monomorphized types too?

My thinking is that we clearly won't precompute that for every type, but we could store some kind of logical representation for each type declaration. If needed we could give a TypeDecl for most builtin types (tuples, references, pointers, integers) to have a place to retreive that information from. The issue is that rustc won't help much to get the logical representation; it can give us fixed info for mono types, occasional info for gneeric types, but the rest we'll have to do ourselves.

"whether two types can be transmuted into each other" is sufficiently complicated that there's been an unfinished project for years to let rustc know about that in a trait (safe transmute). Even with the ability to query rustc that's hard to do right. That particular one I propose we have a separate issue for it if you want it.

@firefighterduck
Copy link
Contributor Author

Here's a list of information bits my tool currently queries from my charon fork (the branch in this PR):

  • Can the given type be a ZST? Is it guaranteed to be a ZST? (the same with "uninhabited")
  • Does a pointee type require metadata? (This should be as precise as possible, including whether relevant type variables have a Sized constraint)
  • Has the type a specific #[repr(_)] (especially whether it's transparent or C)?
  • Layouts and layout hints for a given type (to model them as precisely as possible in logic), especially alignment and size.
  • All the already available information like whether a type is an enum, which fields it has, etc.
  • Not yet implemented but planed: some guaranteed niches (e.g. Option<NonNull<T>>), repr(C) full layout with padding etc.

And yes, I also need all of this information for non-monomorphized types, since my tool is meant to verify the memory safety of libraries and functions in a compositional way (one logical representation for all possible type arguments, that then gets instantiated/refined at the call site).

About transmute: I don't necessarily want to query rustc but just get enough information about the memory layout to prove in separation logic that a specific transmute is memory safe (and upholds the validity invariants of the types involved).

@Nadrieril
Copy link
Member

* Can the given type be a ZST? Is it guaranteed to be a ZST? (the same with "uninhabited")

Do you need inhabitedness/ZST as a property of the final layout or of what a user is allowed to assume? E.g. a struct may be a ZST today but with private fields so the ZST-ness may not be a guarantee and thus unsound to rely on. Same for inhabitedness. Also consider #[non_exhaustive].

If it's just the final layout, I think these two properties are straightforward to compute? We could add a method to Ty.

Note that a similar question applies to the layouts we have today, and to inter-transmutability. If you want to prove soundness you should not be relying on an accidental layout choice. In my head this calls for two different notions of layout: there's the implemented layout (which we have today for mono types), and the specified layout which we'll have to construct by reading the Rust Reference and seeing what's guaranteed or not. The specified layout would make sense for poly types. It would be extremely cool to have something like that.

* Does a pointee type require metadata? (This should be as precise as possible, including whether relevant type variables have a `Sized` constraint)

The PtrMetadata type is getting improved in #797; does that give you enough information?

@firefighterduck
Copy link
Contributor Author

Do you need inhabitedness/ZST as a property of the final layout or of what a user is allowed to assume? E.g. a struct may be a ZST today but with private fields so the ZST-ness may not be a guarantee and thus unsound to rely on. Same for inhabitedness. Also consider #[non_exhaustive].

I need it for the the exact type used at time of verification in the code (be it a specific type, a type variable, a generic type, ...). I don't need to handle specific types with #[non_exhaustive] any differently, since verification results always only apply to the exact version of the code that was verified. Any change such as adding a field invalidates the results anyway.

Note that a similar question applies to the layouts we have today, and to inter-transmutability. If you want to prove soundness you should not be relying on an accidental layout choice.

I expect the logical contract for transmutability to express some constant layout requirements that would not be satisfied by all layout choices unless its really guaranteed to work. So the proof would necessarily fail if just some layout choice makes it possible.

In my head this calls for two different notions of layout: there's the implemented layout (which we have today for mono types), and the specified layout which we'll have to construct by reading the Rust Reference and seeing what's guaranteed or not. The specified layout would make sense for poly types. It would be extremely cool to have something like that.

That sounds quite intriguing. Most of the time I'd probably want to use the specified layout, anyway, and only utilize the implemented layout where it coincides with the specified one (especially for primitive types and repr(C)).

The PtrMetadata type is getting improved in #797; does that give you enough information?

On a first glance, it might do.

@Nadrieril
Copy link
Member

Nadrieril commented Sep 30, 2025

verification results always only apply to the exact version of the code that was verified

Ok, that makes sense to me. I assume however that you want a verification result to not depend (too much) on the version of rustc used, and therefore not depend on layout choices that may change in the future.

If so then the way forward I have in mind looks like:

  • rename Layout to ConcreteLayout;
  • add AbstractLayout that describes what is guaranteed about this type's layout (or maybe GuaranteedLayout?).

AbstractLayout could even have abstract expressions like "size(Self) >= size(T) + size(U) + 42" if that would be useful. Ideally every info we add to it would be justified by a link to the Reference, to make sure we're doing the right thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants