RFC: Array Reference Type

Please post your thoughts, or let me know if something is unclear/confusing!

# Summary

This RFC proposes the addition of a new storage type `RefRepr` and associated type aliases `ArrayRef<A, D>`, `ArrayRef1<A>`, `ArrayRef2<A>`, etc., such that we can implement the following traits:

- `Borrow<ArrayRef<A, D>>` for `ArrayBase<S, D> where S: Data`
- `Deref<Target = ArrayRef<A, D>>` for `ArrayBase<S, D> where S: Data`
- `BorrowMut<ArrayRef<A, D>>` for `ArrayBase<S, D> where S: DataMut`
- `DerefMut<Target = ArrayRef<A, D>>` for `ArrayBase<S, D> where S: DataMut`
- `ToOwned<Owned = Array<A, D>>` for `ArrayRef<A, D>`

`&'a ArrayRef<A, D>` would be analogous to `ArrayView<'a, A, D>`, and `&'a mut ArrayRef<A, D>` would be analogous to `ArrayViewMut<'a, A, D>`.

Many of the methods on `ArrayBase` would be moved to be implemented only on `ArrayRef` instead, and be available through deref coercion, similar to how much of the functionality of `Vec<T>` is actually implemented on `[T]`.

# Motivation

## Good way to express function parameters

Consider a function that needs a view of a 2-D array of `f64` elements. With current `ndarray`, we have the following ways to express this:

```rust
// Option 1
fn function<S>(arr: &ArrayBase<S, Ix2>)
where
    S: Data<Elem = f64>,
{}

// Option 2
fn function<'a, V: AsArray<'a, f64, Ix2>>(arr: V) {}

// Option 3
fn function<'a>(arr: impl AsArray<'a, f64, Ix2>) {}

// Option 4
fn function(arr: ArrayView2<'_, f64>) {}
```

Each of these has disadvantages:

- Options 1 and 2 are verbose, and become particularly bad for functions taking many array parameters.

- Option 4 is unnatural for callers of the function. (In most cases, they have to call `.view()` on the argument, which is verbose and is different from most Rust code, which typically uses references for this use-case.)

- Options 1, 2, and 3 are generic, so they suffer from monomorphization bloat, as discussed in the next section. For functions taking multiple array parameters, it's even worse—the number of possible type combinations is exponential in the number of array parameters. (To minimize the bloat, it's necessary to create an inner function like option 4, and make the outer function convert the parameter(s) to `ArrayView` and call the inner function.)

With this RFC, it will be possible to express the function like this:

```rust
fn function(arr: &ArrayRef2<f64>) {}
```

This is more concise than the other options, and it's not generic, so there is no monomorphization bloat.

Mutable views would be handled similarly, just with `&mut` instead of `&`.

## Substantially reduce monomorphization bloat

Most of the functionality of `ndarray` is currently implemented as methods on `ArrayBase<S, D>`. As a result, the methods are monomorphized for each new combination of `S` and `D` types, which means that there's a lot of nearly-identical generated code, i.e. "monomorphization bloat". The same issue occurs for crates which provide extension traits, such as `ndarray-linalg`, `ndarray-stats`, and `ndarray-npy`. (It's possible to minimize monomorphization bloat for a method by changing it to convert everything to `ArrayView/Mut` and then call an inner function which takes `ArrayView/Mut` parameters. However, that's very verbose and tedious.)

If, instead, we move most functionality to be implemented only on `ArrayRef<A, D>`, analogous to how much of the functionality of `Vec<T>` is actually implemented on `[T]`, then the monomorphization bloat due to differing storage types would be eliminated. (Callers would rely on deref coercion, like they do for methods actually implemented on `[T]` for `Vec<T>`.)

## Enable in-place arithmetic operator syntax for the result of a slicing operation

Currently, it's not possible to use in-place arithmetic operator syntax for the results of slicing operations; instead, it's necessary to call the appropriate method:

```rust
let mut a = array![0, 1, 2, 3, 4];

// Compiles:
use std::ops::AddAssign;
a.slice_mut(s![1..3]).add_assign(1);

// Doesn't compile:
*a.slice_mut(s![1..3]) += 1;
```

The error message is

```
error[E0614]: type `ArrayBase<ViewRepr<&mut {integer}>, Dim<[usize; 1]>>` cannot be dereferenced
  --> src/main.rs:11:5
   |
11 |     *a.slice_mut(s![1..3]) += 1;
   |    
```

By making `ArrayBase` dereference to `ArrayRef` and implementing the in-place arithmetic operators on `ArrayRef`, this will compile without errors.

## Better integration into the Rust ecosystem

The existing Rust ecosystem provides functionality dependent on traits like `Borrow`. Implementing these traits will allow `ndarray` to interact more cleanly with the rest of the ecosystem. See #878 for one example.

# Guide-level explanation

In many ways, an owned array `Array<A, D>` is similar to `Vec<A>`. It is an allocated container for a collection of elements.

The new `ArrayRef<A, D>` type for arrays is analogous to the `[A]` (i.e. slice) type for vectors. Arrays dereference to `ArrayRef<A, D>`, like `Vec<A>` derefs to `[A]`, so functionality available for `ArrayRef<A, D>` is available for other array types through deref coercion, like how functionality available for `[A]` is available for `Vec<A>` through deref coercion.

When writing a function, you should use `&ArrayRef<A, D>` or `&mut ArrayRef<A, D>` as the parameter types when possible, similar to how you'd use `&[A]` or `&mut [A]` instead of `&Vec<A>` or `&mut Vec<A>`.

In general, handle `ArrayRef<A, D>` analogously to how you'd handle `[A]`. The primary ways in which they are not analogous are the following:

- `ArrayRef<A, D>` is `Sized`, unlike `[A]`. As a result, `&ArrayRef<A, D>` is a thin pointer, while `&[A]` is a thick pointer.

- Due to technical limitations, given an array it's not possible to *directly* create an `&ArrayRef` that is a view of a portion of the array. In other words, you can create a slice (i.e. `&[A]`) of a portion of a `&[A]`/`Vec<A>` with indexing, e.g. `&var[3..12]`, but this is not possible for `&ArrayRef`. In order to create an `&ArrayRef` that references a portion of an array, you have to first create an `ArrayView` (by slicing, etc.), and then coerce that `ArrayView` to `ArrayRef`.

Also note that `&'a ArrayRef<A, D>` is similar to `ArrayView<'a, A, D>`, and `&'a mut ArrayRef<A, D>` is similar to `ArrayViewMut<'a, A, D>`, with the following exceptions:

- You don't have call a method like `.reborrow()` (for `ArrayView/Mut`) to manually shorten the lifetime of the borrow. The lifetimes of references to `ArrayRef` will be automatically adjusted as necessary, just like other references in Rust.

- The shape, strides, and pointer of an `&ArrayRef<A, D>` or `&mut ArrayRef<A, D>` cannot be modified. For example, you can't call `.slice_axis_inplace()`. To get an array with different shape/strides/pointer, you have to create a view. For example, you could call `.slice_axis()`, or you could create a view with `.view()` and then call `.slice_axis_inplace()` on the view.

# Implementation

## Implementing `Deref` and `DerefMut` with `Target = ArrayRef<A, D>`

The new `RefRepr` storage type will be:

```rust
#[repr(C)]
pub struct RefRepr<A> {
    elem: PhantomData<A>,
}
```

The `ArrayBase` struct will be changed to the following:

```rust
#[repr(C)]
pub struct ArrayBase<S, D>
where
    S: RawData,
{
    dim: D,
    strides: D,
    ptr: NonNull<S::Elem>,
    data: S,
}
```

The important parts of this change are:

- The `dim`, `strides`, and `ptr` fields are the first fields in the struct.

- The struct is declared `repr(C)`.

These changes are necessary so that it's possible to implement `Deref` and `DerefMut` by casting references to `ArrayBase` into references to `ArrayRef`.

@bluss pointed out below that it's necessary to avoid infinite recursion in deref coercion. To do this, a `DataDeref` trait will be added:

```rust
pub unsafe trait DataDeref: Data {}
```

This trait will be implemented by all storage types except for `RefRepr` and will be used as a bound for the `Deref` and `DerefMut` implementations.

`Deref` and `DerefMut` will be implemented like this:

```rust
impl<A, S, D> Deref for ArrayBase<S, D>
where
    S: Data<Elem = A> + DataDeref,
{
    type Target = ArrayRef<A, D>;

    fn deref(&self) -> &ArrayRef<A, D> {
        unsafe {
            &*(self as *const Self).cast::<ArrayRef<A, D>>()
        }
    }
}

impl<A, S, D> DerefMut for ArrayBase<S, D>
where
    S: DataMut<Elem = A> + DataDeref,
{
    fn deref_mut(&mut self) -> &mut ArrayRef<A, D> {
        self.ensure_unique();
        unsafe {
            &mut *(self as *mut Self).cast::<ArrayRef<A, D>>()
        }
    }
}
```

## Moving most existing methods to be implemented only for `ArrayRef<A, D>`

Most of the existing methods which don't modify the shape/strides/pointer (such as `slice`, `mapv`, etc.) will no longer be implemented for all `ArrayBase<S, D>`, but rather will be implemented only on `ArrayRef<A, D>`. This change isn't necessary in order to introduce the `ArrayRef` type, but it is necessary in order to reduce the existing monomorphization bloat, which is one of the primary benefits of the `ArrayRef` type.

## Data traits for `RefRepr`

The `RefRepr` type will implement the following traits:

- `RawData`
- `RawDataMut`
- `Data` (but make the `into_owned` implementation panic with `unreachable!()`)
- `DataMut`
- `RawDataSubs`

In other words, the `&'a` or `&'a mut` in a reference to an `ArrayRef` will control the lifetime and mutability of the data.

We need to audit all of the available methods to make sure that:

- it's possible to create an `&'a ArrayRef<A, D>` only for a base array with `S: Data` that lives at least as long as `'a`,
- it's possible to create an `&'a mut ArrayRef<A, D>` only for a base array with `S: DataMut` that lives at least as long as `'a`, and
- it's not possible to create an `ArrayRef<A, D>`

(I believe this is already true if we just add the `Deref`, `DerefMut`, `Borrow`, and `BorrowMut` implementations, but we should check.)

## Preventing modification of shape/strides/pointer via `&mut ArrayRef<A, D>`

To minimize confusion and behave analogously to `&mut [A]`, it should not be possible to modify the underlying shape/strides/pointer through a `&mut ArrayRef<A, D>`. For example,

```rust
let mut owned: Array2<f32> = Array::<f32>::zeros((4, 5));
let mut borrowed: &mut ArrayRef<A, D> = &mut owned;
// should not compile since it would modify the shape/strides/pointer:
borrowed.slice_axis_inplace(Axis(0), ..);
```

To achieve this:

- A new `MetaMut` trait will be added to control whether the shape/strides/pointer can be modified.

- `MetaMut` will be implemented for all of the existing storage types (`OwnedRepr`, `ViewRepr`, etc.), but not the new `RefRepr` type.

- A `S: MetaMut` bound will be added to all methods which modify the shape/strides/pointer in-place (such as `slice_collapse`, `slice_axis_inplace`, `collapse_axis`, `swap_axes`, `invert_axis`, `merge_axes`, `insert_axis_inplace`, and `index_axis_inplace`) .

# Drawbacks

This proposal introduces yet another storage type, when `ndarray` already has quite a few.

The similarity of `&'a ArrayRef<A, D>` to `ArrayView<'a, A, D>` and `&'a mut ArrayRef<A, D>` to `ArrayViewMut<'a, A, D>` could be confusing to new users.

The requirement that it must be possible to create `ArrayRef` references by reinterpreting the bits in existing `ArrayBase` instances limits the possible field layouts of `ArrayBase`.

# Rationale and alternatives

## Allow mutation of the shape/strides/pointer for `&mut ArrayRef<A, D>`

It would be possible to avoid the new `MetaMut` trait by allowing modification of the shape/strides/pointer through `&mut ArrayRef<A, D>` instances. However, this would likely be confusing to users who are familiar with how `&mut [T]`/`Vec<T>` behave, and it would increase the potential for bugs.

## Keep most method implementations on all `ArrayBase<S, D>`

It would be possible to keep the existing method implementations for all `ArrayBase<S, D>` instead of moving them to be implemented only on `ArrayRef<A, D>`. However, this would not bring the reduced monomorphization bloat which is one of the advantages of the new `ArrayRef` type.

## Make `ArrayRef` be a separate struct from `ArrayBase`

It would be possible for `ArrayRef` to be a separate struct from `ArrayBase`, instead of an alias of `ArrayBase`. This would have the following advantages:

- `Deref`, `DerefMut`, `Borrow`, and `BorrowMut` could be implemented for `ArrayBase` without `unsafe` code.
- The `MetaMut` and `DataDeref` traits wouldn't be necessary.
- There's less potential for unwanted overlapping or recursive trait impls.
- Keeping the types separate may be less confusing for new users.

The disadvantages of a separate struct type are:

- The transition to `ArrayRef` would be more disruptive. Until libraries update their functions to take `&ArrayRef` or `&mut ArrayRef` parameters instead of `&ArrayBase` or `&mut ArrayBase` parameters, `ArrayRef` would not be compatible with those functions. (Users of those libraries would likely have to call `.view()`/`.view_mut()` in some places.)

- It may be less convenient for generic code which wants to operate on all array types.
  
  For example, if `ArrayRef` is *not* a separate struct, then these impls are sufficient for `AddAssign`:

  ```rust
  impl<'b, A, B, S2, D1, D2> AddAssign<&'b ArrayBase<S2, D2>> for ArrayRef<A, D1>
  where
      A: AddAssign<&'b B>,
      B: 'b,
      S2: Data<Elem = B>,
      D1: Dimension,
      D2: Dimension,
  {}

  impl<A, B, D> AddAssign<B> for ArrayRef<A, D>
  where
      A: AddAssign<B>,
      B: ScalarOperand,
      D: Dimension,
  {}
  ```

  (Note that, unfortunately, we can't replace `impl<'b, A, B, S2, D1, D2> AddAssign<&'b ArrayBase<S2, D2>> for ArrayRef<A, D1>` with `impl<'b, A, B, D1, D2> AddAssign<&'b ArrayRef<B, D2>> for ArrayRef<A, D1>`. For some reason, the compiler has trouble figuring out which impl to use when `impl<A, B, D> AddAssign<B> for ArrayRef<A, D>` also exists.)

  However, if `ArrayRef` is a separate struct, then it would be necessary to add another impl:

  ```rust
  impl<'b, A, B, D1, D2> AddAssign<&'b ArrayRef<B, D2>> for ArrayRef<A, D1>
  where
      A: AddAssign<&'b B>,
      B: 'b,
      D1: Dimension,
      D2: Dimension,
  {}
  ```

## Custom DST (i.e. custom thick pointer types)

Instead of using thin pointers to `ArrayRef`, we could wait for custom DSTs (e.g. [RFC 2594](https://github.com/rust-lang/rfcs/pull/2594)) to be added to Rust, and then create a custom DST similar to `ArrayRef`. This would have the following advantages:

- Since the shape/strides/pointer would be stored within the thick pointer, it would be possible to modify the shape/strides/pointer in the thick pointer in-place, unlike for a reference to an `ArrayRef`.

- Slicing methods could return a thick pointer instead of an `ArrayView/Mut` except in the `IxDyn` case.

- *Raw* thick pointers (i.e. `*const`/`*mut`) would be more useful. In other words, since the shape/strides would be stored within the thick pointer itself, it should be possible to access that information without dereferencing the pointer. So, except in the `IxDyn` case, thick pointers could replace `RawArrayView/Mut`. In contrast, given a `*const ArrayRef<A, D>` or `*mut ArrayRef<A, D>`, you have to dereference the pointer to access the shape/strides, so `ArrayRef` cannot replace most uses of `RawArrayView/Mut`.

However, I'm not aware of any Rust RFC for custom DSTs which would make it possible to create a custom DST with support for the dynamic-dimensional (`IxDyn`) case. So, there would be this weird dichotomy between the convenience of thick pointers for the const-dimensional case and the complexity of the dynamic-dimensional case. In contrast, the `ArrayRef` type proposed in this RFC would support the const-dimensional and dynamic-dimensional cases in a uniform way.

# Future possibilities

One thing I wanted to make sure when writing this proposal was that it would still allow replacing the `ArrayBase` struct with a trait-based API in the future. In other words, if we had

```rust
struct Array<A, D> { ... }

struct ArrayView<'a, A, D> { ... }

// ...

trait NdArray { ... }

trait NdArrayMut { ... }

// ...
```

instead of having all arrays be instances of `ArrayBase`, it would be nice for this proposal to still apply. This proposal would, in fact, be compatible with a trait-based API, and the implementation could be somewhat cleaner because it wouldn't require any `unsafe` code. I'd suggest implementing the types like this:

```rust
struct Array<A, D> {
    meta: ArrayRef<A, D>,
    data: Vec<A>,
}

struct ArrayView<'a, A, D> {
    meta: ArrayRef<A, D>,
    life: PhantomData<&'a A>,
}

struct ArrayRef<A, D> {
    shape: D,
    strides: D,
    ptr: NonNull<A>,
}

impl<A, D> Deref for Array<A, D> {
    type Target = ArrayRef<A, D>;

    fn deref(&self) -> &ArrayRef<A, D> {
        &self.meta
    }
}

// ...
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RFC: Array Reference Type #879

Summary

Motivation

Good way to express function parameters

Substantially reduce monomorphization bloat

Enable in-place arithmetic operator syntax for the result of a slicing operation

Better integration into the Rust ecosystem

Guide-level explanation

Implementation

Implementing `Deref` and `DerefMut` with `Target = ArrayRef<A, D>`

Moving most existing methods to be implemented only for `ArrayRef<A, D>`

Data traits for `RefRepr`

Preventing modification of shape/strides/pointer via `&mut ArrayRef<A, D>`

Drawbacks

Rationale and alternatives

Allow mutation of the shape/strides/pointer for `&mut ArrayRef<A, D>`

Keep most method implementations on all `ArrayBase<S, D>`

Make `ArrayRef` be a separate struct from `ArrayBase`

Custom DST (i.e. custom thick pointer types)

Future possibilities

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RFC: Array Reference Type #879

Description

Summary

Motivation

Good way to express function parameters

Substantially reduce monomorphization bloat

Enable in-place arithmetic operator syntax for the result of a slicing operation

Better integration into the Rust ecosystem

Guide-level explanation

Implementation

Implementing Deref and DerefMut with Target = ArrayRef<A, D>

Moving most existing methods to be implemented only for ArrayRef<A, D>

Data traits for RefRepr

Preventing modification of shape/strides/pointer via &mut ArrayRef<A, D>

Drawbacks

Rationale and alternatives

Allow mutation of the shape/strides/pointer for &mut ArrayRef<A, D>

Keep most method implementations on all ArrayBase<S, D>

Make ArrayRef be a separate struct from ArrayBase

Custom DST (i.e. custom thick pointer types)

Future possibilities

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Implementing `Deref` and `DerefMut` with `Target = ArrayRef<A, D>`

Moving most existing methods to be implemented only for `ArrayRef<A, D>`

Data traits for `RefRepr`

Preventing modification of shape/strides/pointer via `&mut ArrayRef<A, D>`

Allow mutation of the shape/strides/pointer for `&mut ArrayRef<A, D>`

Keep most method implementations on all `ArrayBase<S, D>`

Make `ArrayRef` be a separate struct from `ArrayBase`