- 
                Notifications
    
You must be signed in to change notification settings  - Fork 347
 
Description
Please post your thoughts, or let me know if something is unclear/confusing!
Summary
This RFC proposes the addition of a new storage type RefRepr and associated type aliases ArrayRef<A, D>, ArrayRef1<A>, ArrayRef2<A>, etc., such that we can implement the following traits:
Borrow<ArrayRef<A, D>>forArrayBase<S, D> where S: DataDeref<Target = ArrayRef<A, D>>forArrayBase<S, D> where S: DataBorrowMut<ArrayRef<A, D>>forArrayBase<S, D> where S: DataMutDerefMut<Target = ArrayRef<A, D>>forArrayBase<S, D> where S: DataMutToOwned<Owned = Array<A, D>>forArrayRef<A, D>
&'a ArrayRef<A, D> would be analogous to ArrayView<'a, A, D>, and &'a mut ArrayRef<A, D> would be analogous to ArrayViewMut<'a, A, D>.
Many of the methods on ArrayBase would be moved to be implemented only on ArrayRef instead, and be available through deref coercion, similar to how much of the functionality of Vec<T> is actually implemented on [T].
Motivation
Good way to express function parameters
Consider a function that needs a view of a 2-D array of f64 elements. With current ndarray, we have the following ways to express this:
// Option 1
fn function<S>(arr: &ArrayBase<S, Ix2>)
where
    S: Data<Elem = f64>,
{}
// Option 2
fn function<'a, V: AsArray<'a, f64, Ix2>>(arr: V) {}
// Option 3
fn function<'a>(arr: impl AsArray<'a, f64, Ix2>) {}
// Option 4
fn function(arr: ArrayView2<'_, f64>) {}Each of these has disadvantages:
- 
Options 1 and 2 are verbose, and become particularly bad for functions taking many array parameters.
 - 
Option 4 is unnatural for callers of the function. (In most cases, they have to call
.view()on the argument, which is verbose and is different from most Rust code, which typically uses references for this use-case.) - 
Options 1, 2, and 3 are generic, so they suffer from monomorphization bloat, as discussed in the next section. For functions taking multiple array parameters, it's even worse—the number of possible type combinations is exponential in the number of array parameters. (To minimize the bloat, it's necessary to create an inner function like option 4, and make the outer function convert the parameter(s) to
ArrayViewand call the inner function.) 
With this RFC, it will be possible to express the function like this:
fn function(arr: &ArrayRef2<f64>) {}This is more concise than the other options, and it's not generic, so there is no monomorphization bloat.
Mutable views would be handled similarly, just with &mut instead of &.
Substantially reduce monomorphization bloat
Most of the functionality of ndarray is currently implemented as methods on ArrayBase<S, D>. As a result, the methods are monomorphized for each new combination of S and D types, which means that there's a lot of nearly-identical generated code, i.e. "monomorphization bloat". The same issue occurs for crates which provide extension traits, such as ndarray-linalg, ndarray-stats, and ndarray-npy. (It's possible to minimize monomorphization bloat for a method by changing it to convert everything to ArrayView/Mut and then call an inner function which takes ArrayView/Mut parameters. However, that's very verbose and tedious.)
If, instead, we move most functionality to be implemented only on ArrayRef<A, D>, analogous to how much of the functionality of Vec<T> is actually implemented on [T], then the monomorphization bloat due to differing storage types would be eliminated. (Callers would rely on deref coercion, like they do for methods actually implemented on [T] for Vec<T>.)
Enable in-place arithmetic operator syntax for the result of a slicing operation
Currently, it's not possible to use in-place arithmetic operator syntax for the results of slicing operations; instead, it's necessary to call the appropriate method:
let mut a = array![0, 1, 2, 3, 4];
// Compiles:
use std::ops::AddAssign;
a.slice_mut(s![1..3]).add_assign(1);
// Doesn't compile:
*a.slice_mut(s![1..3]) += 1;The error message is
error[E0614]: type `ArrayBase<ViewRepr<&mut {integer}>, Dim<[usize; 1]>>` cannot be dereferenced
  --> src/main.rs:11:5
   |
11 |     *a.slice_mut(s![1..3]) += 1;
   |    
By making ArrayBase dereference to ArrayRef and implementing the in-place arithmetic operators on ArrayRef, this will compile without errors.
Better integration into the Rust ecosystem
The existing Rust ecosystem provides functionality dependent on traits like Borrow. Implementing these traits will allow ndarray to interact more cleanly with the rest of the ecosystem. See #878 for one example.
Guide-level explanation
In many ways, an owned array Array<A, D> is similar to Vec<A>. It is an allocated container for a collection of elements.
The new ArrayRef<A, D> type for arrays is analogous to the [A] (i.e. slice) type for vectors. Arrays dereference to ArrayRef<A, D>, like Vec<A> derefs to [A], so functionality available for ArrayRef<A, D> is available for other array types through deref coercion, like how functionality available for [A] is available for Vec<A> through deref coercion.
When writing a function, you should use &ArrayRef<A, D> or &mut ArrayRef<A, D> as the parameter types when possible, similar to how you'd use &[A] or &mut [A] instead of &Vec<A> or &mut Vec<A>.
In general, handle ArrayRef<A, D> analogously to how you'd handle [A]. The primary ways in which they are not analogous are the following:
- 
ArrayRef<A, D>isSized, unlike[A]. As a result,&ArrayRef<A, D>is a thin pointer, while&[A]is a thick pointer. - 
Due to technical limitations, given an array it's not possible to directly create an
&ArrayRefthat is a view of a portion of the array. In other words, you can create a slice (i.e.&[A]) of a portion of a&[A]/Vec<A>with indexing, e.g.&var[3..12], but this is not possible for&ArrayRef. In order to create an&ArrayRefthat references a portion of an array, you have to first create anArrayView(by slicing, etc.), and then coerce thatArrayViewtoArrayRef. 
Also note that &'a ArrayRef<A, D> is similar to ArrayView<'a, A, D>, and &'a mut ArrayRef<A, D> is similar to ArrayViewMut<'a, A, D>, with the following exceptions:
- 
You don't have call a method like
.reborrow()(forArrayView/Mut) to manually shorten the lifetime of the borrow. The lifetimes of references toArrayRefwill be automatically adjusted as necessary, just like other references in Rust. - 
The shape, strides, and pointer of an
&ArrayRef<A, D>or&mut ArrayRef<A, D>cannot be modified. For example, you can't call.slice_axis_inplace(). To get an array with different shape/strides/pointer, you have to create a view. For example, you could call.slice_axis(), or you could create a view with.view()and then call.slice_axis_inplace()on the view. 
Implementation
Implementing Deref and DerefMut with Target = ArrayRef<A, D>
The new RefRepr storage type will be:
#[repr(C)]
pub struct RefRepr<A> {
    elem: PhantomData<A>,
}The ArrayBase struct will be changed to the following:
#[repr(C)]
pub struct ArrayBase<S, D>
where
    S: RawData,
{
    dim: D,
    strides: D,
    ptr: NonNull<S::Elem>,
    data: S,
}The important parts of this change are:
- 
The
dim,strides, andptrfields are the first fields in the struct. - 
The struct is declared
repr(C). 
These changes are necessary so that it's possible to implement Deref and DerefMut by casting references to ArrayBase into references to ArrayRef.
@bluss pointed out below that it's necessary to avoid infinite recursion in deref coercion. To do this, a DataDeref trait will be added:
pub unsafe trait DataDeref: Data {}This trait will be implemented by all storage types except for RefRepr and will be used as a bound for the Deref and DerefMut implementations.
Deref and DerefMut will be implemented like this:
impl<A, S, D> Deref for ArrayBase<S, D>
where
    S: Data<Elem = A> + DataDeref,
{
    type Target = ArrayRef<A, D>;
    fn deref(&self) -> &ArrayRef<A, D> {
        unsafe {
            &*(self as *const Self).cast::<ArrayRef<A, D>>()
        }
    }
}
impl<A, S, D> DerefMut for ArrayBase<S, D>
where
    S: DataMut<Elem = A> + DataDeref,
{
    fn deref_mut(&mut self) -> &mut ArrayRef<A, D> {
        self.ensure_unique();
        unsafe {
            &mut *(self as *mut Self).cast::<ArrayRef<A, D>>()
        }
    }
}Moving most existing methods to be implemented only for ArrayRef<A, D>
Most of the existing methods which don't modify the shape/strides/pointer (such as slice, mapv, etc.) will no longer be implemented for all ArrayBase<S, D>, but rather will be implemented only on ArrayRef<A, D>. This change isn't necessary in order to introduce the ArrayRef type, but it is necessary in order to reduce the existing monomorphization bloat, which is one of the primary benefits of the ArrayRef type.
Data traits for RefRepr
The RefRepr type will implement the following traits:
RawDataRawDataMutData(but make theinto_ownedimplementation panic withunreachable!())DataMutRawDataSubs
In other words, the &'a or &'a mut in a reference to an ArrayRef will control the lifetime and mutability of the data.
We need to audit all of the available methods to make sure that:
- it's possible to create an 
&'a ArrayRef<A, D>only for a base array withS: Datathat lives at least as long as'a, - it's possible to create an 
&'a mut ArrayRef<A, D>only for a base array withS: DataMutthat lives at least as long as'a, and - it's not possible to create an 
ArrayRef<A, D> 
(I believe this is already true if we just add the Deref, DerefMut, Borrow, and BorrowMut implementations, but we should check.)
Preventing modification of shape/strides/pointer via &mut ArrayRef<A, D>
To minimize confusion and behave analogously to &mut [A], it should not be possible to modify the underlying shape/strides/pointer through a &mut ArrayRef<A, D>. For example,
let mut owned: Array2<f32> = Array::<f32>::zeros((4, 5));
let mut borrowed: &mut ArrayRef<A, D> = &mut owned;
// should not compile since it would modify the shape/strides/pointer:
borrowed.slice_axis_inplace(Axis(0), ..);To achieve this:
- 
A new
MetaMuttrait will be added to control whether the shape/strides/pointer can be modified. - 
MetaMutwill be implemented for all of the existing storage types (OwnedRepr,ViewRepr, etc.), but not the newRefReprtype. - 
A
S: MetaMutbound will be added to all methods which modify the shape/strides/pointer in-place (such asslice_collapse,slice_axis_inplace,collapse_axis,swap_axes,invert_axis,merge_axes,insert_axis_inplace, andindex_axis_inplace) . 
Drawbacks
This proposal introduces yet another storage type, when ndarray already has quite a few.
The similarity of &'a ArrayRef<A, D> to ArrayView<'a, A, D> and &'a mut ArrayRef<A, D> to ArrayViewMut<'a, A, D> could be confusing to new users.
The requirement that it must be possible to create ArrayRef references by reinterpreting the bits in existing ArrayBase instances limits the possible field layouts of ArrayBase.
Rationale and alternatives
Allow mutation of the shape/strides/pointer for &mut ArrayRef<A, D>
It would be possible to avoid the new MetaMut trait by allowing modification of the shape/strides/pointer through &mut ArrayRef<A, D> instances. However, this would likely be confusing to users who are familiar with how &mut [T]/Vec<T> behave, and it would increase the potential for bugs.
Keep most method implementations on all ArrayBase<S, D>
It would be possible to keep the existing method implementations for all ArrayBase<S, D> instead of moving them to be implemented only on ArrayRef<A, D>. However, this would not bring the reduced monomorphization bloat which is one of the advantages of the new ArrayRef type.
Make ArrayRef be a separate struct from ArrayBase
It would be possible for ArrayRef to be a separate struct from ArrayBase, instead of an alias of ArrayBase. This would have the following advantages:
Deref,DerefMut,Borrow, andBorrowMutcould be implemented forArrayBasewithoutunsafecode.- The 
MetaMutandDataDereftraits wouldn't be necessary. - There's less potential for unwanted overlapping or recursive trait impls.
 - Keeping the types separate may be less confusing for new users.
 
The disadvantages of a separate struct type are:
- 
The transition to
ArrayRefwould be more disruptive. Until libraries update their functions to take&ArrayRefor&mut ArrayRefparameters instead of&ArrayBaseor&mut ArrayBaseparameters,ArrayRefwould not be compatible with those functions. (Users of those libraries would likely have to call.view()/.view_mut()in some places.) - 
It may be less convenient for generic code which wants to operate on all array types.
For example, if
ArrayRefis not a separate struct, then these impls are sufficient forAddAssign:impl<'b, A, B, S2, D1, D2> AddAssign<&'b ArrayBase<S2, D2>> for ArrayRef<A, D1> where A: AddAssign<&'b B>, B: 'b, S2: Data<Elem = B>, D1: Dimension, D2: Dimension, {} impl<A, B, D> AddAssign<B> for ArrayRef<A, D> where A: AddAssign<B>, B: ScalarOperand, D: Dimension, {}
(Note that, unfortunately, we can't replace
impl<'b, A, B, S2, D1, D2> AddAssign<&'b ArrayBase<S2, D2>> for ArrayRef<A, D1>withimpl<'b, A, B, D1, D2> AddAssign<&'b ArrayRef<B, D2>> for ArrayRef<A, D1>. For some reason, the compiler has trouble figuring out which impl to use whenimpl<A, B, D> AddAssign<B> for ArrayRef<A, D>also exists.)However, if
ArrayRefis a separate struct, then it would be necessary to add another impl:impl<'b, A, B, D1, D2> AddAssign<&'b ArrayRef<B, D2>> for ArrayRef<A, D1> where A: AddAssign<&'b B>, B: 'b, D1: Dimension, D2: Dimension, {}
 
Custom DST (i.e. custom thick pointer types)
Instead of using thin pointers to ArrayRef, we could wait for custom DSTs (e.g. RFC 2594) to be added to Rust, and then create a custom DST similar to ArrayRef. This would have the following advantages:
- 
Since the shape/strides/pointer would be stored within the thick pointer, it would be possible to modify the shape/strides/pointer in the thick pointer in-place, unlike for a reference to an
ArrayRef. - 
Slicing methods could return a thick pointer instead of an
ArrayView/Mutexcept in theIxDyncase. - 
Raw thick pointers (i.e.
*const/*mut) would be more useful. In other words, since the shape/strides would be stored within the thick pointer itself, it should be possible to access that information without dereferencing the pointer. So, except in theIxDyncase, thick pointers could replaceRawArrayView/Mut. In contrast, given a*const ArrayRef<A, D>or*mut ArrayRef<A, D>, you have to dereference the pointer to access the shape/strides, soArrayRefcannot replace most uses ofRawArrayView/Mut. 
However, I'm not aware of any Rust RFC for custom DSTs which would make it possible to create a custom DST with support for the dynamic-dimensional (IxDyn) case. So, there would be this weird dichotomy between the convenience of thick pointers for the const-dimensional case and the complexity of the dynamic-dimensional case. In contrast, the ArrayRef type proposed in this RFC would support the const-dimensional and dynamic-dimensional cases in a uniform way.
Future possibilities
One thing I wanted to make sure when writing this proposal was that it would still allow replacing the ArrayBase struct with a trait-based API in the future. In other words, if we had
struct Array<A, D> { ... }
struct ArrayView<'a, A, D> { ... }
// ...
trait NdArray { ... }
trait NdArrayMut { ... }
// ...instead of having all arrays be instances of ArrayBase, it would be nice for this proposal to still apply. This proposal would, in fact, be compatible with a trait-based API, and the implementation could be somewhat cleaner because it wouldn't require any unsafe code. I'd suggest implementing the types like this:
struct Array<A, D> {
    meta: ArrayRef<A, D>,
    data: Vec<A>,
}
struct ArrayView<'a, A, D> {
    meta: ArrayRef<A, D>,
    life: PhantomData<&'a A>,
}
struct ArrayRef<A, D> {
    shape: D,
    strides: D,
    ptr: NonNull<A>,
}
impl<A, D> Deref for Array<A, D> {
    type Target = ArrayRef<A, D>;
    fn deref(&self) -> &ArrayRef<A, D> {
        &self.meta
    }
}
// ...