Skip to content

Conversation

@Lunderberg
Copy link
Contributor

Prior to this commit, the NDArray::CreateView method could produce an aliasing view of an existing array with a different shape or datatype, but the view was required to have the same DLTensor::byte_offset as the existing array. This commit updates the NDArray::CreateView method with an additional parameter, specifying the offset of the view relative to the existing array.

Prior to this commit, the `NDArray::CreateView` method could produce
an aliasing view of an existing array with a different shape or
datatype, but the view was required to have the same
`DLTensor::byte_offset` as the existing array.  This commit updates
the `NDArray::CreateView` method with an additional parameter,
specifying the offset of the view relative to the existing array.
@Lunderberg Lunderberg requested a review from masahi April 26, 2024 18:14
@tqchen
Copy link
Member

tqchen commented Apr 27, 2024

one note is that while such view was OK. the additional byte_offset would require special kernel to interact with them(by allowing elem_offset) in the arguments, and may not be the best peforming one because there is a tradeoff in terms of continugous/alignment and zero copy

@Lunderberg
Copy link
Contributor Author

Agreed, there's a tradeoff, and replacing copies with views isn't something to be applied in every case.

  • Kernel Compatibility

    Currently, the default tir::Buffer used by TE (used by topi, and used in most Relax legalizations) is defined with elem_offset of zero.

    • Option 1: Change TE's default to use arbitrary elem_offset for the default tir::Buffer. (Change the default of elem_offset from zero to a fresh TIR variable.) Not a good option, as this would provide a weaker alignment check that existing kernels may assume.

    • Option 2: Change TE's default to use a dynamic elem_offset that preserves alignment for the default tir::Buffer. (Change the default of elem_offset from zero to a fresh TIR variable, and change the default of offset_factor to kAllocAlignment // dtype.bytes().) This is better than option (1), because data + elem_offset*dtype.bytes() is still aligned. However, still not a good option, as TIR kernels that call external functions typically pass buf.data as an argument, which doesn't include the elem_offset.

    • Option 3: Keep TE's default the same, and instead apply a post-processing step. For each tir::Buffer argument with elem_offset of zero, replace it with a new tir::Buffer with elem_offset allowed, and offset_factor to maintain alignment. Within the body of the function, add a definition of the old tir::Buffer argument either as a MatchBufferRegion or a attr::buffer_bind_scope.

      I'm liking Option 3 the most, as it adds flexibility at the calling scope. Unlike option 1, it still provides the same alignment guarantees for kernels. Unlike option 2, it wouldn't require re-writing a very large number of existing kernels.

  • Performance

    • Zero-Copy vs Contiguous Arrays: The CreateView method is only allowed to make contiguous views of contiguous arrays. It is not able to make strided views, so this tradeoff isn't applicable.

    • Zero-Copy vs Alignment: The CreateView method can make views whose data + byte_offset isn't aligned to
      kAllocAlignment. The restriction of aligned data + byte_offset would still be applied at the PrimFunc level, so this wouldn't cause a performance hit.

    • Zero-Copy vs Memory Footprint: The NDArray returned by CreateView shares the same reference-counted allocation as the source array into which it views. If the view outlives the source, then the memory footprint of the view is larger than a copy would have been.

    Not an issue for the CreateView method itself, but should be a consideration for optimization passes that replace copies with views. (e.g. replacing R.take, R.split, or R.strided_slice with a view) Should only be done in cases where the source array outlives the view.

Both to match the type used in `DLTensor::byte_offset`, and to resolve
compilation errors on 32-bit platforms, which fail to compile due to a
missing `Type2Str` specialization.
@tqchen
Copy link
Member

tqchen commented Apr 27, 2024

We don;t need to worry about TE, since we will be able to support buffer binding side.

Just mention that allowing zero elem_offset is actually an important consideration for performance. Likely in most cases we would still like zero elem_offset by default. In frameworks, some key operations would also usualy requries a compact (that triggers an explicit copy), such that the memory access and alignment are right.

In some cases it is indeed useful to introduce views, in some of those cases we might be able to inline views directly into the previous operation, such that the buffer themselfs still comes with zero offset.

@Lunderberg
Copy link
Contributor Author

We don;t need to worry about TE, since we will be able to support buffer binding side.

I brought up TE as a potential place to implement a change, but don't think it is the best option. While buffer-binding is supported in the tvm.build API, it is not supported in relax.BlockBuilder.call_te, nor in the te.create_prim_func used by BlockBuilder.call_te. Because it isn't exposed by these APIs, an implementation at this level would need to change the defaults all the way down to TE, which is a wider-reaching change than I'd like to make.

In some cases it is indeed useful to introduce views, in some of those cases we might be able to inline views directly into the previous operation, such that the buffer themselfs still comes with zero offset.

Agreed, and I like this as the long-term direction.

@Lunderberg
Copy link
Contributor Author

It sounds that we're in agreement that this is useful functionality to have available, and that applying that functionality will come with potential trade-offs.

@Lunderberg Lunderberg merged commit c0385c7 into apache:main Apr 29, 2024
@Lunderberg Lunderberg deleted the runtime_ndarray_create_view_with_offset branch April 29, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants