[Runtime] Allow offset to be specified in NDArray::CreateView #16938

Lunderberg · 2024-04-26T18:14:28Z

Prior to this commit, the NDArray::CreateView method could produce an aliasing view of an existing array with a different shape or datatype, but the view was required to have the same DLTensor::byte_offset as the existing array. This commit updates the NDArray::CreateView method with an additional parameter, specifying the offset of the view relative to the existing array.

Prior to this commit, the `NDArray::CreateView` method could produce an aliasing view of an existing array with a different shape or datatype, but the view was required to have the same `DLTensor::byte_offset` as the existing array. This commit updates the `NDArray::CreateView` method with an additional parameter, specifying the offset of the view relative to the existing array.

tqchen · 2024-04-27T00:48:54Z

one note is that while such view was OK. the additional byte_offset would require special kernel to interact with them(by allowing elem_offset) in the arguments, and may not be the best peforming one because there is a tradeoff in terms of continugous/alignment and zero copy

Lunderberg · 2024-04-27T13:07:25Z

Agreed, there's a tradeoff, and replacing copies with views isn't something to be applied in every case.

Kernel Compatibility

Currently, the default tir::Buffer used by TE (used by topi, and used in most Relax legalizations) is defined with elem_offset of zero.
- Option 1: Change TE's default to use arbitrary elem_offset for the default tir::Buffer. (Change the default of elem_offset from zero to a fresh TIR variable.) Not a good option, as this would provide a weaker alignment check that existing kernels may assume.
- Option 2: Change TE's default to use a dynamic elem_offset that preserves alignment for the default tir::Buffer. (Change the default of elem_offset from zero to a fresh TIR variable, and change the default of offset_factor to kAllocAlignment // dtype.bytes().) This is better than option (1), because data + elem_offset*dtype.bytes() is still aligned. However, still not a good option, as TIR kernels that call external functions typically pass buf.data as an argument, which doesn't include the elem_offset.
- Option 3: Keep TE's default the same, and instead apply a post-processing step. For each tir::Buffer argument with elem_offset of zero, replace it with a new tir::Buffer with elem_offset allowed, and offset_factor to maintain alignment. Within the body of the function, add a definition of the old tir::Buffer argument either as a MatchBufferRegion or a attr::buffer_bind_scope.
  
  I'm liking Option 3 the most, as it adds flexibility at the calling scope. Unlike option 1, it still provides the same alignment guarantees for kernels. Unlike option 2, it wouldn't require re-writing a very large number of existing kernels.
Performance
- Zero-Copy vs Contiguous Arrays: The CreateView method is only allowed to make contiguous views of contiguous arrays. It is not able to make strided views, so this tradeoff isn't applicable.
- Zero-Copy vs Alignment: The CreateView method can make views whose data + byte_offset isn't aligned to
  kAllocAlignment. The restriction of aligned data + byte_offset would still be applied at the PrimFunc level, so this wouldn't cause a performance hit.
- Zero-Copy vs Memory Footprint: The NDArray returned by CreateView shares the same reference-counted allocation as the source array into which it views. If the view outlives the source, then the memory footprint of the view is larger than a copy would have been.
Not an issue for the CreateView method itself, but should be a consideration for optimization passes that replace copies with views. (e.g. replacing R.take, R.split, or R.strided_slice with a view) Should only be done in cases where the source array outlives the view.

Both to match the type used in `DLTensor::byte_offset`, and to resolve compilation errors on 32-bit platforms, which fail to compile due to a missing `Type2Str` specialization.

tqchen · 2024-04-27T19:23:06Z

We don;t need to worry about TE, since we will be able to support buffer binding side.

Just mention that allowing zero elem_offset is actually an important consideration for performance. Likely in most cases we would still like zero elem_offset by default. In frameworks, some key operations would also usualy requries a compact (that triggers an explicit copy), such that the memory access and alignment are right.

In some cases it is indeed useful to introduce views, in some of those cases we might be able to inline views directly into the previous operation, such that the buffer themselfs still comes with zero offset.

Lunderberg · 2024-04-27T20:09:53Z

We don;t need to worry about TE, since we will be able to support buffer binding side.

I brought up TE as a potential place to implement a change, but don't think it is the best option. While buffer-binding is supported in the tvm.build API, it is not supported in relax.BlockBuilder.call_te, nor in the te.create_prim_func used by BlockBuilder.call_te. Because it isn't exposed by these APIs, an implementation at this level would need to change the defaults all the way down to TE, which is a wider-reaching change than I'd like to make.

In some cases it is indeed useful to introduce views, in some of those cases we might be able to inline views directly into the previous operation, such that the buffer themselfs still comes with zero offset.

Agreed, and I like this as the long-term direction.

Lunderberg · 2024-04-27T20:21:13Z

It sounds that we're in agreement that this is useful functionality to have available, and that applying that functionality will come with potential trade-offs.

Lunderberg requested a review from masahi April 26, 2024 18:14

masahi approved these changes Apr 26, 2024

View reviewed changes

Change type of relative_byte_offset from size_t to uint64_t

6ead805

Both to match the type used in `DLTensor::byte_offset`, and to resolve compilation errors on 32-bit platforms, which fail to compile due to a missing `Type2Str` specialization.

Lunderberg merged commit c0385c7 into apache:main Apr 29, 2024

Lunderberg deleted the runtime_ndarray_create_view_with_offset branch April 29, 2024 14:57

CharlieFRuan mentioned this pull request May 26, 2024

[Web] Add dtype and offset for CreateView in runtime #17028

Merged

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed

kurisu6912 mentioned this pull request Sep 5, 2025

kurisu add assume attr patch 1 tile-ai/tvm#8

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Runtime] Allow offset to be specified in NDArray::CreateView #16938

[Runtime] Allow offset to be specified in NDArray::CreateView #16938

Uh oh!

Lunderberg commented Apr 26, 2024

Uh oh!

tqchen commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

tqchen commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Runtime] Allow offset to be specified in NDArray::CreateView #16938

[Runtime] Allow offset to be specified in NDArray::CreateView #16938

Uh oh!

Conversation

Lunderberg commented Apr 26, 2024

Uh oh!

tqchen commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

tqchen commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

Lunderberg commented Apr 27, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants