Skip to content

Conversation

@wyc-ruiker
Copy link
Contributor

Added support for int4x32 int4x16 int4x4 in BroadcastNode.

In the int4x4 testcase, the IR is:

primfn(compute_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {compute: Buffer(compute_2: Pointer(int4), int4, [64, 4], [])}
  buffer_map = {compute_1: compute} {
  attr [IterVar(blockIdx.x: int32, (nullptr), "ThreadIndex", "blockIdx.x")] "thread_extent" = 64;
  compute_2[ramp((blockIdx.x*4), 1, 4)] = broadcast(1i4, 4)
}

Before the fix in codegen_c.cc, the codegen cuda is:

extern "C" __global__ void make_int4x4_kernel0(int* __restrict__ compute) {
  ((int16_t*)(compute + ((((int)blockIdx.x) * 4)) / 8))[0] = (int16_t)4369;
}

For int16_t, this index (((int)blockIdx.x) * 4)) / 8 is a bug.
After the fix in codegen_c.cc, the codegen cuda is:

extern "C" __global__ void make_int4x4_kernel0(int* __restrict__ compute) {
  ((int16_t*)(compute) + ((((int)blockIdx.x) * 4)) / 4)[0] = (int16_t)4369;
}

Could you please help review this fix? @vinx13 @Hzfengsy

@tqchen
Copy link
Member

tqchen commented May 26, 2021

@vinx13 please help to manage this PR

@vinx13 vinx13 merged commit f4dce24 into apache:main May 26, 2021
@wyc-ruiker wyc-ruiker deleted the fix-int4 branch May 27, 2021 02:27
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants