
@yzhliu (Member) commented Sep 20, 2019

Add a uint8 intrinsic for ARM. It currently uses udot.v2i32.v8i8, which may have too few lanes; I will add more variants later.
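For readers new to the instruction, here is a minimal scalar model of what one udot step computes. It assumes the LLVM intrinsic naming, where `v2i32` is the accumulator (two int32 lanes) and `v8i8` describes each uint8 operand (eight bytes). The `udot` function is a hypothetical illustration, not TVM's tensor intrinsic.

```python
def udot(acc, a, b):
    """Model one unsigned dot-product step: acc[i] += sum(a[4i+k] * b[4i+k] for k in 0..3)."""
    lanes = len(acc)
    if len(a) != 4 * lanes or len(b) != 4 * lanes:
        raise ValueError("each int32 lane needs exactly 4 uint8 pairs")
    if any(not 0 <= x <= 255 for x in a + b):
        raise ValueError("inputs must be uint8 values")
    out = []
    for i in range(lanes):
        # Four uint8 products from the same 32-bit group feed one int32 lane.
        partial = sum(a[4 * i + k] * b[4 * i + k] for k in range(4))
        # Accumulator lanes are 32 bits wide, so the sum wraps modulo 2**32.
        out.append((acc[i] + partial) & 0xFFFFFFFF)
    return out

# The udot.v2i32.v8i8 shape: 2 int32 accumulators, 8 uint8 values per operand.
print(udot([0, 0], [1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))  # [10, 26]
```

Passing four accumulators and 16-element operands models the wider udot.v4i32.v16i8 form with the same per-lane rule.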

@anijain2305 @zhiics @vinx13 @ZihengJiang

@yzhliu changed the title from "Armint8" to "[topi] add ARM v8.2 udot (uint8) support" Sep 20, 2019

@zhiics (Member) left a comment

Left a few minor comments; otherwise LGTM.

@anijain2305 (Contributor)

Before merging, it would be good to try two more optimizations:

  • Currently, udot seems a little slow (~1x speedup). The reason may be that we are not fully utilizing the fused accumulation; we should inspect the generated assembly to confirm.
  • Please try udot.v4i32.v16i8, which should quadruple the throughput compared to FP32.
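The "quadruple" estimate can be checked by counting multiply-accumulates per instruction. This is a back-of-the-envelope model, not a measurement: it assumes a 128-bit FP32 `fmla` as the baseline and equal issue rates for both instructions, and it ignores memory bandwidth and scheduling effects. `macs_per_instruction` is a hypothetical helper.

```python
def macs_per_instruction(vector_bits, element_bits):
    """Multiply-accumulates one vector instruction performs: one per input element."""
    return vector_bits // element_bits

fp32_fmla_128 = macs_per_instruction(128, 32)  # 4 fp32 lanes
udot_v8i8 = macs_per_instruction(64, 8)        # 8 uint8 products into 2 int32 lanes
udot_v16i8 = macs_per_instruction(128, 8)      # 16 uint8 products into 4 int32 lanes

print(udot_v8i8 / fp32_fmla_128, udot_v16i8 / fp32_fmla_128)  # 2.0 4.0
```

Under these assumptions the 64-bit form peaks at 2x the FP32 rate and the 128-bit form at 4x.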

@yzhliu (Member, Author) commented Sep 28, 2019

@anijain2305 @zhiics please review again.

@anijain2305 (Contributor) left a comment

LGTM. Minor comments.

@anijain2305 (Contributor)

It would be good to share the performance speedup results.

@yzhliu (Member, Author) commented Sep 30, 2019

@tqchen, could you check the CI instance? It reports "no space left".

@yzhliu (Member, Author) commented Sep 30, 2019

@anijain2305 The average speedup is ~2.1x compared to FP32.

@tqchen (Member) commented Oct 1, 2019

CI issue fixed.

@yzhliu (Member, Author) commented Oct 1, 2019

Thanks @anijain2305 @zhiics @tqchen

@yzhliu yzhliu merged commit 5cc1764 into apache:master Oct 1, 2019
anijain2305 pushed a commit to anijain2305/tvm that referenced this pull request Oct 17, 2019
* [topi] add ARM v8.2 udot (uint8) support

* fix test case

* fix common conv2d schedule

* add back fp32_time in test

* fix lint

* fix doc, add support for int32_lanes=4, signed int

* fix lint

* add ic_bn % 4 checker in schedule
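The final commit adds an `ic_bn % 4` check to the schedule. One plausible reading: each int32 lane of udot consumes four consecutive uint8 values along the reduction axis, so the input-channel block `ic_bn` must split evenly into groups of four. The sketch below models only that divisibility constraint; `NUM_INT8_PER_INT32`, `split_ic_block`, and `reduce_ic_block` are hypothetical names, not TVM's code.

```python
NUM_INT8_PER_INT32 = 4  # udot folds 4 uint8 products into each int32 lane

def split_ic_block(ic_bn):
    """Split the input-channel block into (outer, inner), where inner is one udot group."""
    if ic_bn % NUM_INT8_PER_INT32 != 0:
        raise ValueError(f"ic_bn={ic_bn} must be a multiple of {NUM_INT8_PER_INT32}")
    return ic_bn // NUM_INT8_PER_INT32, NUM_INT8_PER_INT32

def reduce_ic_block(data, kernel):
    """Dot product over one ic_bn block, consumed 4 channels at a time as udot would."""
    outer, inner = split_ic_block(len(data))
    acc = 0
    for o in range(outer):
        acc += sum(data[o * inner + k] * kernel[o * inner + k] for k in range(inner))
    return acc

print(split_ic_block(16))                        # (4, 4)
print(reduce_ic_block(list(range(8)), [1] * 8))  # 28
```

A block such as `ic_bn=6` is rejected up front rather than leaving a partial group that udot cannot consume.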
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 18, 2019
petrex pushed a commit to petrex/tvm that referenced this pull request Oct 29, 2019
* master:
  Fix split's last factor issue (apache#4044)
  [COMMUNITY] ajtulloch -> committer (apache#4043)
  [TOPI]Add op argwhere (apache#3994)
  [topi] add ARM v8.2 udot (uint8) support (apache#3978)
  [COMMUNITY] anijain2305 -> reviewer (apache#4036)
  [QNN] Renaming dense operator. (apache#4033)
  [Relay][Compile_engine] Int64 shape handling for outputs. (apache#4031)
  Add dmlc-core to the list of installed header directories. (apache#4035)
  [ARITH] migrate indexdiv/mod to floordiv/mod (apache#4008)
  [Relay] Move prelude to text format (apache#3939)
  make tvm compilable by gcc 4.9.2 (apache#4032)
  [AUTOTVM][DOCS] Add a link to the defining network description of auto-tuning tutorial (apache#4023)
  [ARITH] cleanup the indexmod/div on python side (apache#4028)
  [Fix] Add more pad_mode support for onnx converter (apache#4029)
  Add parser support for ReLU tflite operator (apache#4022)
  Additional MXNet Convolution and Deconvolution tests (apache#4026)
  docs: minor spelling tweaks (apache#4027)