
Conversation

@yongwww (Member) commented Feb 28, 2024

The topi.cuda.inclusive_scan currently performs an exclusive_scan and then adds the input data back in. To eliminate the overhead of this extra addition, we should probably have an implementation designed specifically for inclusive_scan. As of now, the inclusive_scan implementation is limited to exclusive_scan_ir; developing an inclusive_scan_ir will be needed for performance in the future.

In this pull request, we address this inefficiency for the thrust path by calling thrust's inclusive scan directly.
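
For context, a minimal NumPy illustration (not TVM code) of the identity the current CUDA path relies on: an inclusive scan equals an exclusive scan with the input added back element-wise, and that extra add is exactly what this change avoids on the thrust path.

    import numpy as np

    x = np.array([3, 1, 4, 1, 5], dtype=np.float32)

    inclusive = np.cumsum(x)                                 # [3, 4, 8, 9, 14]
    exclusive = np.concatenate(([0.0], np.cumsum(x)[:-1]))   # [0, 3, 4, 8, 9]

    # inclusive scan == exclusive scan + input, the extra add described above
    assert np.array_equal(inclusive, exclusive + x)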

Co-authored-by: Wuwei Lin [email protected]

@tqchen (Member) commented Feb 28, 2024

@tvm-bot rerun

1 similar comment
@yongwww (Member, Author) commented Feb 28, 2024

@tvm-bot rerun

Diff excerpt under review:

    data, output_dtype, exclusive=True, return_reduction=return_reduction, binop=binop
    )
    if _can_use_scan_thrust(binop):
        return scan_thrust(data, output_dtype, return_reduction=return_reduction, binop=binop)
Review comment from a Member:

let's pass exclusive=True explicitly here
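
A sketch of what the suggested change would look like, assuming scan_thrust keeps an exclusive keyword defaulting to True (the keyword is already written out for the IR path in the excerpt above); this is an illustration, not the merged diff:

    if _can_use_scan_thrust(binop):
        # pass exclusive explicitly even though it matches the default, for readability
        return scan_thrust(
            data, output_dtype, exclusive=True, return_reduction=return_reduction, binop=binop
        )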

@yongwww (Member, Author) commented Feb 28, 2024

With this example:

    # imports needed to run this snippet standalone
    import tvm
    from tvm import relax
    from tvm.script import ir as I
    from tvm.script import relax as R

    @I.ir_module
    class Mod:
        @R.function
        def foo(x: R.Tensor((20, 32000), "float32")):
            with R.dataflow():
                gv = R.cumsum(x, axis=1)
                R.output(gv)
            return gv

    target = tvm.target.Target("cuda -libs=thrust", host="llvm")
    dev = tvm.cuda(0)
    ex = relax.build(Mod, target)
    vm = relax.VirtualMachine(ex, device=dev)
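
Continuing the example, a quick sanity check of the compiled result against NumPy (a hedged sketch; the input shape follows the R.Tensor signature above):

    import numpy as np

    x_np = np.random.rand(20, 32000).astype("float32")
    out = vm["foo"](tvm.nd.array(x_np, dev))
    # float32 accumulation over 32000 elements, so use a loose tolerance
    np.testing.assert_allclose(out.numpy(), np.cumsum(x_np, axis=1), rtol=1e-3)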

The 'cuda_gpu_kern_sum' stats report:

w/ this change

    Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)  Name
    92.8      18,099,352       1,101      16,439.0   16,384.0   14,944    18,336    433.8        void cub::CUB_200200_750_NS::DeviceScanByKeyKernel<cub::CUB_200200_750_NS::DeviceScanByKeyPolicy<th…
     7.2       1,409,748       1,101       1,280.4    1,280.0    1,151     1,568    120.1        void cub::CUB_200200_750_NS::DeviceScanByKeyInitKernel<cub::CUB_200200_750_NS::ReduceByKeyScanTileS…

w/o this change

    Time (%)  Total Time (ns)  Instances  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)  Name
    51.6      22,603,888       1,101      20,530.3   20,545.0   19,744    21,120    293.0        cumsum_kernel
    44.3      19,442,725       1,101      17,659.2   17,665.0   16,641    18,752    294.6        void cub::CUB_200200_750_NS::DeviceScanByKeyKernel<cub::CUB_200200_750_NS::DeviceScanByKeyPolicy<th…
     4.1       1,800,850       1,101       1,635.6    1,632.0    1,440     1,792     35.9        void cub::CUB_200200_750_NS::DeviceScanByKeyInitKernel<cub::CUB_200200_750_NS::ReduceByKeyScanTileS…

The execution perf numbers I got on an NVIDIA GeForce RTX 3070:

R.cumsum(x, axis=1)

| Description | Performance |
| --- | --- |
| With this change (w/ thrust) | 0.018 ms |
| Without this change (w/ thrust) | 0.039 ms |
| Without thrust | 0.460 ms |

R.cumsum(x, axis=0)

| Description | Performance |
| --- | --- |
| With this change (w/ thrust) | 0.073 ms |
| Without this change (w/ thrust) | 0.074 ms |
| Without thrust | 44.018 ms |
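
A rough sketch of how timings like these can be collected, assuming relax.VirtualMachine exposes a time_evaluator method (true in recent TVM builds, but treat that as an assumption here); it reuses vm, dev, and the function name "foo" from the example above:

    import numpy as np

    # random float32 input matching the R.Tensor((20, 32000), "float32") signature
    x_nd = tvm.nd.array(np.random.rand(20, 32000).astype("float32"), dev)

    # time_evaluator returns a callable that runs the function repeatedly
    # and reports averaged results
    ftimer = vm.time_evaluator("foo", dev, number=10, repeat=3)
    print(f"R.cumsum mean time: {ftimer(x_nd).mean * 1e3:.3f} ms")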

@MasterJH5574 (Contributor) left a comment

Thank you @yongwww!!

@yongwww (Member, Author) commented Feb 29, 2024

The failed test tests/python/relax/test_op_unpack.py::test_tensor_dtype_lanes[bfloat] has been failing in the last several merged commits; it should be fixed by #16649 and is unrelated to this PR. I can rebase for a clean run if necessary. The CPU job failed on a flaky test; it was green in https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-cpu/detail/PR-16652/3/pipeline/

@yongwww (Member, Author) commented Feb 29, 2024

@tvm-bot rerun

@yongwww (Member, Author) commented Mar 1, 2024

@tvm-bot rerun

1 similar comment
@MasterJH5574 (Contributor) commented

@tvm-bot rerun

tqchen merged commit 3b25588 into apache:main on Mar 3, 2024
yongwww deleted the topi_cumsum_update branch on March 4, 2024
Lunderberg pushed a commit to Lunderberg/tvm that referenced this pull request Mar 12, 2024
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024