Conversation

@MasterJH5574
Contributor

This PR introduces MLA attention kernels written in TIR and implements the KV cache MLA computation logic.

A new unit test file is added to verify the correctness of the TIR kernels.

This PR also fixes the tile size initialization in a few TIR prefill kernels.
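
For readers unfamiliar with MLA (multi-head latent attention): unlike standard multi-head attention, all heads attend over a shared compressed KV latent, so the cache stores one latent row (plus a RoPE key part) per token instead of per-head keys and values. The NumPy sketch below illustrates the decode-step computation such a kernel performs and that a unit test would compare a TIR kernel against; all names, shapes, and the softmax scale here are illustrative assumptions, not taken from this PR's kernels.

```python
import numpy as np

def mla_decode_reference(q, kv_cache, kv_lora_rank):
    """Single-token MLA decode attention for one sequence (illustrative).

    q        : (num_heads, kv_lora_rank + rope_dim) per-head query, already
               projected into the shared latent ("weight-absorbed") space.
    kv_cache : (seq_len, kv_lora_rank + rope_dim) one row per cached token:
               the compressed KV latent concatenated with the RoPE key part.
    returns  : (num_heads, kv_lora_rank) per-head output in latent space
               (the caller applies the value up-projection afterwards).
    """
    scale = 1.0 / np.sqrt(q.shape[1])             # illustrative softmax scale
    scores = (q @ kv_cache.T) * scale             # (num_heads, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # The value for every head is the latent part of the shared cache row.
    return probs @ kv_cache[:, :kv_lora_rank]     # (num_heads, kv_lora_rank)

# Smoke test with made-up shapes.
rng = np.random.default_rng(0)
num_heads, kv_lora_rank, rope_dim, seq_len = 16, 512, 64, 128
q = rng.standard_normal((num_heads, kv_lora_rank + rope_dim)).astype(np.float32)
cache = rng.standard_normal((seq_len, kv_lora_rank + rope_dim)).astype(np.float32)
out = mla_decode_reference(q, cache, kv_lora_rank)
assert out.shape == (num_heads, kv_lora_rank)
```

A TIR unit test would typically evaluate the compiled kernel on random inputs like these and check its output against such a reference with a numerical tolerance.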

@MasterJH5574
Contributor Author

@tvm-bot rerun

@MasterJH5574 force-pushed the tvm-dev/2025-02-01-tir-mla branch from 0ab745e to dfa4d07 on February 5, 2025 at 14:12
@jinhongyii merged commit 3eb5ad6 into apache:main on Feb 5, 2025
19 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request on Aug 10, 2025