[https://nvbugspro.nvidia.com/bug/5415268] fix illegal smem access with chunked attention #6401
Conversation
Walkthrough: The masked multi-head attention CUDA kernel was updated to introduce a new parameter.
Force-pushed from 2644f13 to 97b26ae (compare)
/bot run
PR_Github #13184 [ run ] triggered by Bot
Force-pushed from 97b26ae to 4aab6c3 (compare)
/bot run
PR_Github #13187 [ run ] triggered by Bot
PR_Github #13184 [ run ] completed with state
PR_Github #13187 [ run ] completed with state
Signed-off-by: Perkz Zheng <[email protected]>
Force-pushed from 4aab6c3 to 29c8870 (compare)
/bot run --disable-fail-fast
PR_Github #13264 [ run ] triggered by Bot
PR_Github #13264 [ run ] completed with state
x86 tests passed, re-triggered SBSA tests here: https://nv/trt-llm-cicd/job/release-0.21/job/L0_Test-SBSA/263
/bot skip --comment "The previous failed SBSA test stage passed"
PR_Github #13463 [ skip ] triggered by Bot
PR_Github #13463 [ skip ] completed with state
…th chunked attention (NVIDIA#6401) Signed-off-by: Perkz Zheng <[email protected]> Co-authored-by: Sharan Chetlur <[email protected]>
This MR fixes an illegal shared memory access that occurs when chunked attention is used in MMHA (masked multi-head attention): the shared memory offset was not calculated correctly.
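The actual diff is not included in this conversation, so the following is only a minimal host-side sketch of the kind of offset bug described above: with chunked attention, a per-block shared-memory region sized from the full KV sequence length (rather than the chunk's attention-window length) yields an offset past the allocated shared memory. All names (`logits_smem_offset`, `chunk_len`, the 16-byte alignment) are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: compute the byte offset of the region that follows the
// attention-logits buffer in shared memory. With chunked attention enabled,
// the logits buffer only needs to cover the key positions inside the current
// chunk, not the full KV sequence; using kv_seq_len unconditionally would
// overshoot the allocation and cause an illegal shared-memory access.
size_t logits_smem_offset(size_t base_bytes, int kv_seq_len, int chunk_len,
                          bool chunked) {
    // Number of key positions this thread block actually attends to.
    int attended = chunked ? (kv_seq_len < chunk_len ? kv_seq_len : chunk_len)
                           : kv_seq_len;
    // One float logit per attended position, rounded up to 16 bytes so the
    // next region stays aligned for vectorized loads (illustrative choice).
    size_t logits_bytes =
        (static_cast<size_t>(attended) * sizeof(float) + 15) / 16 * 16;
    return base_bytes + logits_bytes;
}
```

For example, with a 1024-token KV cache and a 256-token chunk, the chunked offset covers only 256 logits (1024 bytes) instead of 4096 bytes, keeping subsequent regions inside the dynamic shared-memory allocation.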