Conversation


@Kotomi-Du Kotomi-Du commented Oct 10, 2025

GQA has been supported by OV since 2025.1. This PR aligns with OV support.

"beam_idx",
"past_key_values",
"present",
"total_seq_len",

@Kotomi-Du After translation into OV IR, does the stateful model always include the total_seq_len input? Is this now the general case for all LLMs (and since which OV toolkit version was it added)?

{"Atanh", V_2020_4, {"CPU"}},
{"Atanh", V_2022_1, {"GPU"}},
{"Attention", V_2023_0, {"CPU", "GPU"}},
{"GroupQueryAttention", V_2025_1, {"CPU", "GPU"}},


Please add to the PR description the JIRA ticket that enables the GQA op for the CPU and GPU plugins in the ONNX OV frontend. Please also make sure in your validation process that this change doesn't conflict with GQA support for NPU (FYI, we are not currently targeting GQA support for NPU).

@ankitm3k ankitm3k left a comment


LGTM
