@@ -57,6 +57,7 @@ struct OnnxToOvNetworkBindings {
"beam_idx",
"past_key_values",
"present",
+ "total_seq_len",

Reviewer: @Kotomi-Du Does the stateful model, after translation into OVIR, always include a total_seq_len input? Is this now the general case for all LLMs (and since which OV toolkit version was it added)?

Author: It is the input name from the Microsoft generic model (specifically the Phi Silica model), not from the EPCtx OVIR model generated by the OV toolkit.

};

OnnxToOvNetworkBindings(OVExeNetwork& exec_network, SubGraphContext& subgraph_context, SessionContext& session_context) {
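
For context on how such a name list is typically consumed, here is a minimal sketch, assuming these entries are used to skip state-related tensors (KV cache, beam index, total sequence length) when wiring ONNX graph I/O to the compiled stateful OV network. The helper names below (IsStatefulIoName, FilterBindableInputs) are hypothetical illustrations, not the EP's actual API.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: patterns matching tensor names that a stateful OV model
// manages internally, so they should not get explicit ONNX->OV bindings.
constexpr std::array<std::string_view, 4> kStatefulPatterns = {
    "beam_idx", "past_key_values", "present", "total_seq_len"};

// Returns true if the tensor name matches one of the stateful I/O patterns.
bool IsStatefulIoName(std::string_view tensor_name) {
  return std::any_of(kStatefulPatterns.begin(), kStatefulPatterns.end(),
                     [&](std::string_view pattern) {
                       return tensor_name.find(pattern) != std::string_view::npos;
                     });
}

// Keeps only the inputs that still need explicit bindings; e.g. a
// "total_seq_len" input would now be filtered out like the KV-cache ones.
std::vector<std::string> FilterBindableInputs(const std::vector<std::string>& onnx_inputs) {
  std::vector<std::string> bindable;
  for (const auto& name : onnx_inputs) {
    if (!IsStatefulIoName(name)) {
      bindable.push_back(name);
    }
  }
  return bindable;
}
```
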
@@ -96,6 +96,7 @@ std::vector<SupportedOp> supported_op_mode = {
{"Atanh", V_2020_4, {"CPU"}},
{"Atanh", V_2022_1, {"GPU"}},
{"Attention", V_2023_0, {"CPU", "GPU"}},
+ {"GroupQueryAttention", V_2025_1, {"CPU", "GPU"}},

Reviewer: Please add to the PR description the JIRA that enables the GQA op for the CPU & GPU plugins in the ONNX OV frontend. Also, please make sure this change doesn't conflict with GQA support for NPU in your validation process (FYI, we are not currently targeting GQA support for NPU).

Author: Will do.

{"AveragePool", V_2020_4, {"CPU", "GPU"}},
{"BatchNormalization", V_2020_4, {"CPU", "GPU"}},
{"BiasGelu", V_2023_0, {"CPU", "GPU"}},
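
As a rough illustration of how a capability table like supported_op_mode might be queried, here is a minimal sketch assuming each entry pairs an op type with the first OV release it is enabled in and the device plugins it is validated on. The Version enum values shown, the struct layout, and the IsOpSupported helper are assumptions for illustration, not the EP's actual definitions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed subset of release identifiers, ordered so newer releases compare greater.
enum Version { V_2020_4, V_2022_1, V_2023_0, V_2025_1 };

// Assumed shape of a capability entry: op type, first supporting release, devices.
struct SupportedOp {
  std::string optype;
  Version min_version;
  std::vector<std::string> devices;
};

// Checks whether an op is enabled for a given device at the current OV release.
bool IsOpSupported(const std::vector<SupportedOp>& table,
                   const std::string& optype,
                   Version current_version,
                   const std::string& device) {
  return std::any_of(table.begin(), table.end(), [&](const SupportedOp& entry) {
    return entry.optype == optype &&
           current_version >= entry.min_version &&
           std::find(entry.devices.begin(), entry.devices.end(), device) !=
               entry.devices.end();
  });
}

// With the new table entry, a query like
//   IsOpSupported(table, "GroupQueryAttention", V_2025_1, "GPU")
// would return true, while the same query for "NPU" would return false,
// matching the reviewer's note that NPU is not targeted for GQA.
```
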