@slokesha slokesha commented Oct 15, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results

Purpose

Test Plan

Test Result

@slokesha slokesha force-pushed the slokesha/rope_debug branch from e09e6b6 to bd3e8cf on October 22, 2025 20:10
@libinta libinta Oct 30, 2025

Try the HPU version:

def apply_rotary_pos_emb_hpu(
    q: torch.Tensor,
    k: torch.Tensor,
    cos: torch.Tensor,
    sin: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]:
    # Determine rotary dimension from cos/sin shape
    ro_dim = cos.shape[-1] * 2

    # Split into rotated and pass-through parts
    q_rot = q[..., :ro_dim]
    q_pass = q[..., ro_dim:]
    k_rot = k[..., :ro_dim]
    k_pass = k[..., ro_dim:]

    # Prepare cos/sin (remove the chunking)
    cos_full = cos
    sin_full = sin
    from habana_frameworks.torch.hpex.kernels import apply_rotary_pos_emb
    q_rot = apply_rotary_emb_torch(q_rot.float(), cos_full.float(), sin_full.float()).type_as(q)
    k_rot = apply_rotary_emb_torch(k_rot.float(), cos_full.float(), sin_full.float()).type_as(k)

    # Concatenate rotated and pass-through parts
    q_embed = torch.cat([q_rot, q_pass], dim=-1)
    k_embed = torch.cat([k_rot, k_pass], dim=-1)

    return q_embed, k_embed
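
For anyone checking the suggestion above against a reference, here is a minimal plain-Python sketch of partial rotary embedding on a single head vector. It assumes the non-interleaved (rotate-half) convention and half-width cos/sin tables, consistent with `ro_dim = cos.shape[-1] * 2`. The name `partial_rotary` is hypothetical; this is neither the HPU kernel nor the code in this PR.

```python
import math

def partial_rotary(x, cos, sin):
    """Rotate the first 2 * len(cos) entries of x and pass the rest through.

    Assumption: rotate-half layout, i.e. entry i is paired with entry
    i + half inside the rotated slice.
    """
    half = len(cos)
    ro_dim = 2 * half
    if len(sin) != half or len(x) < ro_dim:
        raise ValueError("cos/sin must have equal length and fit inside x")
    x1, x2 = x[:half], x[half:ro_dim]
    # Equivalent to x * cos + rotate_half(x) * sin on the rotated slice.
    rot_lo = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    rot_hi = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return rot_lo + rot_hi + x[ro_dim:]

angles = [0.0, math.pi / 2]           # illustrative angles only
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # last two entries are pass-through
out = partial_rotary(x, cos, sin)
```

The rotated slice keeps its norm and the trailing entries come back unchanged, which are two cheap invariants to assert when comparing a device kernel against a reference.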

Author

Updated the PR with the suggested code. I had to change apply_rotary_emb_torch to accept full cos and sin instead of chunked ones. Also, your suggestion imports apply_rotary_pos_emb from habana_frameworks.torch.hpex.kernels, but that import is never used.
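
To illustrate the full versus chunked distinction, here is a hedged plain-Python sketch. It assumes that "full" means the half-width table concatenated with itself (a common layout in rotary code) and that "chunked" means keeping only the first half; both readings are assumptions, and the function names are hypothetical rather than taken from the PR.

```python
import math

def rotate_half(x):
    """Return [-x2, x1] where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]

def rotary_full(x, cos_full, sin_full):
    """x * cos + rotate_half(x) * sin, with tables as wide as x."""
    rx = rotate_half(x)
    return [a * c + r * s for a, r, c, s in zip(x, rx, cos_full, sin_full)]

def rotary_chunked(x, cos_half, sin_half):
    """Same rotation written against half-width tables."""
    h = len(cos_half)
    x1, x2 = x[:h], x[h:2 * h]
    lo = [a * c - b * s for a, b, c, s in zip(x1, x2, cos_half, sin_half)]
    hi = [b * c + a * s for a, b, c, s in zip(x1, x2, cos_half, sin_half)]
    return lo + hi

angles = [0.3, 1.1]                               # illustrative angles only
cos_half = [math.cos(a) for a in angles]
sin_half = [math.sin(a) for a in angles]
cos_full = cos_half + cos_half                    # assumed "full" layout
sin_full = sin_half + sin_half

x = [1.0, 2.0, 3.0, 4.0]
from_full = rotary_full(x, cos_full, sin_full)
from_chunk = rotary_chunked(x, cos_full[:2], sin_full[:2])
```

Under these assumptions the two call styles agree to floating-point tolerance, so switching a function from one table layout to the other should not change its output.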

If we do use that import:
from habana_frameworks.torch.hpex.kernels import apply_rotary_pos_emb
q_rot = apply_rotary_pos_emb(q_rot.float(), cos_full.float(), sin_full.float()).type_as(q)
k_rot = apply_rotary_pos_emb(k_rot.float(), cos_full.float(), sin_full.float()).type_as(k)

We get this error:

  File "/usr/local/lib/python3.12/dist-packages/vllm-0.6.3.dev5516+geb01e1dc5.gaudi122-py3.12.egg/vllm/model_executor/models/siglip2navit.py", line 294, in forward
    max_seqlen = (cu_seqlens[1:] - cu_seqlens[:-1]).max().item()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Graph compile failed. Recipe: .graph_dumps/HabanaFusedOpLazy_21_21, synStatus=synStatus 26 [Generic failure].
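
The line flagged in the traceback takes the largest gap between consecutive cumulative sequence offsets. In lazy execution, a host read such as `.item()` is typically where queued ops get compiled and run, so the failing op may be an earlier one (possibly the rotary kernel) even though the error surfaces here; the log does not confirm this. A plain-Python equivalent of the computation, using made-up offsets and a hypothetical helper name:

```python
def max_seqlen(cu_seqlens):
    """Largest difference between consecutive cumulative offsets."""
    return max(b - a for a, b in zip(cu_seqlens, cu_seqlens[1:]))

cu_seqlens = [0, 16, 80, 96]   # hypothetical offsets for three sequences
longest = max_seqlen(cu_seqlens)
```

Here the three sequences have lengths 16, 64 and 16, so `longest` is 64.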

With the current PR:

| Metric | vs H100 | vs GT | All-core Query Time [s] |
|---|---|---|---|
| Ovis (no Media pipeline) | 98.08% | 94.23% | 1st run: 63.03; 2nd run: 47.67 |

Signed-off-by: slokesha <[email protected]>
Signed-off-by: slokesha <[email protected]>