I want to implement tree attention for vLLM, as mentioned in the roadmap. However, I'm not sure whether I should build it on top of the paged-attention kernel implemented in vLLM or on FlashInfer, since I found that we plan to replace this kernel in this PR.
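For context, a minimal sketch of the tree-attention idea (not vLLM or FlashInfer API; the helper name and parent-index encoding are my own illustration): each candidate token in a speculation tree may attend only to its ancestors and itself, so multiple candidate branches can be scored in one forward pass.

```python
# Hypothetical sketch: build a boolean tree-attention mask from parent indices.
# parents[i] is the index of node i's parent in the token tree, or -1 for a root.
import torch

def build_tree_attention_mask(parents: list[int]) -> torch.Tensor:
    n = len(parents)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:          # walk up to the root, marking every ancestor as visible
            mask[i, j] = True
            j = parents[j]
    return mask

# Example tree: root 0 with children 1 and 2; node 3 is a child of 1.
print(build_tree_attention_mask([-1, 0, 0, 1]))
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True, False,  True, False],
#         [ True,  True, False,  True]])
```

The open question is whether this mask should be applied inside the existing paged-attention kernel or expressed through FlashInfer's attention interface.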
Alternatives
No response
Additional context
No response