[Feature]: Tree attention for Speculative Decoding #3960

### 🚀 The feature, motivation and pitch

I want to implement tree attention for vLLM, as mentioned in the [roadmap](https://github.com/vllm-project/vllm/issues/3861). However, I'm not sure whether I should build it on the paged-attention kernel currently implemented in vLLM or on FlashInfer, because I found that we plan to replace this kernel in this [PR](https://github.com/vllm-project/vllm/pull/2772).
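For context, tree attention lets several candidate continuations (a token tree proposed by the draft model) be verified in a single forward pass: each draft token attends only to itself and its ancestors in the tree. Below is a minimal sketch of the mask this implies, assuming the tree is given as a list of parent indices. The names `build_tree_mask` and `parents` are hypothetical and are not part of vLLM or FlashInfer.

```python
from typing import List


def build_tree_mask(parents: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when draft token i may attend to draft token j.

    parents[i] is the index of token i's parent in the draft tree, or -1 for a
    root (a token that directly follows the already verified prefix).
    Assumption: parents[i] < i, i.e. tokens are listed in topological order.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        # Walk from token i up to its root, marking itself and every ancestor.
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical draft tree: token 0 has children 1 and 2; token 1 has child 3.
parents = [-1, 0, 0, 1]
mask = build_tree_mask(parents)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

This sketch covers only the draft tokens. In a real kernel every draft token would also attend to the whole verified prefix in the KV cache, so the open question above is which kernel (paged attention or FlashInfer) can apply such a mask on top of the paged KV cache.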

### Alternatives

_No response_

### Additional context

_No response_
