[Performance]: Diagnose speed difference using spec decode branch #28915

@tomasruizt

Description

Report of performance regression

@litone01 reported a performance regression on the spec decode branch (#24322): models run slower there than on main, even when spec decode is not used.

His branch: https://github.com/litone01/vllm/tree/origin/feature/spec-decode-draft-model-debug

This issue tracks progress on diagnosing the regression.

Metadata

Assignees

No one assigned

    Labels

    performance: Performance-related issues
