Conversation

@WoosukKwon (Collaborator)

Should be merged after #53

This PR adds support for the bfloat16 data type, which is used by some LLMs, including Dolly V2.
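For context, bfloat16 is a 16-bit format that keeps float32's 8-bit exponent but truncates the mantissa to 7 bits, so it covers the same dynamic range at reduced precision. As an illustrative sketch (not code from this PR), the conversion can be demonstrated in pure Python by truncating the low 16 bits of the IEEE-754 float32 representation:

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    # Pack as IEEE-754 float32, then keep only the top 16 bits
    # (sign, 8-bit exponent, 7 mantissa bits): the bfloat16 encoding.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_to_float(bits: int) -> float:
    # Re-expand to float32 by zero-filling the truncated mantissa bits.
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

# The 8-bit exponent survives, but only ~2-3 decimal digits of
# precision remain: pi round-trips to 3.140625.
roundtrip = bfloat16_to_float(float_to_bfloat16_bits(3.141592653589793))
```

This truncation (rather than round-to-nearest) is the simplest variant; actual hardware and framework implementations typically round.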

@WoosukKwon merged commit e070829 into main May 3, 2023
@WoosukKwon deleted the support-bfloat16 branch May 3, 2023 21:09
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY
* `yapf` format a couple of test files

TEST PLAN:
Ran `yapf` in-place locally to update the files.
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
* Adds the wvSpltK optimization for skinny GEMM.


---------

Co-authored-by: Hashem Hashemi <[email protected]>
JHLEE17 pushed a commit to JHLEE17/vllm that referenced this pull request Aug 1, 2024
@alixiaodi mentioned this pull request Aug 2, 2024
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
* Pad flashmla_sparse to 128 on blackwell

* adjust get_max_prefill_buffer_size

* change comments
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Oct 10, 2025
… API (vllm-project#54)

* use rpc to bypass openAI API

Signed-off-by: wuhang <[email protected]>

* example run

---------