4 changes: 2 additions & 2 deletions server/Makefile-flash-att-v2
@@ -1,4 +1,4 @@
-flash_att_v2_commit := 4f285b354796fb17df8636485b9a04df3ebbb7dc
+flash_att_v2_commit := 92dd5703ecdb99aa4a4aee9817f28557907403a2 # v2.3.6

flash-attention-v2:
# Clone flash attention
@@ -10,4 +10,4 @@ build-flash-attention-v2: flash-attention-v2
cd flash-attention-v2 && python setup.py build

install-flash-attention-v2: build-flash-attention-v2
-	cd flash-attention-v2 && python setup.py install
+	cd flash-attention-v2 && python setup.py install
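Since the pin now tracks a tagged upstream release, a quick post-install sanity check can confirm the rebuilt wheel matches — a minimal sketch, assuming the pinned commit corresponds to the flash-attn 2.3.6 release and that the package exposes `__version__` as the upstream package does:

```python
# Post-install sanity check (assumption: the pinned commit is the v2.3.6 tag
# and flash_attn exposes __version__, as it does upstream).
import flash_attn

assert flash_attn.__version__ == "2.3.6", flash_attn.__version__
```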
3 changes: 3 additions & 0 deletions server/text_generation_server/utils/flash_attn.py
@@ -66,12 +66,15 @@ def attention(
         out,
         cu_seqlens,
         cu_seqlens,
+        None,  # seqused_k, added in flash-attention commit ce3e728
         max_s,
         max_s,
         0.0,
         softmax_scale,
         False,
         True,
+        -1,  # window_size[0], added in flash-attention commit 083e8f52; -1 means infinite window size
+        -1,  # window_size[1], added in flash-attention commit 083e8f52; -1 means infinite window size
         False,
         None,
     )
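For context, here is a sketch of the full call as it stands after this patch, assuming the positional `varlen_fwd` signature of flash-attention v2.3.6; the per-argument names in the comments follow the upstream Python wrapper and are annotations only, and `attention_sketch` is a hypothetical wrapper mirroring the hunk's surrounding `attention()` function:

```python
import flash_attn_2_cuda  # compiled extension built from the pinned commit


def attention_sketch(q, k, v, out, cu_seqlens, max_s, softmax_scale):
    # Hedged sketch of the post-patch call; argument names in the comments
    # follow the flash-attention v2.3.6 Python wrapper.
    return flash_attn_2_cuda.varlen_fwd(
        q,
        k,
        v,
        out,
        cu_seqlens,     # cu_seqlens_q
        cu_seqlens,     # cu_seqlens_k
        None,           # seqused_k (new in ce3e728)
        max_s,          # max_seqlen_q
        max_s,          # max_seqlen_k
        0.0,            # dropout_p
        softmax_scale,
        False,          # zero_tensors
        True,           # causal
        -1,             # window_size_left (new in 083e8f52; -1 = infinite)
        -1,             # window_size_right (new in 083e8f52; -1 = infinite)
        False,          # return_softmax
        None,           # generator
    )
```

Passing `-1` for both window sizes keeps the previous behavior: full (non-sliding-window) causal attention.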