Add static libraries for batch manager #2

kaiyux · 2023-09-21T03:32:02Z

No description provided.

juney-nvidia · 2023-09-21T03:52:12Z

LGTM, thanks for the quick fix.

# This is the 1st commit message: add download models form www.modelscope.cn
# This is the commit message NVIDIA#2: debug
# This is the commit message NVIDIA#3: debug

* Fix model name mapping (#2)

* Add README
* Add unified converter (#1)
* init v3 lite feat
* fix moe topk method
* fix noaux_tc logic
* fix deepseek v3 normal rope
* refactor
* wo conversion ok debugging build
* add quantize for attn.dense
* add unified converter support
* testing unified converter
* add convert checkpoint and update docs

---------

Co-authored-by: Zeyu Wang <[email protected]>

* update README
* add FP8 notes
* Update run.py result
* Update V3 README
* Update usages of FP8 to BF16 instruction
* fix model name mapping (#2)
* Update HF ckpt BF16 conversion.
* fix config of deepseek kv cache
* Remove source code
* Deepseek V3 FP8 Support

---------

Co-authored-by: jershi425 <[email protected]>
Co-authored-by: Zeyu Wang <[email protected]>
Co-authored-by: Hanyue He <[email protected]>
Co-authored-by: root <[email protected]>

Signed-off-by: Dongxu Yang <[email protected]>

* add MNNVL memory mapping support Signed-off-by: Dongxu Yang <[email protected]>
* add more MPI environment for trtllm-llmapi-launch Signed-off-by: Dongxu Yang <[email protected]>
* add MoE communication and prepare kernels Signed-off-by: Dongxu Yang <[email protected]>
* add MNNVL AlltoAll support for DeepSeekV3 Signed-off-by: Dongxu Yang <[email protected]>
* add output dump for throughput benchmark Signed-off-by: Dongxu Yang <[email protected]>
* support dynamic kernel launch grid Signed-off-by: Dongxu Yang <[email protected]>
* address review comments Signed-off-by: Dongxu Yang <[email protected]>
* address review comments #2 Signed-off-by: Dongxu Yang <[email protected]>

---------

Signed-off-by: Dongxu Yang <[email protected]>

# This is the 1st commit message:
kernel Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
remove prints Signed-off-by: Ubuntu <[email protected]>
test pass Signed-off-by: Ubuntu <[email protected]>
test refactor with more use cases Signed-off-by: Ubuntu <[email protected]>
refacor Signed-off-by: Ubuntu <[email protected]>
refacor_2 Signed-off-by: Ubuntu <[email protected]>
add tuner wip Signed-off-by: Ubuntu <[email protected]>
autotuner works Signed-off-by: Ubuntu <[email protected]>
bfloat16 works. moer changes to the thop file Signed-off-by: Ubuntu <[email protected]>
is tune for autotuner is True --> gets real tactics configs Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
zeros + quant mode is works Signed-off-by: Ubuntu <[email protected]>
act int8 Signed-off-by: Ubuntu <[email protected]>
removed fp8 for now Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
w4a16 linear module Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
changed cutalss for sm==89 Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
test linear work Signed-off-by: Ubuntu <[email protected]>
add license Signed-off-by: Ubuntu <[email protected]>
works! Signed-off-by: Ubuntu <[email protected]>
refactor + linear test pass Signed-off-by: Ubuntu <[email protected]>
preprocess in load weights Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
refactor + rebase Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
wip Signed-off-by: Ubuntu <[email protected]>
Blackwell not supported Signed-off-by: Daniel Afrimi <[email protected]>
wip Signed-off-by: Daniel Afrimi <[email protected]>
skip blackwell Signed-off-by: Daniel Afrimi <[email protected]>
wip Signed-off-by: Daniel Afrimi <[email protected]>
works Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#2:
rebased Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#3:
align with my pld worked version of linear Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#4:
wip Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#5:
refactor Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#6:
refactor Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#7:
refactor Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#8:
refactor Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#9:
sys path Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#10:
sys path Signed-off-by: Daniel Afrimi <[email protected]>

Signed-off-by: Yuxian Qiu <[email protected]>

Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Fanrong Li <[email protected]>

Add static libraries

ac45219

kaiyux self-assigned this Sep 21, 2023

kaiyux requested review from jdemouth and juney-nvidia September 21, 2023 03:32

juney-nvidia merged commit 9b563ba into main Sep 21, 2023

kaiyux deleted the kaiyu/add_static_libraries branch September 21, 2023 03:52

tdeng521 mentioned this pull request Mar 7, 2024

batch size will affect llm inference results? #1250

Closed

4 tasks

zxs789 mentioned this pull request Jun 4, 2024

H20 Using random weights to infer llama2-13B results in a divide-by-zero error. #1717

Closed

4 tasks

yingcanw added a commit that referenced this pull request Jan 2, 2025

Fix model name mapping (#2) (#2644)

718ef13

* Fix model name mapping (#2)

dongxuy04 added a commit to dongxuy04/TensorRT-LLM that referenced this pull request Apr 25, 2025

address review comments NVIDIA#2

2070dca

Signed-off-by: Dongxu Yang <[email protected]>

wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025

Add static libraries (NVIDIA#2)

73ce7dd

yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request Jul 17, 2025

Online resmooth for fp8 checkpoint on Blackwell. (NVIDIA#2)

a2fb8e5

Signed-off-by: Yuxian Qiu <[email protected]>

litaotju pushed a commit to litaotju/TensorRT-LLM that referenced this pull request Jul 24, 2025

Online resmooth for fp8 checkpoint on Blackwell. (NVIDIA#2)

753cfde

Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Fanrong Li <[email protected]>

yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request Jul 28, 2025

Online resmooth for fp8 checkpoint on Blackwell. (NVIDIA#2)

d3e1797

Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Fanrong Li <[email protected]>

zongfeijing pushed a commit to zongfeijing/TensorRT-LLM that referenced this pull request Jul 31, 2025

Online resmooth for fp8 checkpoint on Blackwell. (NVIDIA#2)

1cdfac1

Signed-off-by: Yuxian Qiu <[email protected]> Signed-off-by: Fanrong Li <[email protected]>

HuiGao-NV mentioned this pull request Aug 20, 2025

[https://nvbugs/5410391][bug] Support to share device buffers in attention meta #6557

Merged
