Update TRT-LLM code #3

kaiyux · 2023-09-28T16:00:53Z

No description provided.

# This is the 1st commit message:
add download models form www.modelscope.cn
# This is the commit message NVIDIA#2:
debug
# This is the commit message NVIDIA#3:
debug

Update TRT-LLM code

Add support for CPP inference with decoder

# This is the 1st commit message:
kernel
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
remove prints
Signed-off-by: Ubuntu <[email protected]>
test pass
Signed-off-by: Ubuntu <[email protected]>
test refactor with more use cases
Signed-off-by: Ubuntu <[email protected]>
refacor
Signed-off-by: Ubuntu <[email protected]>
refacor_2
Signed-off-by: Ubuntu <[email protected]>
add tuner wip
Signed-off-by: Ubuntu <[email protected]>
autotuner works
Signed-off-by: Ubuntu <[email protected]>
bfloat16 works. moer changes to the thop file
Signed-off-by: Ubuntu <[email protected]>
is tune for autotuner is True --> gets real tactics configs
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
zeros + quant mode is works
Signed-off-by: Ubuntu <[email protected]>
act int8
Signed-off-by: Ubuntu <[email protected]>
removed fp8 for now
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
w4a16 linear module
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
changed cutalss for sm==89
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
test linear work
Signed-off-by: Ubuntu <[email protected]>
add license
Signed-off-by: Ubuntu <[email protected]>
works!
Signed-off-by: Ubuntu <[email protected]>
refactor + linear test pass
Signed-off-by: Ubuntu <[email protected]>
preprocess in load weights
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
refactor + rebase
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
wip
Signed-off-by: Ubuntu <[email protected]>
Blackwell not supported
Signed-off-by: Daniel Afrimi <[email protected]>
wip
Signed-off-by: Daniel Afrimi <[email protected]>
skip blackwell
Signed-off-by: Daniel Afrimi <[email protected]>
wip
Signed-off-by: Daniel Afrimi <[email protected]>
works
Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#2:
rebased
Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#3:
align with my pld worked version of linear
Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#4:
wip
Signed-off-by: Ubuntu <[email protected]>
# This is the commit message NVIDIA#5:
refactor
Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#6:
refactor
Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#7:
refactor
Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#8:
refactor
Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#9:
sys path
Signed-off-by: Daniel Afrimi <[email protected]>
# This is the commit message NVIDIA#10:
sys path
Signed-off-by: Daniel Afrimi <[email protected]>

kaiyux added 4 commits September 28, 2023 09:00

Update code · 6e9e318
Add .a libs · 766926c
Update submodule · 496456e
Update submodule · 6111f52

kaiyux merged commit 279e329 into main Sep 28, 2023

Nam-ang mentioned this pull request Jan 9, 2024

When run llama2, Caught signal 11 (Segmentation fault) #752

Closed

tdeng521 mentioned this pull request Mar 7, 2024

batch size will affect llm inference results? #1250

Closed

hademircii mentioned this pull request Mar 12, 2024

Flan-T5 models with Tensor Parallelism #1286

Open

zxs789 mentioned this pull request Jun 4, 2024

H20 Using random weights to infer llama2-13B results in a divide-by-zero error. #1717

Closed

wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025

Merge pull request NVIDIA#3 from NVIDIA/kaiyu/update · 3aa8467

Update TRT-LLM code

kipraveen pushed a commit to kipraveen/TensorRT-LLM that referenced this pull request May 12, 2025

Merge pull request NVIDIA#3 from kipraveen/canary_cpp · 0d73b5c

Add support for CPP inference with decoder