Merged
132 commits
b80b78f
Add pytorch backend team (#4405)
kevinch-nv May 21, 2025
0a8461d
test(perf): Pt.2 Add `Llama-3_3-Nemotron-Super-49B-v1` integration-p…
venkywonka May 21, 2025
dbaddb3
Adding two-shot allreduce kernel and mnnvl multicasting buffer (#4216)
zongfeijing May 21, 2025
1cffa99
test: Split test_simple into mpi_utils and cache transceiver tests fo…
DomBrown May 21, 2025
e1b42be
fix: TRT-LLM Gen dtype declaration (#4503)
nekorobov May 21, 2025
1681e9f
chore: remove extra PYTHONPATH (#4453)
achartier May 22, 2025
44cfd75
Agent interface impl for NIXL (#4125)
chuangz0 May 22, 2025
4798d08
chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs (#3823)
Superjomn May 22, 2025
9033dd9
[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct (#4415)
moraxu May 22, 2025
099cd3c
chore: Add all_reduce.py benchmark script to test (#4537)
kaiyux May 22, 2025
f491244
feat: add dataset support for benchmark_core_model with LLMAPI (#4457)
achartier May 22, 2025
bc9f1db
fix[nvbug-5228840]: Remove test cases of feature not supported anymor…
HuiGao-NV May 22, 2025
2898d26
feat: add health_generate route to openai serving (Cherry-pick https:…
kaiyux May 22, 2025
e741d2b
Add tritonrelease container (#4455)
Tabrizian May 22, 2025
3410508
cache_transceiver_config (#4556)
chuangz0 May 22, 2025
1a45890
test: waive hanging cases for perf test (#4562)
ruodil May 22, 2025
22c01d5
test: [CI] Add failed cases into waives.txt (#4549)
xinhe-nv May 22, 2025
1e5d526
Chore: clean up _merge_dummy_request method of PyExecutor (#4438)
QiJune May 22, 2025
558eaec
fix sequence data race (#4565)
chuangz0 May 22, 2025
e5c9088
fix: Move cv2 import to load_video function (#4541)
Funatiq May 22, 2025
c713eb5
test(perf): Add `Llama-3_1-Nemotron-Ultra-253B-v1` perf tests (cpp) (…
venkywonka May 22, 2025
14fc48a
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler (#4402)
mikeiovine May 22, 2025
9c0de25
[feat] Integrate Hopper chunked attention kernels (#4330)
mikeiovine May 22, 2025
3549b68
chore: clean useless flag (#4567)
nv-guomingz May 22, 2025
1e55d61
Chore: clean up _gather_dp_requests_num method of PyExecutor (#4571)
QiJune May 23, 2025
338744f
fix[nvbug-5295425]: [TRTLLM-5385] fix race condition in MoeLoadBalanc…
dongxuy04 May 23, 2025
60a6c20
Scaffoldingllm supports MCP (#4410)
wu1du2 May 23, 2025
d7d455e
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
pcastonguay May 23, 2025
e3a534d
chore: guardword clean for header file. (#4540)
nv-guomingz May 23, 2025
d7443b6
[https://nvbugspro.nvidia.com/bug/5181262] [test] Unwaive Mistral Nem…
syuoni May 23, 2025
ef280e6
[feat] support fp8 blockscale gemm on sm89 (#4481)
CarstyYou May 23, 2025
38241b2
fix: Fix moe_ep_groups/moe_cluster_groups in Mapping. (#4555)
yuxianq May 23, 2025
87f734b
[https://nvbugs/5297775] fix: Correct memory guard for large MOE test…
djns99 May 23, 2025
1cf0e67
fix: [nvbugs/5066257] serialization improvements (#3869)
coldwaterq May 23, 2025
d69c662
[Fix][Qwen3] fix bug of qwen3 fp4 workflow with EP (#4575)
byshiue May 23, 2025
862bde9
draft[doc]: add mtp tech blog (#4580)
lfr-0531 May 23, 2025
6527c05
chore: fix bug of llama lora test (#4566)
byshiue May 23, 2025
9ae705a
perf: Add fused q_norm/k_norm/RoPE for Qwen3. (#4482)
bobboli May 23, 2025
3ca0533
Waive L0 test (#4609)
yiqingy0 May 23, 2025
419151f
Update the GH main page to expose tech blogs (#4610)
juney-nvidia May 23, 2025
bbea264
Qwen3 supports TRTLLM FP4 MoE backend (#4530)
rosenrodt May 23, 2025
8452775
[TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA (#4535)
zhhuang-nv May 23, 2025
15a59e5
[nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router (#4617)
Funatiq May 23, 2025
7b2bb67
Update CODEOWNERS for PyTorch backend - runtime component (#4620)
juney-nvidia May 23, 2025
ca3eaf4
[nvbug/5028235][fix] pytest bindings tokens logits comparison. (#4424)
dominicshanshan May 23, 2025
7b2818a
refactor: CreateNewDecoderRequests (#4452)
Funatiq May 23, 2025
ef763b0
fix: rename some terms (#4534)
lowsfer May 23, 2025
20c15fc
Fix invalid testcase name (#4626)
chzblych May 23, 2025
b60846b
fix datatype check (#4606)
chuangz0 May 24, 2025
4a236d1
[Fix][Deepseek] Fix bugs in TestDeepSeekR1 (#4413)
hlu1 May 24, 2025
7a067a8
[TRTLLM-5327] - Add scan stage (#4602)
yiqingy0 May 25, 2025
5dff0bf
[#4633][doc] Fixed typo in scaffolding README.md (#4634)
amemov May 25, 2025
9472c86
Update main README.md with the LLaMA4 perf news (#4636)
juney-nvidia May 25, 2025
2b8f6d2
Fix snake case format (#4559)
shaharmor98 May 25, 2025
bb2f545
fix pipeline tests due to rebase (#4640)
yibinl-nvidia May 26, 2025
4d711be
Feat: add sliding-window-attention generation-phase kernels on Blackw…
PerkzZheng May 26, 2025
8f055f5
feat: Skip sampler for intermediate pp stages. (#4514)
yuxianq May 26, 2025
2fee408
Waive L0 tests (#4645)
yiqingy0 May 26, 2025
4a81991
Chore: refine shutdown signal of PyExecutor (#4614)
QiJune May 26, 2025
ce7f5fa
sort llm request state (#4607)
zhengd-nv May 26, 2025
6f626af
[TRTLLM-4535][infra]: Add marker TIMEOUT for test level (#3905)
EmmaQiaoCh May 26, 2025
502758a
fix: Handle additional model outputs based on pipeline parallel rank …
Funatiq May 26, 2025
11fb007
[TRTLLM-5327] - Fix guardwords scan step (#4654)
yiqingy0 May 26, 2025
fd27f89
fix: Remove duplicate tokenization in generation server (#4492)
Shunkangz May 26, 2025
93a5445
[nvbugs/5274894] fix: Sort requests for functional correctness and pe…
Funatiq May 26, 2025
44eb053
introduce RequestQueueItem class instead of using tuple (#4649)
QiJune May 26, 2025
88190fa
feat: large-scale EP(part 4: Static EP load balancer integration) (#4…
syuoni May 26, 2025
4fb8df2
[Infra] - Add files into the scan ignore list (#4663)
yiqingy0 May 26, 2025
732d92f
[Infra] - Multi-GPU testing support with Slurm (#4454)
yuanjingx87 May 26, 2025
4318037
fix disagg config params (#4646)
chuangz0 May 26, 2025
258d782
[Test] - Waive RTX Pro 6000 Slurm testing (#4672)
chzblych May 26, 2025
157fe62
fix fmha v2 tests (#4661)
qsang-nv May 27, 2025
59f7622
test: rcca https://nvbugs/5223130 (#4510)
xinhe-nv May 27, 2025
268171b
[NVBUG 5301980] Fix fp4 gemm padding. (#4662)
Tracin May 27, 2025
d6e1b71
[Test] - Correct waive the Slurm test stage (#4677)
chzblych May 27, 2025
1582361
Chore: only pad one dummy request for attention dp scenario (#4664)
QiJune May 27, 2025
92a7984
Waive L0 tests (#4686)
yiqingy0 May 27, 2025
5cb4f9b
feat: improve build_wheel.py venv handling (#4525)
tongyuantongyu May 27, 2025
f6c5029
[Infra][TRTLLM-3929] Rerun failure tests (#3264)
yiqingy0 May 27, 2025
5cdd6bb
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 (#4468)
lucaslie May 27, 2025
40a7161
fix: fmha_v2 compilation (#4659)
PerkzZheng May 27, 2025
bb3d998
test: [CI] remove closed bugs (#4638)
xinhe-nv May 27, 2025
e538b0d
refactor: extract and reuse filter_weights. (#4681)
yuxianq May 27, 2025
29ac4c2
fix: fix dsr1 min lat cga ar rate drop(0.2) (#4561)
yunruis May 27, 2025
06eba1e
Update the description for NGC docker images (#4671) (#4702)
MartinMarciniszyn May 27, 2025
5700a4f
feat: Add vanilla MOE. (#4682)
yuxianq May 28, 2025
6493401
Fix handle cancel request for attentionDP (#4648)
Shunkangz May 28, 2025
9c4b8f6
feat: Integration of Fused QKNorm+RoPE. (#4611)
bobboli May 28, 2025
971d16a
[TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT …
LinPoly May 28, 2025
f3fba4c
doc: Document the docker release image on NGC (#4705)
MartinMarciniszyn May 28, 2025
b800adc
Fix: hang on disagg when MNNVL two-shot AllReduce is enabled (#4678)
kaiyux May 28, 2025
fbec0c3
Release 0.20 to main (#4577)
amirkl94 May 28, 2025
c875184
Add missing serialization classes (#4642)
Tabrizian May 28, 2025
6682801
Fix rerun step (#4715)
yiqingy0 May 28, 2025
fbe4db2
feat: forward exceptions to Python and catch OOMs (#4497)
ixlmar May 28, 2025
5506f60
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArg…
Superjomn May 28, 2025
6b96f09
chore: remove extra paths to find binaries (#4706)
achartier May 28, 2025
9328348
test: [CI] Add failed cases into waives.txt (#4688)
xinhe-nv May 28, 2025
ed3c67e
tests: [https://nvbugspro.nvidia.com/bug/5289908] run maverick bf16 o…
crazydemo May 28, 2025
1276377
chore: Clean up cpp runtime (#4449)
Funatiq May 28, 2025
6cf1e4d
chore: add -f to pkill calls (#4711)
achartier May 28, 2025
bf691b3
feat: support packed weights in vanilla moe (#4719)
yuxianq May 28, 2025
820c390
chore: [nvbug_5273941] unwaive test_llm_loading_from_ckpt_for_tp2 (#4…
hchings May 28, 2025
812b1ab
feature: KV Cache GPUDirect Storage (#3209)
arthurrasmusson May 28, 2025
2307e91
[fix] add back rtx6000pro tests (#4679)
yuanjingx87 May 29, 2025
ac17142
chore: rename ExecutorBindingsWorker/Proxy (#4716)
Superjomn May 29, 2025
7f29a70
Waive L0 test (#4748)
yiqingy0 May 29, 2025
058f83e
CI: move post-merge multi GPU test of PyTorch backend to H200 (#4733)
QiJune May 29, 2025
7b2b657
infra: [TRTLLM-5247][TRTLLM-5248][TRTLLM-5249] Refactor docker build …
ZhanruiSunCh May 29, 2025
500aca4
test: remove perf test l40s/l20 oom test cases and unwaive tests (#4755)
ruodil May 29, 2025
33a9ba5
fix: test trtllm-bench mgmn (#4613)
Superjomn May 29, 2025
2c48ff5
[feat] add b200 support via slurm (#4709)
yuanjingx87 May 29, 2025
255779a
Chore: fuse _merge_requests method into _fetch_new_requests method (#…
QiJune May 29, 2025
fcadce9
[fix] Eagle-2 LLMAPI pybind argument fix. (#3967)
jhaotingc May 29, 2025
2d61174
[feat] Support RULER + chunked prefill in lm-eval-harness (#4592)
mikeiovine May 29, 2025
79a94a2
refactor: unique_ptr instead of shared_ptr (#4697)
Funatiq May 29, 2025
31bb650
Cherry pick feat/llama4 to main (#4739)
nv-yilinf May 29, 2025
3093c74
[Architecture] Redesign Linear module (#4721)
hlu1 May 29, 2025
5339d36
[perf] Reduce the workspace size of FP4 activation scales for MoE (#4…
jinyangyuan-nvidia May 30, 2025
fe359d9
Added code owners for AutoDeploy (#4769)
juney-nvidia May 30, 2025
36b87b8
chore: fix llm_root when LLM_ROOT is not set (#4741)
achartier May 30, 2025
55d56f8
[JIRA-5226219][fix] Fix Bug in KV cache manager (#4596)
thorjohnsen May 30, 2025
53794b2
test: skip test_llm_hf_gemma_quantization_1gpu_vswa on A100 (#4779)
xinhe-nv May 30, 2025
ee916da
test: Waive test_llm_loading_from_ckpt_for_tp2 (#4797)
syuoni May 30, 2025
f117d6a
Fabric Memory for KV Cache Transfer (#4717)
chuangz0 May 30, 2025
54200ee
fix: random fail of cache router test (#4597)
zhengd-nv May 30, 2025
7e6d06d
feat: estimate GPU mem. usage w/ minimal KV cache (#4574)
ixlmar May 30, 2025
c026dda
fix: iteration logging and typing in PyExecutor (#4734)
ixlmar May 30, 2025
99fdef2
[TRTLLM-5516] perf: replicate dummy request for cuda graph padding (#…
QiJune May 30, 2025
bac22ff
[feat] support sharegpt downloading in benchmark_serving (#4578)
LinPoly May 30, 2025
f82e44b
fix: [nvbugs/5310520] disable embed_tokens's TP when DP enabled for l…
yuxianq May 30, 2025
3b7120d
DeepSeek R1 throughput optimization tech blog for Blackwell GPUs (#4791)
litaotju May 30, 2025
