Changes from all commits (103 commits)
5fa1914
[None][chore] Bump version to 1.1.0rc0 (#6651)
yiqingy0 Aug 7, 2025
85af621
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6…
amitz-nv Aug 7, 2025
6c1f7d8
[None][test] correct test-db context for perf yaml file (#6686)
ruodil Aug 7, 2025
8207d5f
[None] [feat] Add model gpt-oss (#6645)
hlu1 Aug 7, 2025
0a467b0
[https://nvbugs/5409414][fix] fix Not registered specs (#6660)
xinhe-nv Aug 7, 2025
8ec3b1d
[None][feat] : Add FP8 context MLA support for SM120 (#6059)
peaceh-nv Aug 7, 2025
c23e8e7
[TRTLLM-6092][doc] Add LoRA feature usage doc (#6603)
shaharmor98 Aug 7, 2025
1b9781e
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding …
syuoni Aug 7, 2025
453a06e
[TRTLLM-6881][feat] Include attention dp rank info with KV cache even…
pcastonguay Aug 7, 2025
3c44b44
[None][infra] Fix guardwords (#6711)
EmmaQiaoCh Aug 7, 2025
46357e7
[None][package] Pin cuda-python version to >=12,<13 (#6702)
yiqingy0 Aug 7, 2025
0223de0
[None][doc] Add deployment guide section for VDR task (#6669)
nv-guomingz Aug 7, 2025
4055b76
[None][fix] disagg ctx pp4 + gen pp4 integ test (#6489)
raayandhar Aug 7, 2025
e968f98
[None][feat] Clean up ngram auto mode, add max_concurrency to configs…
mikeiovine Aug 7, 2025
3b2dd40
[None][chore] Remove py_executor from disagg gh team (#6716)
pcastonguay Aug 7, 2025
4ecda91
[https://nvbugs/5423962][fix] Address broken links (#6531)
chenopis Aug 7, 2025
db8dc97
[None][fix] Migrate to new cuda binding package name (#6700)
tongyuantongyu Aug 7, 2025
980929e
[https://nvbugs/5410687][fix] Hopper w4a8 groupwise MoE interleave (#…
symphonylyh Aug 7, 2025
8227616
[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)
Tabrizian Aug 8, 2025
efca359
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default …
dcampora Aug 8, 2025
88ced50
[TRTQA-2920][fix] Add failed cases into waives.txt (#6719)
xinhe-nv Aug 8, 2025
22f45a0
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test (#6685)
ruodil Aug 8, 2025
2f2f5cc
[TRTLLM-6744][feat] Remove input_sf swizzle for module WideEPMoE (#6231)
StudyingShao Aug 8, 2025
1cf6694
[None][fix] Fix unnecessary GPU synchronization in torch sampler caus…
zhanghaotong Aug 8, 2025
aee828d
[TRTLLM-6854][feat] Enable guided decoding with disagg serving (#6704)
syuoni Aug 8, 2025
064eb7a
[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611)
2ez4bz Aug 8, 2025
b15d6fb
[None][test] fix yml condition error under qa folder (#6734)
ruodil Aug 8, 2025
9687bb4
[None][doc] Add doc for multimodal feature support matrix (#6619)
chang-l Aug 8, 2025
d913955
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell (#6616)
limin2021 Aug 8, 2025
294e0d3
[https://nvbugs/5436461][infra] Adjust free_gpu_memory_fraction of te…
leslie-fang25 Aug 8, 2025
9ff4e75
[None][refactor] Combine resmooth_to_fp8_e8m0 and transform_sf_into_r…
yuxianq Aug 8, 2025
5f45227
[https://nvbugs/5437106][fix] Fix llama4 scout TRTLLM attn_backend (#…
JunyiXu-nv Aug 8, 2025
32ad7f3
[None][fix] Remove lock related typo in py_executor (#6653)
lancelly Aug 8, 2025
ebdc43e
[None][feat] move kv cache measure into transfer session (#6633)
zhengd-nv Aug 8, 2025
e251f7c
[None][fix]revert kvcache transfer (#6709)
chuangz0 Aug 8, 2025
b8f036f
[TRTLLM-6650][fix] Enhance CUDA graph + Beam search to correctly hand…
stnie Aug 8, 2025
d45236b
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#6184)
Wanli-Jiang Aug 8, 2025
90145cf
[None][feat] Optimize CUDA graph memory usage for spec decode cases (…
mikeiovine Aug 8, 2025
efcb8f7
[TRTLLM-7025] [infra] Reorganize CODEOWNERS to rectify `examples` map…
venkywonka Aug 8, 2025
cc0f4c8
[None][doc] Move AutoDeploy README.md to torch docs (#6528)
Fridah-nv Aug 8, 2025
d066750
[None][fix] WAR GPT OSS on H20 with Triton MOE (#6721)
dongfengy Aug 8, 2025
9778788
[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick chan…
yibinl-nvidia Aug 9, 2025
bcf5ec0
[None][feat] Core Metrics Implementation (#5785)
hcyezhang Aug 9, 2025
d643aef
[Perf] Improve Llama4 performance for small max_seqlen cases (#6306)
nv-yilinf Aug 9, 2025
de47282
[TRTLLM-6637][feat] Resolve KV cache divergence issue (#6628)
ziyixiong-nv Aug 9, 2025
ee19ca5
[None][infra] Waive test main 0808 (#6751)
EmmaQiaoCh Aug 10, 2025
3c5aec1
[#5048][enhance] AutoDeploy: Optimize prepare_inputs (#6634)
galagam Aug 10, 2025
199f306
[None][chore][kv cache manager] Dead code elimination, we no longer r…
eopXD Aug 10, 2025
14b36e0
[TRTLLM-6174][feat] Enable FP32 mamba ssm cache (#6574)
shaharmor98 Aug 10, 2025
4142320
[https://nvbugs/5444937][fix] Fixing kv_cache_event unit test (#6753)
pcastonguay Aug 10, 2025
b6baa9e
[TRTLLM-6823][doc] Add checkpoint refactor docs (#6592)
shaharmor98 Aug 10, 2025
60073a7
[None][feat] Support SharedTensor on MultimodalParams (#6254)
yechank-nvidia Aug 11, 2025
4b4b91a
[None][feat] improve dataloading for benchmark_dataset by using batch…
zerollzeng Aug 11, 2025
767879e
[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite…
bo-nv Aug 11, 2025
c566a8d
[None][fix] fix same pp disagg (#6730)
chuangz0 Aug 11, 2025
49bcaa4
Add gpt-oss GSM8K test. (#6732)
Tracin Aug 11, 2025
b3e8fa2
[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu (#6487)
MrGeva Aug 11, 2025
62d6c98
[TRTLLM-5633][infra] Force set changed file diff to empty string for …
yiqingy0 Aug 11, 2025
9c358c2
[None][chore] remove closed bugs (#6772)
xinhe-nv Aug 11, 2025
d6ad4a9
[None][infra] Waive failed tests on main 0811 (#6778)
EmmaQiaoCh Aug 11, 2025
9a8195e
fix: Ensure that Python stub generation works against libnvidia-ml st…
MartinMarciniszyn Aug 11, 2025
83dbc6c
[TRTLLM-5532][feat] store the block of context request into kv cache …
byshiue Aug 11, 2025
a2e9153
[None][doc] Add K2 tool calling examples (#6667)
lancelly Aug 11, 2025
5145e9d
[None][infra] Unwaive an updated case to test (#6791)
EmmaQiaoCh Aug 11, 2025
7e33ed6
[None][chore] always try-catch when clear build folder in build_wheel…
zhenhuaw-me Aug 11, 2025
c9fe07e
[TRTLLM-6812][feat] Add standardized GitHub issue templates and disab…
venkywonka Aug 11, 2025
7ab8112
[None][fix] Refactoring to avoid circular import when importing torch…
rakib-hasan Aug 11, 2025
56bfc3a
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically (#6763)
achartier Aug 11, 2025
be9dd47
[https://nvbugs/5385987][fix] Fix Qwen2 quantization issue by pinning…
chang-l Aug 12, 2025
ead89a0
[None][perf] Improve the performance of online EPLB on Hopper by bett…
jinyangyuan-nvidia Aug 12, 2025
b4fcd5f
[https://nvbugs/5441438][fix] Set correct draft length for the cuda g…
ziyixiong-nv Aug 12, 2025
7c686ba
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding an…
syuoni Aug 12, 2025
0dc4b4e
[#4403][autodeploy] Refactor: Move more transformations to new inf op…
Fridah-nv Aug 12, 2025
27fc351
[None][feat] CUTLASS MoE FC2+Finalize fusion (#3294)
sklevtsov-nvidia Aug 12, 2025
f7c13a4
[TRTLLM-6906][chore] Using pybind to bind functions in thop/attention…
lancelly Aug 12, 2025
ab0d768
[None][fix] Fix attention dp log (#6570)
Shunkangz Aug 12, 2025
8845e0f
[None][fix] fix ci (#6814)
QiJune Aug 12, 2025
e35fca4
[TRTQA-2920][chore] improve hang tests (#6781)
xinhe-nv Aug 12, 2025
a060e12
[https://nvbugs/5438869][fix] Set nvfp4 expert w1 w3 weight scale to …
jhaotingc Aug 12, 2025
81f0ded
[None][feat] Add GPT OSS support for AutoDeploy (#6641)
nvchenghaoz Aug 12, 2025
dd11e08
[#6187][feat] add LayerNorm module (#6625)
Funatiq Aug 12, 2025
45c7518
[None][refactor] Simplify decoder state initialization (#6559)
Funatiq Aug 12, 2025
bd9a6dd
[TRTLLM-7008][fix] fix wideEP weights loading and args (#6789)
dongxuy04 Aug 12, 2025
2923eb8
[None][fix] Refactoring input prep to allow out-of-tree models (#6497)
rakib-hasan Aug 13, 2025
47806f0
feat: Support custom repo_dir for SLURM script (#6546)
kaiyux Aug 13, 2025
1bbc0e3
[None][fix] Pre-allocate workspaces for DeepGEMM MoE to avoid frequen…
lfr-0531 Aug 13, 2025
12102e2
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
yechank-nvidia Aug 13, 2025
f68e03e
[https://nvbugs/5452167][fix] Fix ngram padding issue (#6837)
mikeiovine Aug 13, 2025
2e0081b
[#6530][fix] Fix script when using calibration tensors from modelopt …
achartier Aug 13, 2025
50e5e72
[https://nvbugs/5412456][fix] Fix an illegal instruction was encounte…
zhou-yuxin Aug 13, 2025
1d80df0
[None][feat] DeepEP LL combine FP4 (#6822)
yilin-void Aug 13, 2025
bc5f766
[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic…
hyukn Aug 13, 2025
fe7dda8
[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766)
Shixiaowei02 Aug 13, 2025
0fad602
[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels (#6851)
PerkzZheng Aug 13, 2025
8416d7f
[https://nvbugs/5412885][doc] Add the workaround doc for H200 OOM (#6…
zhenhuaw-me Aug 13, 2025
2198587
[https://nvbugs/5378031] [feat] Hopper W4A8 MoE supports ModelOpt ckp…
rosenrodt Aug 13, 2025
c7e6145
[None][infra] Waive failed cases on main (#6863)
EmmaQiaoCh Aug 13, 2025
bda42f8
[None][feat] Support running heterogeneous model execution for Nemotr…
danielafrimi Aug 13, 2025
6c52bb0
[https://nvbugs/5302040][feat] Add whisper support (Bert Attention on…
wu6u3tw Aug 13, 2025
58f7783
[https://nvbugs/5394685][fix] the bug with spec-decoding + SWA && an …
PerkzZheng Aug 13, 2025
7cba883
[https://nvbugs/5410399][chore] Unwaive mtp llmapi test (#6833)
mikeiovine Aug 13, 2025
eb4ed18
[None][fix] max_num_sequences argument in nanobind (#6862)
Linda-Stadter Aug 13, 2025
7f92816
Allocate MoE workspace only when necessary
nv-yilinf Aug 14, 2025
52 changes: 28 additions & 24 deletions .github/CODEOWNERS
@@ -6,13 +6,39 @@
# Without approval from a member of this team, PRs cannot be merged to release branches.
# * @NVIDIA/trt-llm-release-branch-approval

## TensorRT-LLM Infra
### CI
/jenkins @NVIDIA/trt-llm-ci-infra-devs @NVIDIA/trt-llm-infra-devs
### Setup
/docker @NVIDIA/trt-llm-setup-infra-devs @NVIDIA/trt-llm-infra-devs
### Github workflows
/.github @NVIDIA/trt-llm-gh-workflows-infra-devs @NVIDIA/trt-llm-infra-devs
/.coderabbit.yaml @NVIDIA/trt-llm-gh-workflows-infra-devs @NVIDIA/trt-llm-infra-devs

## TensorRT-LLM - Docs
/docs @NVIDIA/trt-llm-doc-owners

## Examples
/examples @NVIDIA/trt-llm-doc-owners

## TensorRT-LLM - Triton backend
/triton_backend @NVIDIA/trt-llm-triton-backend-devs

# TensorRT-LLM Pytorch backend
/tensorrt_llm/_torch @NVIDIA/trt-llm-torch-devs

## TensorRT-LLM Pytorch - Modules
/tensorrt_llm/_torch/modules @NVIDIA/trt-llm-torch-modules

## TensorRT-LLM Pytorch Models
/tensorrt_llm/_torch/models @NVIDIA/trt-llm-torch-models-devs
/examples/models @NVIDIA/trt-llm-torch-models-devs @NVIDIA/trt-llm-doc-owners

## TensorRT-LLM Pytorch backend - runtime
/tensorrt_llm/_torch/pyexecutor @NVIDIA/trt-llm-torch-runtime-devs
## TensorRT-LLM Pytorch backend - AutoDeploy flow
/tensorrt_llm/_torch/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
/tensorrt_llm/examples/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
/examples/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs @NVIDIA/trt-llm-doc-owners

## TensorRT-LLM Pytorch - Speculative Decoding
/tensorrt_llm/_torch/speculative @NVIDIA/trt-llm-torch-spec-decoding
@@ -31,12 +57,6 @@
/tensorrt_llm/_torch/attention_backend @NVIDIA/trt-llm-torch-attention-devs
/tensorrt_llm/_torch/modules/attention.py @NVIDIA/trt-llm-torch-attention-devs

## TensorRT-LLM Pytorch - Modules
/tensorrt_llm/_torch/modules @NVIDIA/trt-llm-torch-modules


## TensorRT-LLM Pytorch Models
/tensorrt_llm/_torch/models @NVIDIA/trt-llm-torch-models-devs

### TensorRT-LLM Pytorch - Models - Gemma
/tensorrt_llm/_torch/models/modeling_gemma3.py @NVIDIA/trt-llm-torch-models-gemma-devs @NVIDIA/trt-llm-torch-models-devs
@@ -108,8 +128,6 @@
/cpp/tensorrt_llm/runtime/loraUtils.cpp @NVIDIA/trt-llm-torch-peft
/cpp/tensorrt_llm/runtime/loraUtils.h @NVIDIA/trt-llm-torch-peft

## TensorRT-LLM - Triton backend
/triton_backend @NVIDIA/trt-llm-triton-backend-devs

## TensorRT-LLM trtllm-bench Reviewers
/tensorrt_llm/bench @NVIDIA/trtllm-bench-reviewers
@@ -121,10 +139,9 @@ docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
/tensorrt_llm/executor @NVIDIA/trt-llm-llmapi-devs

## TensorRT-LLM LLM Disaggregated
/examples/disaggregated @NVIDIA/trt-llm-disagg-devs
/examples/disaggregated @NVIDIA/trt-llm-disagg-devs @NVIDIA/trt-llm-doc-owners
/tensorrt_llm/disaggregated_params.py @NVIDIA/trt-llm-disagg-devs
/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py @NVIDIA/trt-llm-disagg-devs
/tensorrt_llm/_torch/pyexecutor/py_executor.py @NVIDIA/trt-llm-disagg-devs
/cpp/tensorrt_llm/batch_manager/cacheFormatter.cpp @NVIDIA/trt-llm-disagg-devs
/cpp/tensorrt_llm/batch_manager/cacheFormatter.h @NVIDIA/trt-llm-disagg-devs
/cpp/tensorrt_llm/batch_manager/cacheTransBuffer.cpp @NVIDIA/trt-llm-disagg-devs
@@ -135,19 +152,6 @@ docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
/cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp @NVIDIA/trt-llm-disagg-devs
/cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h @NVIDIA/trt-llm-disagg-devs

## TensorRT-LLM Infra

### CI
/jenkins @NVIDIA/trt-llm-ci-infra-devs @NVIDIA/trt-llm-infra-devs
### Setup
/docker @NVIDIA/trt-llm-setup-infra-devs @NVIDIA/trt-llm-infra-devs
### Github workflows
/tensorrt_llm/.github @NVIDIA/trt-llm-gh-workflows-infra-devs @NVIDIA/trt-llm-infra-devs
/tensorrt_llm/.coderabbit.yaml @NVIDIA/trt-llm-gh-workflows-infra-devs @NVIDIA/trt-llm-infra-devs

## TensorRT-LLM - Docs
/docs @NVIDIA/trt-llm-doc-owners
/examples @NVIDIA/trt-llm-doc-owners

# The rule below requires that any PR modifying public APIs must be approved by at least one member
# of the NVIDIA/trt-llm-committed-api-review-committee or NVIDIA/trt-llm-noncommitted-api-review-committee team.
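A note on how these rules resolve, which explains why the reorganization above matters: GitHub CODEOWNERS matches each changed path against every pattern in the file, and the last matching pattern takes precedence, so only the teams on that final matching line are requested for review. A minimal sketch using two rules that appear in this file (their placement here is illustrative, not the file's actual ordering):

```
# Order matters: the last matching pattern wins.
/docs @NVIDIA/trt-llm-doc-owners

# A change to this one file requests trtllm-bench reviewers,
# because this later, more specific rule overrides the /docs rule:
docs/source/performance/perf-benchmarking.md @NVIDIA/trtllm-bench-reviewers
```

This is also why the duplicated sections removed further down (the second Infra, Docs, and Triton backend blocks) were worth deleting: a stale duplicate later in the file silently overrides the maintained entry above it.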
66 changes: 66 additions & 0 deletions .github/ISSUE_TEMPLATE/01-installation.yml
@@ -0,0 +1,66 @@
# Adapted from https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE/200-installation.yml
name: 🛠️ Installation
description: Report an issue here when you hit errors during installation.
title: "[Installation]: "
labels: ["Installation"]

body:
- type: markdown
attributes:
value: >
#### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: System Info
description: |
Please provide the following system information to help us debug your installation issue:

```bash
# System information
cat /etc/os-release
nvidia-smi
nvcc --version
python --version
pip list | grep -E "(tensorrt|torch|cuda)"

# TensorRT-LLM installation method and version
pip show tensorrt_llm
```
value: |
**System Information:**
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT version:
- PyTorch version:
- TensorRT-LLM version:

**Detailed output:**
```text
Paste the output of the above commands here
```
validations:
required: true
- type: textarea
attributes:
label: How you are installing TensorRT-LLM
description: |
Paste the full command you are trying to execute or describe your installation method.
value: |
```sh
# Installation command or method
pip install tensorrt_llm
```
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
- type: checkboxes
id: askllm
attributes:
label: Before submitting a new issue...
options:
- label: Make sure you already searched for relevant issues, and checked the [installation documentation](https://nvidia.github.io/TensorRT-LLM/installation/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
required: true
41 changes: 41 additions & 0 deletions .github/ISSUE_TEMPLATE/02-new-model.yml
@@ -0,0 +1,41 @@
# Adapted from https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE/600-new-model.yml
name: 🤗 Support request for a new model from huggingface
description: Submit a proposal/request for a new model from huggingface
title: "[New Model]: "
labels: ["new model"]

body:
- type: markdown
attributes:
value: >
#### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue+sort%3Acreated-desc+).

#### We also highly recommend you read https://nvidia.github.io/TensorRT-LLM/architecture/add-model.html first to understand how to add a new model.
- type: textarea
attributes:
label: The model to consider.
description: >
A huggingface identifier, pointing to the model, e.g. `meta-llama/Llama-3.1-8B-Instruct` .
validations:
required: true
- type: textarea
attributes:
label: The closest model TensorRT-LLM already supports.
description: >
Here is the list of models already supported by TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/main/tensorrt_llm/models (TRT backend) and https://github.com/NVIDIA/TensorRT-LLM/tree/main/tensorrt_llm/_torch/models (Pytorch backend) . Which model is the most similar to the model you want to add support for?
- type: textarea
attributes:
label: What's your difficulty of supporting the model you want?
description: >
For example, any new operators or new architecture?
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
- type: checkboxes
id: askllm
attributes:
label: Before submitting a new issue...
options:
- label: Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
required: true
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/03-documentation.yml
@@ -0,0 +1,31 @@
# Adapted from https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE/100-documentation.yml
name: 📚 Documentation
description: Report an issue related to https://nvidia.github.io/TensorRT-LLM/
title: "[Doc]: "
labels: ["Documentation"]
assignees: ["nv-guomingz"]

body:
- type: textarea
attributes:
label: 📚 The doc issue
description: >
A clear and concise description of what content in https://nvidia.github.io/TensorRT-LLM/ is an issue.
validations:
required: true
- type: textarea
attributes:
label: Suggest a potential alternative/fix
description: >
Tell us how we could improve the documentation in this regard.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
- type: checkboxes
id: askllm
attributes:
label: Before submitting a new issue...
options:
- label: Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
required: true
62 changes: 62 additions & 0 deletions .github/ISSUE_TEMPLATE/04-questions.yml
@@ -0,0 +1,62 @@
# Adapted from https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE/300-usage.yml
name: 💻 Questions
description: Raise an issue here if you don't know how to use TensorRT-LLM.
title: "[Usage]: "
labels: ["question"]

body:
- type: markdown
attributes:
value: >
#### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: System Info
description: |
Please provide the following system information to help us debug your usage issue:

```bash
# System information
nvidia-smi
python --version
pip show tensorrt_llm
```
value: |
**System Information:**
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT-LLM version:

**Detailed output:**
```text
Paste the output of the above commands here
```
validations:
required: true
- type: textarea
attributes:
label: How would you like to use TensorRT-LLM
description: |
A detailed description of how you want to use TensorRT-LLM.
value: |
I want to run inference of a [specific model](put Hugging Face link here). I don't know how to integrate it with TensorRT-LLM or optimize it for my use case.

**Specific questions:**
- Model:
- Use case (e.g., chatbot, batch inference, real-time serving):
- Expected throughput/latency requirements:
- Multi-GPU setup needed:
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
- type: checkboxes
id: askllm
attributes:
label: Before submitting a new issue...
options:
- label: Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
required: true
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/05-feature-request.yml
@@ -0,0 +1,40 @@
# Adapted from https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE/500-feature-request.yml
name: 🚀 Feature request
description: Submit a proposal/request for a new TensorRT-LLM feature
title: "[Feature]: "
labels: ["feature request"]
assignees: ["laikhtewari"]

body:
- type: markdown
attributes:
value: >
#### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: 🚀 The feature, motivation and pitch
description: >
A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
validations:
required: true
- type: textarea
attributes:
label: Alternatives
description: >
A description of any alternative solutions or features you've considered, if any.
- type: textarea
attributes:
label: Additional context
description: >
Add any other context or screenshots about the feature request.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
- type: checkboxes
id: askllm
attributes:
label: Before submitting a new issue...
options:
- label: Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
required: true
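The commit title for c9fe07e above mentions disabling something alongside these templates (the title is truncated). Issue forms like these are commonly paired with an `ISSUE_TEMPLATE/config.yml` that turns off blank issues so reporters must pick one of the forms; a minimal sketch of what such a file can look like, assuming this PR does something similar (the contents below are illustrative and not taken from this diff):

```yaml
# .github/ISSUE_TEMPLATE/config.yml -- illustrative sketch, not part of this diff
blank_issues_enabled: false   # route reporters through one of the forms above
contact_links:
  - name: TensorRT-LLM documentation
    url: https://nvidia.github.io/TensorRT-LLM/
    about: Check the installation guide and examples before filing an issue.
```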