Merged
2 changes: 1 addition & 1 deletion benchmarks/cpp/README.md
@@ -41,7 +41,7 @@ python3 prepare_dataset.py \
```

For datasets that don't have prompt key, set --dataset-prompt instead.
-Take [cnn_dailymail dataset](https://huggingface.co/datasets/cnn_dailymail) for example:
+Take [cnn_dailymail dataset](https://huggingface.co/datasets/abisee/cnn_dailymail) for example:
```
python3 prepare_dataset.py \
--tokenizer <path/to/tokenizer> \
2 changes: 1 addition & 1 deletion examples/models/contrib/baichuan/README.md
@@ -30,7 +30,7 @@ The script accepts an argument named model_version, whose value should be `v1_7b
In addition, there are two shared files in the folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
2 changes: 1 addition & 1 deletion examples/models/contrib/bloom/README.md
@@ -24,7 +24,7 @@ The TensorRT-LLM BLOOM implementation can be found in [tensorrt_llm/models/bloom
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
2 changes: 1 addition & 1 deletion examples/models/contrib/chatglm-6b/README.md
@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

2 changes: 1 addition & 1 deletion examples/models/contrib/chatglm2-6b/README.md
@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

2 changes: 1 addition & 1 deletion examples/models/contrib/chatglm3-6b-32k/README.md
@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/contrib/ch
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

2 changes: 1 addition & 1 deletion examples/models/contrib/deepseek_v1/README.md
@@ -32,7 +32,7 @@ The TensorRT-LLM Deepseek-v1 implementation can be found in [tensorrt_llm/models
In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
* [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.

## Support Matrix
2 changes: 1 addition & 1 deletion examples/models/contrib/deepseek_v2/README.md
@@ -34,7 +34,7 @@ The TensorRT-LLM Deepseek-v2 implementation can be found in [tensorrt_llm/models
In addition, there are three shared files in the parent folder [`examples`](../../../) can be used for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the model inference output by given an input text.
-* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the article from [cnn_dailmail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, it can running the summarize from HF model and TensorRT-LLM model.
* [`../../../mmlu.py`](../../../mmlu.py) to running score script from https://github.com/declare-lab/instruct-eval to compare HF model and TensorRT-LLM model on the MMLU dataset.

## Support Matrix
4 changes: 2 additions & 2 deletions examples/models/contrib/falcon/README.md
@@ -25,7 +25,7 @@ The TensorRT-LLM Falcon implementation can be found in [tensorrt_llm/models/falc
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
@@ -193,7 +193,7 @@ If the engines are built successfully, you will see output like (falcon-rw-1b as

### 4. Run summarization task with the TensorRT engine(s)
The `../../../summarize.py` script can run the built engines to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

```bash
# falcon-rw-1b
4 changes: 2 additions & 2 deletions examples/models/contrib/gptj/README.md
@@ -26,7 +26,7 @@ code is located in [`examples/models/contrib/gptj`](./). There is one main file:
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
@@ -238,7 +238,7 @@ python3 ../../../run.py --max_output_len=50 --engine_dir=gptj_engine --tokenizer
## Summarization using the GPT-J model

The following section describes how to run a TensorRT-LLM GPT-J model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF GPT-J model.

4 changes: 2 additions & 2 deletions examples/models/contrib/gptneox/README.md
@@ -27,7 +27,7 @@ The TensorRT-LLM GPT-NeoX implementation can be found in [`tensorrt_llm/models/g
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
@@ -118,7 +118,7 @@ trtllm-build --checkpoint_dir ./gptneox/20B/trt_ckpt/int8_wo/2-gpu/ \
### 4. Summarization using the GPT-NeoX model

The following section describes how to run a TensorRT-LLM GPT-NeoX model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF GPT-NeoX model.

2 changes: 1 addition & 1 deletion examples/models/contrib/grok/README.md
@@ -29,7 +29,7 @@ The TensorRT-LLM Grok-1 implementation can be found in [tensorrt_llm/models/grok
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* INT8 Weight-Only
2 changes: 1 addition & 1 deletion examples/models/contrib/internlm/README.md
@@ -24,7 +24,7 @@ The TensorRT-LLM InternLM example code lies in [`examples/models/contrib/internl
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16 / BF16
2 changes: 1 addition & 1 deletion examples/models/contrib/jais/README.md
@@ -23,7 +23,7 @@ The TensorRT-LLM support for Jais is based on the GPT model, the implementation
In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
The tested configurations are:
2 changes: 1 addition & 1 deletion examples/models/contrib/mpt/README.md
@@ -29,7 +29,7 @@ The TensorRT-LLM MPT implementation can be found in [`tensorrt_llm/models/mpt/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
4 changes: 2 additions & 2 deletions examples/models/contrib/opt/README.md
@@ -25,7 +25,7 @@ The TensorRT-LLM OPT implementation can be found in [`tensorrt_llm/models/opt/mo
In addition, there are two shared files in the parent folder [`examples`](../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
@@ -127,7 +127,7 @@ trtllm-build --checkpoint_dir ./opt/66B/trt_ckpt/fp16/4-gpu/ \
### 4. Summarization using the OPT model

The following section describes how to run a TensorRT-LLM OPT model to summarize the articles from the
-[cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. For each summary, the script can compute the
+[cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset. For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
The script can also perform the same summarization using the HF OPT model.

4 changes: 2 additions & 2 deletions examples/models/contrib/skywork/README.md
@@ -12,7 +12,7 @@ The TensorRT-LLM Skywork example code lies in [`examples/models/contrib/skywork`
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16 & BF16
@@ -78,7 +78,7 @@ trtllm-build --checkpoint_dir ./skywork-13b-base/trt_ckpt/bf16 \

### 4. Summarization using the Engines

-After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engines, we can use them to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.

```bash
# fp16
4 changes: 2 additions & 2 deletions examples/models/contrib/smaug/README.md
@@ -11,7 +11,7 @@ The TensorRT-LLM support for Smaug-72B-v0.1 is based on the LLaMA model, the imp
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`../../../run.py`](../../../run.py) to run the inference on an input text;
-* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`../../../summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

@@ -43,7 +43,7 @@ trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8 \

### Run Summarization

-After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.
+After building TRT engine, we can use it to perform various tasks. TensorRT-LLM provides handy code to run summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and get [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores. The `ROUGE-1` score can be used to validate model implementations.

```bash
mpirun -n 8 -allow-run-as-root python ../../../summarize.py \
2 changes: 1 addition & 1 deletion examples/models/core/commandr/README.md
@@ -26,7 +26,7 @@ The TensorRT-LLM Command-R example code is located in [`examples/models/core/com
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

2 changes: 1 addition & 1 deletion examples/models/core/gemma/README.md
@@ -81,7 +81,7 @@ trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \

We provide three examples to run inference `run.py`, `summarize.py` and `mmlu.py`. `run.py` only run inference with `input_text` and show the output.

-`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
+`summarize.py` runs summarization on [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset and evaluate the model by [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.

`mmlu.py` runs MMLU to evaluate the model by accuracy.

2 changes: 1 addition & 1 deletion examples/models/core/glm-4-9b/README.md
@@ -34,7 +34,7 @@ The TensorRT-LLM ChatGLM example code is located in [`examples/models/core/glm-4
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix

4 changes: 2 additions & 2 deletions examples/models/core/gpt/README.md
@@ -44,7 +44,7 @@ The TensorRT-LLM GPT implementation can be found in [`tensorrt_llm/models/gpt/mo
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16
@@ -222,7 +222,7 @@ Input [Text 0]: "Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: " chef before moving to London in the early"
```

-The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+The [`summarize.py`](../../../summarize.py) script can run the built engines to summarize the articles from the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.
For each summary, the script can compute the
[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) scores and use the `ROUGE-1` score to validate the implementation.
By passing `--test_trt_llm` flag, the script will evaluate TensorRT-LLM engines. You may also pass `--test_hf` flag to evaluate the HF model.
2 changes: 1 addition & 1 deletion examples/models/core/internlm2/README.md
@@ -14,7 +14,7 @@ The TensorRT-LLM InternLM2 example code lies in [`examples/models/core/internlm2
In addition, there are two shared files in the parent folder [`examples`](../../../) for inference and evaluation:

* [`run.py`](../../../run.py) to run the inference on an input text;
-* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.
+* [`summarize.py`](../../../summarize.py) to summarize the articles in the [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset.

## Support Matrix
* FP16 / BF16
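
Every hunk in this diff applies the same substitution: the un-namespaced `cnn_dailymail` dataset URL becomes `abisee/cnn_dailymail`. As a rough illustration only (nothing in this PR indicates the change was scripted), such a rewrite could be sketched in Python as follows. The names `rewrite_links` and `update_readmes` are hypothetical helpers invented for this sketch and are not part of the repository.

```python
import re
from pathlib import Path

# Old, un-namespaced dataset URL. The trailing \b keeps a longer identifier
# such as "cnn_dailymail_extra" (a made-up name) from matching.
OLD_URL = re.compile(r"https://huggingface\.co/datasets/cnn_dailymail\b")
NEW_URL = "https://huggingface.co/datasets/abisee/cnn_dailymail"


def rewrite_links(text: str) -> str:
    """Return text with every old dataset URL replaced by the new one."""
    return OLD_URL.sub(NEW_URL, text)


def update_readmes(root: str) -> list:
    """Rewrite README.md files under root in place; return the files changed."""
    changed = []
    for path in sorted(Path(root).rglob("README.md")):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_links(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Because the new URL does not contain the old one as a substring, running the rewrite a second time leaves already-updated files untouched.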