A lightweight simulator for large-scale DNN workloads on multi-GPU systems. TrioSim supports various parallelism strategies including data parallelism, tensor parallelism, and pipeline parallelism.
TrioSim provides processed sample traces for immediate use. These traces are located in the ./sample_trace/ directory.
To start a test quickly: skip the trace collection steps (Section 1); go directly to Section 2 to begin simulation.
- Python: 3.10.12
- CUDA: 12.1
- torch: 2.1.0+cu121
- torchvision: 0.16.0+cu121
- torchaudio: 2.1.0+cu121
The codes use the ILSVRC2012_img_val dataset. For a quick start, a subset of 256 images is included under ./tracer.
To collect traces from PyTorch models, we use PyTorch Profiler to gather layer or operator time information, and the Execution Graph Observer tool to collect detailed input, output, and other tensor or data information. The batch size is set via command-line arguments. You can also customize the number of iterations (num_iters) and the model to trace (listmodel) directly in the code.
Here is a code example for collecting trace when batch size is 16:
python tracer/datacollect.py 16This will generate two types of files:
profiler_xx.json: Contains timing information for each operatorgraph_xx.json: Contains detailed tensor information.
For a quick start, generated traces are available in:
tracer/data/graph/graph_xx.jsontracer/data/profiler/profiler_xx.json
The TARGET_OP_PREFIXES variable allows users to define which layers are included in tensor parallelism. By default, it targets 'convolution', 'linear', and 'embedding' layers.
Run the following command to convert the collected traces into TrioSim format:
python tracer/dataprocess.pyThe processed traces: tensor.csv and trace.csv, will be available under:
./tracer/data/middledata/trace/XXmodel
Install Go by following the official installation guide: Go Installation Guide
The simulator can be configured using the following command-line flags:
| Flag | Type | Default | Description |
|---|---|---|---|
-trace-dir |
string | "../sample_trace/trace2-h100-bs128/vgg13/" | Directory containing trace files |
-batch-size |
int | 128 | Original trace batch size |
-batch-size-sim |
int | -1 | Simulation batch size (defaults to batch-size) |
-bandwidth |
float | 696 | GPU to remote memory bandwidth (GBps) |
-ptp-bandwidth |
float | 65 | GPU to GPU bandwidth (GBps) |
-GPUnumber |
int | 8 | Number of GPUs |
-micro-batch-size |
int | -1 | Micro batch size for pipeline parallelism |
-case |
int | 0 | Simulation mode: 0=training, 1=standard data parallel, 2=distributed data parallel, 3=tensor parallel, 4=pipeline parallel |
-capacity |
int | 40 | Memory capacity of each device (1 << capacity) |
-numCols |
int | -1 | Number of columns in optical network mesh |
-numRows |
int | 1 | Number of rows in optical network mesh |
-interconnects |
int | 0 | Interconnect type: 0=electrical, 1=optical |
- Navigate to the triosim directory:
cd triosim- Run the simulator with your desired configuration:
go run main.go \
-batch-size 128 \
--batch-size-sim 128 \
-trace-dir ../sample_trace/trace2-h100-bs128/vgg13 \
--GPUnumber 4 \
--case 1