🐛 Describe the bug
Some users have reported that they encounter launch failure when using colossalai run, but there is no error message to tell exactly what went wrong. A sample output is given below.
Error: failed to run torchrun --nproc_per_node=4 --nnodes=1 --node_rank=0 --rdzv_backend=c10d --rdzv_endpoint=127.0.0.1:29500 --rdzv_id=colossalai-default-job train.py --config config.py -s on 127.0.0.1
Environment
No response