[BUG]: Several bugs in examples/language/gpt/titans

### 🐛 Describe the bug

Several small bugs in the examples provided in the directory language/gpt/titans:

1. The default of the argument '--use_dummy_dataset' in function main of train_gpt.py should be set to False, with action 'store_false'. Otherwise, even if I set the environment variable DATA and run train_gpt.py without the option '--use_dummy_dataset', training still uses dummy data.
2. Please add the Python file for webtext. Without it, train_gpt.py complains that 'WebtextDataset cannot be imported from dataset.webtext'.
3. The throughput is 0 and the loss is nan from epoch 1 to epoch 8, then training stalls. The total number of epochs is 10. I am using the config 'gpt2_small_zero3_pp1d.py' with nothing modified, on 4 nodes with 8 A100 GPUs each. ColossalAI is launched with Slurm. PyTorch is 1.12.1 and CUDA is version 11.3. Two screenshots of the output are provided:
<img width="1054" alt="image" src="https://user-images.githubusercontent.com/7204483/213075947-66650695-1949-45ad-b8d4-1896c2a385f6.png">
<img width="1054" alt="image" src="https://user-images.githubusercontent.com/7204483/213075984-17c4533d-a40d-4bf1-8889-1eb979b9aa2d.png">
The full output logs are [here](https://drive.google.com/file/d/1dSAeIqXzZv-9FIMdFUVeNDz9vJapU1ip/view?usp=sharing)
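
To illustrate point 1, here is a minimal sketch of how `argparse` treats the `default` and `action` combinations. It assumes the option is currently declared with `default=True` and `action='store_true'`; that declaration is not quoted in this report. The helper `parse_flag` is hypothetical and exists only for this demonstration.

```python
import argparse


def parse_flag(default, action, argv):
    """Parse argv with a single --use_dummy_dataset option and return its value.

    Hypothetical helper for illustration; it is not part of train_gpt.py.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--use_dummy_dataset', default=default, action=action)
    return parser.parse_args(argv).use_dummy_dataset


# Each list holds [value without the flag, value with the flag].
cases = ([], ['--use_dummy_dataset'])

# Assumed current declaration: the value is True either way.
always_true = [parse_flag(True, 'store_true', argv) for argv in cases]

# default=False with store_true: dummy data is used only when the flag is passed.
opt_in = [parse_flag(False, 'store_true', argv) for argv in cases]

# default=False with store_false: the value is False either way.
always_false = [parse_flag(False, 'store_false', argv) for argv in cases]

print(always_true, opt_in, always_false)
```

Note that `default=False` combined with `action='store_false'` leaves the value at False whether or not the flag is given, so `action='store_true'` with `default=False` may be the combination that matches the behaviour described in point 1.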


### Environment

PyTorch is 1.12.1 and CUDA is version 11.3. Colossal-AI is built from source without pre-compiled CUDA kernels.

[BUG]: Several bugs in examples/language/gpt/titans #2493
