This project is part of the Natural Language Processing course at the University of Information Technology.
No | Student ID | Full name | Email |
---|---|---|---|
1 | 23520179 | Phùng Minh Chí | [email protected] |
2 | 23520183 | Nguyễn Hữu Minh Chiến | [email protected] |
3 | 23521467 | Lê Ngọc Phương Thảo | [email protected] |
- Course: Natural Language Processing
- Course code: CS221
- Class code: CS221.P22
- Semester: 2 (2024 - 2025)
- Instructor: Dr. Nguyễn Trọng Chỉnh
- Clone the repository:
git clone https://github.com/chisphung/CS221-GenresPrediction-from-Overview
- Install dependencies:
pip install -r requirements.txt
To preprocess the dataset, run the following command:
python tools/preprocess.py
You can also download the preprocessed dataset with the following command:
python tools/download.py
To train the BERT models, run the following command:
python -m tools.train <pretrained_model_name> <dataset_path>
Replace <pretrained_model_name> with the name of the pretrained model you want to use (e.g., bert-base-uncased) and <dataset_path> with the path to your dataset.
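The training command takes two positional arguments. As a rough illustration of how a script can read them, here is a minimal sketch using `argparse`. It is not the actual contents of tools/train.py, which may parse its arguments differently; `build_parser` and the path `data/train.csv` are hypothetical names used only for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the two positional arguments of the command above.
    parser = argparse.ArgumentParser(prog="tools.train")
    parser.add_argument("pretrained_model_name", help="e.g. bert-base-uncased")
    parser.add_argument("dataset_path", help="path to the training dataset")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
# "data/train.csv" is a made-up path, not a file shipped with this repository.
args = build_parser().parse_args(["bert-base-uncased", "data/train.csv"])
print(args.pretrained_model_name, args.dataset_path)
```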
To evaluate the model, run the following command:
python -m src.evaluate
Modify the target list path and the weights path to match your setup.
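This README does not state which metrics the evaluation script reports. Assuming genre prediction is treated as a multi-label task (an assumption, since a movie can belong to several genres), one commonly used metric is micro-averaged F1. The sketch below computes it in plain Python on made-up multi-hot labels; `micro_f1` is a hypothetical helper, not a function from this repository.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over rows of multi-hot labels (lists of 0/1)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # genre correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # genre predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # genre present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels for two movies and three genres.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
print(round(score, 4))
```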
To save you time, we currently provide three pretrained models:
- bert-base-uncased, trained on the preprocessed + undersampled dataset
- distilled-bert-base-uncased, trained on the preprocessed dataset
- bert-base-cased, trained on the raw + undersampled dataset
You can download them from the following links:
After downloading, you can place them in the weights folder.
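Two of the models listed above were trained on an undersampled dataset. How the undersampling was done is not described here, so the following is only a sketch of one simple variant: random undersampling that keeps as many rows per label as the rarest label has. It assumes one label per row for simplicity; `undersample` and the `overview`/`genre` keys are hypothetical names.

```python
import random
from collections import defaultdict


def undersample(rows, label_key, seed=0):
    """Randomly drop rows so every label keeps as many rows as the rarest one."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Made-up, imbalanced toy data: 5 Drama rows and 2 Horror rows.
rows = (
    [{"overview": f"drama {i}", "genre": "Drama"} for i in range(5)]
    + [{"overview": f"horror {i}", "genre": "Horror"} for i in range(2)]
)
balanced = undersample(rows, "genre")
print(len(balanced))
```

Random undersampling discards data from frequent classes, so it trades some information for a more balanced label distribution.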
To make a single prediction using the trained model, run the following command:
python -m src.main
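As a rough picture of what a prediction step can look like, the sketch below turns raw model outputs (logits) into genre names by applying a sigmoid and a threshold. It assumes a multi-label setup, which this README does not confirm, and it is not the code in src/main.py; `predict_genres`, the genre list, and the logits are all made up for illustration.

```python
import math


def predict_genres(logits, genres, threshold=0.5):
    """Return the genres whose sigmoid probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [g for g, p in zip(genres, probs) if p >= threshold]


genres = ["Action", "Comedy", "Drama"]  # hypothetical target list
logits = [2.0, -1.5, 0.3]  # made-up model outputs, one per genre
predicted = predict_genres(logits, genres)
print(predicted)
```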
To deploy the model using Streamlit, run the following command:
streamlit run src/app.py