
This is our project for the Natural Language Processing course (CS221). It mainly focuses on comparing models and preprocessing methods.


chisphung/CS221-GenresPrediction-from-Overview


Trường Đại học Công nghệ Thông tin | University of Information Technology

Multi-label Movie Genre Classification from Original Movie Overviews

Members

This project is part of the Natural Language Processing course at the University of Information Technology.

| No. | Student ID | Full name | Email |
| --- | --- | --- | --- |
| 1 | 23520179 | Phùng Minh Chí | [email protected] |
| 2 | 23520183 | Nguyễn Hữu Minh Chiến | [email protected] |
| 3 | 23521467 | Lê Ngọc Phương Thảo | [email protected] |

Course Information

  • Course: Natural Language Processing
  • Course code: CS221
  • Class code: CS221.P22
  • Semester: 2, academic year 2024-2025
  • Instructor: Dr. Nguyễn Trọng Chỉnh

Instructions

  1. Clone the repository:
    git clone https://github.com/chisphung/CS221-GenresPrediction-from-Overview
  2. Install dependencies:
    pip install -r requirements.txt

Data preprocessing:

To preprocess the dataset, run the following command:

python tools/preprocess.py 
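The README does not describe what tools/preprocess.py actually does. As a rough illustration only, a common cleaning step for overview text lowercases it, keeps only letters, and drops stopwords. The function `clean_overview` and the tiny `STOPWORDS` set below are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "his", "her"}


def clean_overview(text: str) -> str:
    """Lowercase, keep only ASCII letters, and drop stopwords (illustrative)."""
    text = text.lower()
    # Turn every character that is not a-z or whitespace into a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # str.split() with no argument also collapses runs of whitespace.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)
```

For example, `clean_overview("A young wizard discovers his destiny in London!")` returns `"young wizard discovers destiny london"`. Because the regex keeps only a-z, accented letters are dropped too, which is acceptable for English overviews but would need adjusting for other languages.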

You can also download the preprocessed dataset with the following command:

python tools/download.py

Model training:

To train the BERT models, run the following command:

python -m tools.train <pretrained_model_name> <dataset_path>

Replace <pretrained_model_name> with the name of the pretrained model you want to use (e.g., bert-base-uncased) and <dataset_path> with the path to your dataset.
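Since a movie can have several genres at once, multi-label training typically uses multi-hot target vectors (one 0/1 slot per genre) with an independent sigmoid per genre and binary cross-entropy loss, rather than a softmax. The plain-Python sketch below shows only that label encoding and loss arithmetic; `multi_hot`, `bce_with_logits`, and the example genre list are illustrative assumptions, not the repository's training code.

```python
import math


def multi_hot(genres, genre_vocab):
    """Encode a list of genre names as a 0/1 vector aligned with genre_vocab."""
    index = {g: i for i, g in enumerate(genre_vocab)}
    vec = [0.0] * len(genre_vocab)
    for g in genres:
        vec[index[g]] = 1.0  # raises KeyError for a genre outside the vocabulary
    return vec


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over genres, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

With `vocab = ["Action", "Comedy", "Drama", "Horror"]`, `multi_hot(["Comedy", "Drama"], vocab)` gives `[0.0, 1.0, 1.0, 0.0]`. In PyTorch, `torch.nn.BCEWithLogitsLoss` computes the same mean loss on tensors, and Hugging Face's `AutoModelForSequenceClassification` uses it when created with `problem_type="multi_label_classification"`.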

Evaluation:

To evaluate the model, run the following command:

python -m src.evaluate

Before running it, update the target list path and the weights path to match your setup.
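The README does not list the metrics that src.evaluate reports. Micro- and macro-averaged F1 are standard choices for multi-label genre prediction: micro-F1 pools counts over all genres, so frequent genres dominate, while macro-F1 averages per-genre scores, so rare genres weigh as much as common ones. A dependency-free sketch, assuming multi-hot 0/1 rows for both ground truth and predictions (`f1_scores` is a hypothetical name):

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for equally sized multi-hot 0/1 matrices."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels  # true positives per genre
    fp = [0] * n_labels  # false positives per genre
    fn = [0] * n_labels  # false negatives per genre
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp_j, fp_j, fn_j):
        # F1 = 2TP / (2TP + FP + FN); 0.0 when a genre is neither present nor predicted.
        denom = 2 * tp_j + fp_j + fn_j
        return 2 * tp_j / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro
```

For instance, with genres [Action, Comedy], three movies that are all Action (one also Comedy), and a model that predicts only Action every time, micro-F1 is 6/7 (about 0.857) but macro-F1 is 0.5, because the missed genre counts fully in the macro average. scikit-learn's `sklearn.metrics.f1_score` with `average="micro"` or `average="macro"` computes the same quantities on multi-label indicator arrays, though its handling of empty genres may differ.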

Pretrained model:

To save you time, we currently provide three pretrained models:

  • bert-base-uncased trained on the preprocessed + undersampled dataset
  • distilled-bert-base-uncased trained on the preprocessed dataset
  • bert-base-cased trained on the raw + undersampled dataset

You can download them from the following links:

After downloading, place them in the weights folder.
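Two of the released models were trained on an undersampled dataset, but the README does not say how the undersampling was done. One simple scheme for multi-label data, shown here only as a hypothetical sketch, walks through the samples in random order and keeps a sample only while every one of its genres is still below a per-genre cap. `undersample` and `max_per_genre` are illustrative names.

```python
import random
from collections import Counter


def undersample(samples, max_per_genre, seed=0):
    """Hypothetical multi-label undersampling.

    samples: list of (overview, genres) pairs, where genres is a list of names.
    Returns the kept subset, in shuffled order.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    counts = Counter()
    kept = []
    for overview, genres in shuffled:
        # Keep the sample only if none of its genres has reached the cap yet.
        if all(counts[g] < max_per_genre for g in genres):
            kept.append((overview, genres))
            counts.update(genres)
    return kept
```

This rule guarantees that no genre exceeds the cap, at the cost of also discarding some rare-genre samples that co-occur with a frequent genre such as Drama. A looser rule (keep a sample if any of its genres is under the cap) favors rare genres but lets frequent ones overshoot the cap.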

Prediction:

To make a single prediction using the trained model, run the following command:

python -m src.main
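The README does not say how src.main turns model outputs into genre names. In a multi-label setup, a typical approach is to apply a sigmoid to each genre's logit and keep every genre whose probability reaches a threshold. The 0.5 threshold, the top-1 fallback, and the names `logits_to_genres` and `_sigmoid` below are assumptions for illustration.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_genres(logits, genre_vocab, threshold=0.5):
    """Return the genres whose sigmoid probability is >= threshold.

    If no genre clears the threshold, fall back to the single most likely one
    so that every overview receives at least one genre.
    """
    probs = [_sigmoid(z) for z in logits]
    chosen = [g for g, p in zip(genre_vocab, probs) if p >= threshold]
    if not chosen:
        best = max(range(len(probs)), key=lambda i: probs[i])
        chosen = [genre_vocab[best]]
    return chosen
```

For example, with `vocab = ["Action", "Comedy", "Drama", "Horror"]`, the logits `[2.0, -1.0, 0.5, -3.0]` map to `["Action", "Drama"]`. The fallback is a design choice: without it, an overview whose genre probabilities are all low would get an empty prediction.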

Deployment:

To deploy the model with Streamlit, run the following command:

streamlit run src/app.py
