
A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

Samuelpillai/machine-learning-classification-regression-nlp

Machine Learning: Classification, Regression & NLP

A curated repository of end-to-end machine learning projects demonstrating the application of classification, regression, and natural language processing (NLP) techniques using real-world datasets.

Each notebook covers model training and evaluation as well as data preprocessing, visualization, and pipeline design, so the examples carry over to practical ML work.


Key Features

  • Supervised ML (Classification & Regression)
  • Text preprocessing & sentiment analysis (NLP)
  • Feature engineering and selection
  • Model evaluation with precision, recall, F1, ROC-AUC, and MSE
  • Clean and annotated Jupyter Notebooks
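The evaluation metrics named above can be illustrated with a minimal sketch that spells out their definitions in plain Python. This is not code from the notebooks: the helper names `precision_recall_f1` and `mse` and all sample values are made up for illustration. In practice, `sklearn.metrics` provides `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `mean_squared_error` for the same purpose. ROC-AUC is left out here because it needs predicted scores rather than hard labels.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up classification labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75

# Made-up regression targets: squared errors are 1, 0, 4, 1.
print(mse([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0]))  # 1.5
```

Precision and recall coincide in this toy example only because the counts of false positives and false negatives happen to be equal.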

Technologies Used

| Tool/Library     | Purpose                                 |
|------------------|-----------------------------------------|
| scikit-learn     | ML models, pipelines, metrics           |
| pandas / numpy   | Data preprocessing and manipulation     |
| matplotlib       | Visualization                           |
| nltk / re        | Text preprocessing and tokenization     |
| Jupyter Notebook | Interactive experimentation environment |

Project Structure

machine-learning-classification-regression-nlp
├── classfication_regression_nlp-Report-Samuel-Pillai.ipynb   # Final report notebook with analysis
├── classification_regression_nlp.ipynb                       # Cleaned version for public viewing
└── README.md

How to Run

Recommended environment: Python 3.9+ with Jupyter installed.

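Run the following in a bash shell:

    # (Optional) Create a virtual environment
    python -m venv venv
    source venv/bin/activate    # or venv\Scripts\activate on Windows

    # Install dependencies
    # If requirements.txt is missing from your checkout (it is not listed under
    # Project Structure), install the libraries from Technologies Used instead:
    #   pip install scikit-learn pandas numpy matplotlib nltk notebook
    pip install -r requirements.txt

    # Launch Jupyter Notebook
    jupyter notebook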

Demo Highlights

  • Predicting numeric targets using regression (e.g., housing prices)
  • Classifying sentiments or labels from text
  • Text cleaning, tokenization, stopword removal
  • Model comparisons: Logistic Regression, Decision Trees, Random Forests
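The text cleaning, tokenization, and stopword removal steps listed above can be sketched with the standard-library `re` module. This is an assumed illustration, not the notebooks' actual code: the function names, the regular expressions, and the tiny `STOPWORDS` set are hypothetical. A real run would more likely use NLTK's tokenizer and its English stopword corpus, which must be downloaded separately.

```python
import re

# Hypothetical mini stopword list used only for this sketch;
# NLTK's English stopword corpus is the fuller alternative.
STOPWORDS = {"a", "an", "the", "is", "was", "and", "or", "of", "to", "it", "this"}


def clean_text(text):
    """Lowercase the text, drop HTML tags, and replace non-letters with spaces."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # strip HTML-like tags
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    return text


def tokenize(text):
    """Split on whitespace; a simple stand-in for a proper tokenizer."""
    return text.split()


def preprocess(text):
    """Clean, tokenize, and remove stopwords, returning a list of tokens."""
    return [tok for tok in tokenize(clean_text(text)) if tok not in STOPWORDS]


tokens = preprocess("This movie was <b>GREAT</b>, and the acting is superb!!!")
print(tokens)  # ['movie', 'great', 'acting', 'superb']
```

The resulting token lists are what a vectorizer would consume before a classifier such as Logistic Regression is trained on them.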

License

This project is released under the MIT License.

Author

Samuel Pillai
Email: [email protected]
