SiriBatchu/Data_Science_Clustering

Data_Science_Clustering_Algorithms

This repository contains implementations of various clustering algorithms using Python and popular data science libraries. Each algorithm is implemented in a separate Google Colab notebook with detailed explanations and visualizations.

Algorithms Implemented

YouTube : https://youtu.be/un_lzeGhrNI

  1. K-Means Clustering

This notebook implements K-means clustering from scratch. K-means is an unsupervised learning algorithm that groups similar data points into clusters. The implementation includes steps for initializing centroids, assigning points to clusters, and updating centroids iteratively.

Colab : https://colab.research.google.com/drive/1RBlYPXWNbJJZqYqFvzMwXXHcp-N1BDKZ
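The three steps named above (initializing centroids, assigning points, updating centroids) can be sketched in plain Python. This is a minimal illustration and not the notebook's code; the function name `kmeans`, its parameters, and the sample points are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain-Python K-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: initialize centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two small groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the random initialization, which is why K-means is often run several times with different seeds.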

  2. Hierarchical Clustering

This notebook demonstrates hierarchical clustering, which creates a tree-like structure of clusters. It includes both agglomerative (bottom-up) and divisive (top-down) approaches, along with visualizations like dendrograms.

Colab : https://colab.research.google.com/drive/1KdMYvP02xaqIJOMNhjKbpbCEsI27EB9T
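As a rough illustration of the agglomerative (bottom-up) approach only, the sketch below repeatedly merges the two closest clusters. Single linkage is an assumption chosen for brevity; the notebook may use a different linkage, and the function name `agglomerative` and the sample points are hypothetical.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage on a list of tuples (illustrative sketch)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Start with every point in its own cluster; clusters hold point indices.
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance): the information a dendrogram draws
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters, merges

# Hypothetical toy data: three points near the origin and two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
```

The recorded `merges` list holds the merge order and distances, which is what a dendrogram visualizes.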

  3. Gaussian Mixture Models Clustering

This notebook explores Gaussian Mixture Models (GMMs) for clustering. GMMs are probabilistic models that assume data points are generated from a mixture of Gaussian distributions. The notebook covers an implementation of the Expectation-Maximization (EM) algorithm and model selection.

Colab : https://colab.research.google.com/drive/1caTvq6227ZW4efH0t6GS7Y7jUKY1b8Mw
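To make the EM idea concrete, here is a minimal sketch for a one-dimensional mixture. It is not taken from the notebook: the function name `fit_gmm_1d`, the starting means, the variance floor, and the sample data are all assumptions for illustration, and model selection is not covered.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, iterations=50):
    """EM for a 1-D Gaussian mixture; `means` supplies the starting guesses."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances

# Hypothetical toy data drawn around two centers.
data = [0.8, 1.0, 1.1, 1.3, 4.7, 5.0, 5.2, 5.4]
weights, means, variances = fit_gmm_1d(data, means=[0.0, 6.0])
print(weights, means, variances)
```

Unlike K-means, each point receives a soft responsibility for every component rather than a single hard label.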

  4. DBSCAN Clustering using PyCaret

This notebook uses the PyCaret library to implement DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN is particularly good at finding clusters of arbitrary shape and identifying noise points.

Colab : https://colab.research.google.com/drive/14-Iz6EHAtlX9rjNGYL88AgPLZKABsu26
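The notebook relies on PyCaret; the sketch below deliberately does not, and instead shows from scratch how DBSCAN separates dense clusters from noise. The function name `dbscan`, the parameter names `eps` and `min_samples`, and the sample points are assumptions made for this illustration.

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise (illustrative sketch)."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point, so keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)
```

No cluster count is supplied; the number of clusters follows from `eps` and `min_samples`.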

  5. Anomaly Detection

This notebook focuses on anomaly detection in time series data. It applies methods such as Isolation Forest or Elliptic Envelope to identify outliers in the catfish sales dataset.

Colab : https://colab.research.google.com/drive/1jpoWUEj5JoYQsFARVgdjtImGd5wTgxko
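For a sense of what flagging an outlier in a series means, the sketch below uses a much simpler stand-in technique: a modified z-score based on the median absolute deviation (MAD). It is neither Isolation Forest nor Elliptic Envelope, and the values are made up rather than taken from the catfish sales dataset; the function name `mad_outliers` and the threshold of 3.5 are assumptions.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold` (illustrative sketch)."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - median) / mad) > threshold]

# Hypothetical monthly figures with one obvious spike.
values = [100, 102, 98, 101, 99, 180, 103, 97]
outliers = mad_outliers(values)
print(outliers)
```

A real time series would usually need trend and seasonality removed before a rule like this is applied to the residuals.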

  6. Time Series Clustering

This notebook deals with clustering time series data. It covers techniques specific to time series, such as Dynamic Time Warping (DTW) for measuring similarity between temporal sequences.

Colab : https://colab.research.google.com/drive/1NugAoCXTmyOLgI3szlsCHSWXNVotd7A_
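Dynamic Time Warping can be written as a short dynamic program. The sketch below is a minimal version using absolute difference as the local cost, with no warping-window constraint; the function name `dtw_distance` and the sample sequences are assumptions, and the notebook may rely on a library implementation instead.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences (illustrative sketch)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, or advance in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical sequences: `stretched` repeats one value of `base`.
base = [1, 2, 3]
stretched = [1, 2, 2, 3]
print(dtw_distance(base, stretched))
```

Because one element may align with several elements of the other sequence, series of different lengths can be compared directly, and the resulting pairwise distances can feed a clustering algorithm.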

  7. Document Clustering

This notebook demonstrates the clustering of text documents. It includes steps for text preprocessing, feature extraction (e.g., TF-IDF), and applying clustering algorithms to group similar documents.

Colab : https://colab.research.google.com/drive/1fbbz3YbNNat8hypH3D3wYqeIxQApE31e
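The feature-extraction step can be illustrated with a small TF-IDF sketch in plain Python. Whitespace tokenization, the smoothed IDF formula, and the sample sentences are assumptions made for this example; the notebook may use a library vectorizer with different defaults.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn non-empty documents into TF-IDF vectors over a shared vocabulary (sketch)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive preprocessing
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[t] / len(tokens) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical documents: two about animals, one about finance.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell as markets closed",
]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Documents that share terms score higher on cosine similarity, so any clustering algorithm run on these vectors tends to group them together.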

  8. Image Clustering

This notebook shows how to cluster images. It covers image preprocessing, feature extraction (possibly using pre-trained neural networks), and applying clustering algorithms to group similar images.

Colab : https://colab.research.google.com/drive/1wgc_nVENcX0XOhlwnJ8A6zBcrR7egfDx

  9. Audio Extraction and Clustering

This notebook focuses on clustering audio data. It includes steps for audio feature extraction (e.g., MFCCs, spectrograms) and then applying clustering algorithms to group similar audio samples.

Colab : https://colab.research.google.com/drive/1UtHSqFkT0Hu9crdLmercd6o0yzOSIqwJ
