A comprehensive reference guide mapping the entire AI, Machine Learning, Data Science, and Data Engineering ecosystem—with categorized tools, libraries, workflows, and Python usage.
- Supervised Learning
  - Regression, Classification
  - Libraries: scikit-learn, XGBoost, LightGBM
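
Supervised learning means the model learns from labeled pairs (x, y). As a minimal illustration, the sketch below fits a one-feature linear regression in plain Python using the closed-form ordinary least squares formulas. `fit_line` and `predict` are hypothetical helpers, not scikit-learn's API. `sklearn.linear_model.LinearRegression` solves the same least-squares problem for many features.

```python
from statistics import fmean

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) minimizing squared error for y ≈ slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean, y_mean = fmean(xs), fmean(ys)
    # OLS slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def predict(model: tuple[float, float], x: float) -> float:
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Toy labeled training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```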
- Unsupervised Learning
  - Clustering, PCA
  - Libraries: scikit-learn, NumPy, SciPy
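
Unsupervised learning has no labels, so the algorithm has to find structure in the data on its own. Below is a minimal plain-Python sketch of k-means clustering (Lloyd's algorithm). It assumes 2-D points and a random initialization drawn from the data, and the `kmeans` function is hypothetical. scikit-learn's `KMeans` adds refinements such as k-means++ initialization and multiple restarts (`n_init`).

```python
import math
import random

Point = tuple[float, float]

def kmeans(points: list[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> tuple[list[Point], list[int]]:
    """Cluster points into k groups with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```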
- Reinforcement Learning (RL)
  - Q-Learning, DQN
  - Libraries: OpenAI Gym (now maintained as Gymnasium), Stable-Baselines3
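
Tabular Q-learning is the simplest form of the idea behind DQN. The agent keeps a table of action values and nudges each entry toward "reward + discounted best value of the next state". The sketch below is illustrative only. A hypothetical five-cell corridor stands in for a Gym/Gymnasium environment, and the code does not use the Gym API. DQN, which Stable-Baselines3 implements, replaces the table with a neural network.

```python
import random

N_STATES = 5        # toy corridor: cells 0..4, reward only on reaching cell 4
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Hypothetical environment dynamics: returns (next_state, reward, done)."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state, one column per action
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random (also on ties), otherwise take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning update toward reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)  # greedy action per non-terminal cell
```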
- Deep Learning (DL)
  - CNN, RNN, Transformers
  - Libraries: TensorFlow, PyTorch, Keras
- Python used for: Model building, training, deployment
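
Deep learning frameworks stack many simple units and train them with gradient descent. The sketch below shows that building block in plain Python: one sigmoid neuron trained on the OR function with binary cross-entropy loss. `train_neuron` is a hypothetical helper. PyTorch, TensorFlow, and Keras compute these gradients automatically and run them on whole layers at once.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs: int = 2000, lr: float = 0.5, seed: int = 0):
    """Train one sigmoid neuron with per-sample gradient descent on cross-entropy loss."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0][0]))]
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            # Forward pass: weighted sum of inputs, then the sigmoid activation.
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz simplifies to (p - y).
            grad = p - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(OR_DATA)
preds = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x, _ in OR_DATA]
print(preds)  # [0, 1, 1, 1]
```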
- Natural Language Processing (NLP)
  - Tasks: Sentiment Analysis, Named Entity Recognition (NER), Translation
  - Libraries: NLTK, spaCy, Hugging Face Transformers, Gensim
  - Python used for: Tokenization, text preprocessing, model training
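
Tokenization and preprocessing turn raw text into countable units before any model sees it. The sketch below uses a deliberately crude regex tokenizer, a tiny hypothetical stopword list, and a bag-of-words count. NLTK's `word_tokenize` and spaCy handle punctuation, contractions, and other languages far more carefully.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "and", "of", "to"}  # tiny illustrative list, not NLTK's or spaCy's

def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of letters, digits, and apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text: str) -> Counter:
    """Count non-stopword tokens: a common feature representation for classic text models."""
    return Counter(t for t in tokenize(text) if t not in STOPWORDS)

doc = "The movie is great, and the soundtrack is great too!"
print(tokenize(doc))
print(bag_of_words(doc).most_common(2))  # [('great', 2), ('movie', 1)]
```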
- Computer Vision (CV)
  - Tasks: Image Classification, Object Detection, Segmentation
  - Libraries: OpenCV, Pillow (PIL), PyTorch, TensorFlow, Detectron2
  - Python used for: Image preprocessing, feature extraction, modeling
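
Much of image preprocessing and feature extraction comes down to sliding a small kernel over the pixel grid. The plain-Python sketch below implements "valid" (unpadded) 2-D cross-correlation and applies a Sobel kernel to detect a vertical edge. The image is a hypothetical nested list. OpenCV's `cv2.filter2D` applies the same correlation but also pads the borders. CNN layers in PyTorch and TensorFlow learn kernel values instead of hard-coding them.

```python
Image = list[list[float]]

def cross_correlate2d(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image without padding, summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Sobel-x kernel: responds to left-to-right intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Tiny 3x5 grayscale "image": dark on the left, bright on the right.
image = [[0, 0, 0, 10, 10] for _ in range(3)]
print(cross_correlate2d(image, SOBEL_X))  # [[0, 40, 40]]
```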
- Data Analytics
  - Types: Descriptive, Diagnostic, Predictive, Prescriptive
  - Libraries: Pandas, NumPy
  - Tools: Jupyter, SQL, Python
  - Python used for: Exploratory data analysis (EDA), statistics, modeling
- Data Visualization
  - Libraries: Matplotlib, Seaborn, Plotly, Altair
  - Tools: Tableau, Power BI
  - Python used for: Plots, charts, dashboards
- Machine Learning Modeling
  - Tools: scikit-learn, pandas, NumPy
  - Python used throughout the ML lifecycle
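
A typical modeling loop splits the data, fits an estimator, and evaluates it on held-out rows. The plain-Python sketch below mirrors that flow with hypothetical stand-ins. `split_rows` plays the role of scikit-learn's `train_test_split`, `MajorityClassifier` follows the estimator `fit`/`predict` convention (comparable to `DummyClassifier(strategy="most_frequent")`), and `accuracy` plays the role of `accuracy_score`. A majority-class baseline like this is a common first benchmark for any real model.

```python
import random
from collections import Counter

def split_rows(rows: list, test_size: float = 0.25, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the rows and hold out a fraction for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))
    return shuffled[:cut], shuffled[cut:]

class MajorityClassifier:
    """Baseline with scikit-learn-style fit/predict: always predicts the most frequent label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labeled data: one numeric feature per row, 1 in 4 rows labeled "spam".
data = [([i], "spam" if i % 4 == 0 else "ham") for i in range(40)]
train_rows, test_rows = split_rows(data)
X_train, y_train = zip(*train_rows)
X_test, y_test = zip(*test_rows)
model = MajorityClassifier().fit(X_train, y_train)
print(accuracy(y_test, model.predict(X_test)))
```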
- ETL / ELT (Data Integration)
  - Tools: Apache NiFi, Talend, Informatica, dbt
  - Python used with: Pandas, PySpark
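
The extract, transform, load steps can be sketched with the standard library alone. `csv` reads a raw extract, a transform function cleans and types the rows, and an in-memory `sqlite3` database stands in for the real target system. The CSV contents and table schema are hypothetical. With Pandas, the same flow would typically use `read_csv`, `dropna`, and `DataFrame.to_sql`.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: one row is missing its amount, country codes are inconsistent.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,,de
3,5.00,US
"""

def extract(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict[str, str]]) -> list[tuple[int, float, str]]:
    """Drop rows without an amount, cast types, and upper-case country codes."""
    return [(int(r["order_id"]), float(r["amount"]), r["country"].upper())
            for r in rows if r["amount"]]

def load(rows: list[tuple[int, float, str]], conn: sqlite3.Connection) -> None:
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
revenue = conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country").fetchall()
print(revenue)  # one row for 'US'; order 2 was dropped for its missing amount
```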
- Data Processing
  - Batch: Apache Spark, AWS Glue
  - Stream: Apache Kafka, Apache Flink
  - Python used for: Transformation logic, user-defined functions (UDFs)
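
Batch engines process bounded datasets. Stream processors aggregate unbounded event streams over time windows. The sketch below is a plain-Python tumbling-window count, where windows are fixed and non-overlapping. It assumes events arrive in timestamp order. Flink additionally handles out-of-order and late events with watermarks. The event format `(timestamp, key)` is hypothetical.

```python
from collections.abc import Iterable, Iterator

def tumbling_window_counts(events: Iterable[tuple[int, str]],
                           window_seconds: int) -> Iterator[tuple[int, dict[str, int]]]:
    """Count keys per fixed time window; assumes events are ordered by timestamp."""
    current_window, counts = None, {}
    for ts, key in events:
        window = ts - ts % window_seconds  # start time of the window this event falls in
        if current_window is not None and window != current_window:
            yield current_window, counts  # window closed: emit its aggregate
            counts = {}
        current_window = window
        counts[key] = counts.get(key, 0) + 1
    if current_window is not None:
        yield current_window, counts  # flush the last open window

events = [(0, "click"), (3, "view"), (7, "click"), (12, "click"), (14, "view")]
for window_start, window_counts in tumbling_window_counts(events, window_seconds=10):
    print(window_start, window_counts)
```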
- Workflow Orchestration
  - Tools: Apache Airflow, Prefect, Luigi
  - Python used to define DAGs (directed acyclic graphs of task dependencies) and their schedules
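
At its core, an orchestrator runs tasks in an order that respects their dependencies. The sketch below computes such an order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) for a hypothetical pipeline. "Running" a task here only records its name. In Airflow, the same dependencies would be declared between operators (for example with `>>`) inside a DAG definition, and the scheduler adds retries, timing, and parallelism.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks that must finish first.
PIPELINE = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "train_model": {"transform"},
    "build_report": {"transform"},
    "notify": {"train_model", "build_report"},
}

def run_pipeline(dag: dict[str, set[str]]) -> list[str]:
    """Execute tasks in a dependency-respecting order (raises CycleError if the graph has a cycle)."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        executed.append(task)  # a real orchestrator would invoke the task's callable here
    return executed

order = run_pipeline(PIPELINE)
print(order)
```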
- Data Warehousing
  - Tools: BigQuery, Redshift, Snowflake
  - Python used to connect via SQLAlchemy, pandas-gbq, etc.
- Business Intelligence (BI)
  - Tools: Power BI, Tableau, Looker
  - Python integration: Script execution, data export
  - Use cases: KPIs, reports, dashboards
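
A common Python-to-BI handoff is to compute KPIs in a script and export a tidy table that the BI tool imports. The sketch below aggregates hypothetical order rows into per-region KPIs and serializes them as CSV text. The KPI definitions and column names are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical order rows: (region, revenue, returned?)
ORDERS = [("north", 1200.0, False), ("north", 300.0, True), ("south", 800.0, False)]

def kpis_by_region(rows) -> list[dict]:
    """Aggregate order count, revenue, and return rate per region."""
    totals: dict[str, dict[str, float]] = {}
    for region, revenue, returned in rows:
        t = totals.setdefault(region, {"orders": 0, "revenue": 0.0, "returns": 0})
        t["orders"] += 1
        t["revenue"] += revenue
        t["returns"] += int(returned)
    return [
        {"region": region, "orders": t["orders"], "revenue": t["revenue"],
         "return_rate": t["returns"] / t["orders"]}
        for region, t in sorted(totals.items())
    ]

def to_csv(records: list[dict]) -> str:
    """Serialize KPI records as CSV text that a BI tool can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(to_csv(kpis_by_region(ORDERS)))
```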
- Model Deployment
  - Tools: FastAPI, Flask, Docker, Kubernetes
  - Python used to serve models as REST APIs
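
Serving a model as a REST API means parsing a JSON request, calling the model, and returning a JSON response. Flask applications follow the WSGI interface, and FastAPI is built on its async counterpart, ASGI. The sketch below is a framework-free WSGI app that shows this flow using only the standard library. The `/predict` route, the `{"features": [...]}` payload, and the `predict` function are hypothetical. To try it locally, you could pass `app` to `wsgiref.simple_server.make_server`. In production, the app would typically run behind a proper server inside a Docker container.

```python
import io
import json

def predict(features: list[float]) -> float:
    """Hypothetical stand-in for a trained model's prediction function."""
    return 2.0 * sum(features) + 1.0

def app(environ, start_response):
    """Minimal WSGI app exposing POST /predict with a JSON body like {"features": [1, 2]}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        result = {"prediction": predict([float(v) for v in payload["features"]])}
        status = "200 OK"
    except (ValueError, KeyError, TypeError):  # bad JSON, missing key, or wrong types
        result, status = {"error": "expected JSON body with a 'features' list"}, "400 Bad Request"
    body = json.dumps(result).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]

# Local smoke test without a server: call the app directly with a fake request.
request_body = b'{"features": [1, 2]}'
environ = {"REQUEST_METHOD": "POST", "PATH_INFO": "/predict",
           "CONTENT_LENGTH": str(len(request_body)), "wsgi.input": io.BytesIO(request_body)}
print(b"".join(app(environ, lambda status, headers: print(status))))
```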
- MLOps
  - Tools: MLflow, DVC, Kubeflow
  - Python used for automation and pipeline creation
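
Reproducibility in MLOps hinges on tying every training run to the exact data and parameters it used. DVC identifies tracked files by content hashes (MD5), and MLflow records each run's parameters and metrics. The sketch below imitates that idea with the standard library only. `make_run_record` and its field names are hypothetical, and the parameter and metric values are illustrative placeholders, not results.

```python
import hashlib
import json
import time

def data_fingerprint(data: bytes) -> str:
    """MD5 content hash: any change to the bytes yields a different version ID."""
    return hashlib.md5(data).hexdigest()

def make_run_record(params: dict, metrics: dict, data: bytes) -> dict:
    """Hypothetical run record linking hyperparameters and metrics to the exact data version."""
    return {
        "started_at": time.time(),
        "data_md5": data_fingerprint(data),
        "params": params,
        "metrics": metrics,
    }

dataset = b"id,label\n1,0\n2,1\n"  # toy dataset contents
run = make_run_record({"lr": 0.01, "epochs": 10}, {"accuracy": 0.93}, dataset)  # placeholder values
print(json.dumps(run, indent=2))
```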
- Model Monitoring
  - Tools: Evidently AI, WhyLabs
  - Python used to monitor models (e.g., for data drift) and trigger retraining
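
Monitoring tools compare live feature distributions against the training data and raise an alert when they drift. One widely used drift score is the Population Stability Index, PSI = Σ (c_i − r_i) · ln(c_i / r_i), where r_i and c_i are the reference and current proportions in histogram bin i. The sketch below computes it in plain Python, assuming equal-width bins taken from the reference sample and a small floor for empty bins. Thresholds such as 0.1 ("moderate") and 0.25 ("significant") are common rules of thumb, not universal standards.

```python
import math

def psi(reference: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index of `current` relative to `reference`, using equal-width bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at a tiny proportion so the logarithm stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

reference = [float(i % 100) for i in range(1000)]  # toy training-time feature values
shifted = [v + 50.0 for v in reference]            # toy production values after a shift
print(psi(reference, reference))       # 0.0
print(psi(reference, shifted) > 0.25)  # True
```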
- Data Quality
  - Tools: Great Expectations, Deequ (PyDeequ for Python)
  - Python used for writing expectations (declarative data validation rules)
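
Expectations are named, reusable checks that report which records violate a rule. Great Expectations ships checks such as `expect_column_values_to_not_be_null`. The plain-Python sketch below captures the same idea with hypothetical check builders (`not_null`, `between`) and a small `validate` runner. It does not use or reproduce the Great Expectations API.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], list[int]]

def not_null(column: str) -> Check:
    """Build a check that returns indexes of rows where `column` is missing or None."""
    return lambda rows: [i for i, r in enumerate(rows) if r.get(column) is None]

def between(column: str, low: float, high: float) -> Check:
    """Build a check that returns indexes of rows whose non-null value falls outside [low, high]."""
    return lambda rows: [i for i, r in enumerate(rows)
                         if r.get(column) is not None and not (low <= r[column] <= high)]

def validate(rows: list[Row], checks: dict[str, Check]) -> dict[str, list[int]]:
    """Run every named check and keep only the ones that found failing rows."""
    results = {name: check(rows) for name, check in checks.items()}
    return {name: bad for name, bad in results.items() if bad}

rows = [{"age": 34}, {"age": None}, {"age": 212}]  # hypothetical records
report = validate(rows, {"age_not_null": not_null("age"), "age_in_range": between("age", 0, 120)})
print(report)  # {'age_not_null': [1], 'age_in_range': [2]}
```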
- Data Catalog and Lineage
  - Tools: Amundsen, Apache Atlas
- Synthetic Data Generation
  - Libraries: Faker, custom Python scripts
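
A "custom Python script" for synthetic data can be as small as a seeded random generator. Fixing the seed makes test fixtures reproducible across runs. The sketch below uses hypothetical name lists and the reserved `example.com`/`example.org` domains. Faker provides much richer realistic values (names, addresses, and more) and also supports seeding.

```python
import random
import string

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]  # tiny hypothetical vocabulary
DOMAINS = ["example.com", "example.org"]             # reserved example domains

def fake_users(n: int, seed: int = 42) -> list[dict]:
    """Generate n reproducible fake user records from a fixed random seed."""
    rng = random.Random(seed)
    users = []
    for i in range(n):
        name = rng.choice(FIRST_NAMES)
        suffix = "".join(rng.choices(string.digits, k=3))
        users.append({
            "id": i + 1,
            "name": name,
            "email": f"{name.lower()}{suffix}@{rng.choice(DOMAINS)}",
            "age": rng.randint(18, 90),
        })
    return users

for user in fake_users(3):
    print(user)
```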

| Tool/Library | Used In | Purpose |
|---|---|---|
| Python | Everywhere | General scripting, analysis, modeling, deployment |
| Pandas | Data Science, ETL, Analytics | Data wrangling, tabular data, EDA |
| NumPy | ML, DL, Scientific Computing | Fast numerical array computations |
| SciPy | Statistics, ML, Signal/Image Processing | Advanced scientific computation |
| scikit-learn | ML (classification, regression, clustering) | Traditional ML modeling |
| TensorFlow | Deep Learning, CV, NLP | Neural networks, large-scale DL |
| PyTorch | Deep Learning, Research | Flexible model building, academic research, vision/NLP tasks |
| Matplotlib | Visualization | Static plots and charts |
| Seaborn | Visualization | Statistical plots built on Matplotlib |
| Plotly | Interactive Visualization | Dashboards, web-based visual insights |
| Airflow | Data Engineering, MLOps | Workflow orchestration (Python DAGs) |
| FastAPI | MLOps, API Services | Fast, async REST APIs for ML models |
| MLflow/DVC | MLOps, CI/CD | Experiment and model tracking (MLflow), data and model versioning (DVC) |
| NLTK/spaCy | NLP | Tokenization, text preprocessing |
| OpenCV | Computer Vision | Image preprocessing, detection |
| Hugging Face | NLP, Transformers | Pretrained models, pipelines |