Skip to content

A comprehensive reference guide mapping the entire AI, Machine Learning, Data Science, and Data Engineering ecosystem—with categorized tools, libraries, workflows, and Python usage.

Notifications You must be signed in to change notification settings

akashmi/ai-data-engineering-ecosystem-guide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 

Repository files navigation

ai-data-engineering-ecosystem-guide

A comprehensive reference guide mapping the entire AI, Machine Learning, Data Science, and Data Engineering ecosystem—with categorized tools, libraries, workflows, and Python usage.

AI_Data_Hierarchy_with_Tools

Full Hierarchy of AI, Data Science, Data Engineering (with Tools and Libraries)

1. Artificial Intelligence (AI)

1.1 Machine Learning (ML)

  • Supervised Learning
    • Regression, Classification
    • Libraries: scikit-learn, XGBoost, LightGBM
  • Unsupervised Learning
    • Clustering, PCA
    • Libraries: scikit-learn, NumPy, SciPy
  • Reinforcement Learning (RL)
    • Q-Learning, DQN
    • Libraries: OpenAI Gym, Stable-Baselines3
  • Deep Learning (DL)
    • CNN, RNN, Transformers
    • Libraries: TensorFlow, PyTorch, Keras
    • Python used for: Model building, training, deployment

1.2 Natural Language Processing (NLP)

  • Tasks: Sentiment Analysis, NER, Translation
  • Libraries: NLTK, spaCy, Hugging Face Transformers, Gensim
  • Python used for: Tokenization, text preprocessing, model training

1.3 Computer Vision (CV)

  • Tasks: Image Classification, Object Detection, Segmentation
  • Libraries: OpenCV, PIL, PyTorch, TensorFlow, Detectron2
  • Python used for: Image preprocessing, feature extraction, modeling

2. Data Science

2.1 Data Analytics

  • Descriptive, Diagnostic, Predictive, Prescriptive
  • Libraries: Pandas, NumPy
  • Tools: Jupyter, SQL, Python
  • Python used for: EDA, statistics, modeling

2.2 Data Visualization

  • Libraries: Matplotlib, Seaborn, Plotly, Altair
  • Tools: Tableau, Power BI
  • Python used for: Plots, charts, dashboards

2.3 Machine Learning (Overlap with AI)

  • Tools: scikit-learn, pandas, NumPy
  • Python used throughout ML lifecycle

3. Data Engineering

3.1 ETL / ELT

  • Tools: Apache NiFi, Talend, Informatica, dbt
  • Python used with: Pandas, PySpark

3.2 Data Pipelines

  • Batch: Apache Spark, AWS Glue
  • Stream: Apache Kafka, Apache Flink
  • Python used for: Transformation logic, UDFs

3.3 Workflow Orchestration

  • Tools: Apache Airflow, Prefect, Luigi
  • Python used to define DAGs and scheduling

3.4 Data Warehousing

  • Tools: BigQuery, Redshift, Snowflake
  • Python used to connect via sqlalchemy, pandas-gbq, etc.

4. Business Intelligence (BI)

  • Tools: Power BI, Tableau, Looker
  • Python integration: Script execution, data export
  • Use cases: KPIs, reports, dashboards

5. MLOps (Machine Learning Operations)

5.1 Model Deployment

  • Tools: FastAPI, Flask, Docker, Kubernetes
  • Python used to serve models as REST APIs

5.2 CI/CD for ML

  • Tools: MLflow, DVC, Kubeflow
  • Python used for automation and pipeline creation

5.3 Monitoring & Retraining

  • Tools: Evidently AI, WhyLabs
  • Python used to retrain and monitor models

6. Data Governance & Compliance

6.1 Data Quality

  • Tools: Great Expectations, Deequ
  • Python used for writing expectations

6.2 Metadata Management

  • Tools: Amundsen, Apache Atlas

6.3 Privacy & Compliance

  • Libraries: Faker, custom Python scripts

🔁 Python & Libraries Usage Summary

Tool/Library Used In Purpose
Python Everywhere General scripting, analysis, modeling, deployment
Pandas Data Science, ETL, Analytics Data wrangling, tabular data, EDA
NumPy ML, DL, Scientific Computing Fast numerical computations
SciPy Stats, ML, Signal/Image Processing Advanced scientific computation
scikit-learn ML (classification, regression, clustering) Traditional ML modeling
TensorFlow Deep Learning, CV, NLP Neural networks, large-scale DL
PyTorch Deep Learning, Research Flexibility, academic research, vision/NLP tasks
Matplotlib Visualization Static plots and charts
Seaborn Visualization Statistical plots built on Matplotlib
Plotly Interactive Visualization Dashboards, web-based visual insights
Airflow Data Engineering, MLOps Workflow orchestration (Python DAGs)
FastAPI MLOps, API Services Fast, async REST APIs for ML models
MLflow/DVC MLOps, CI/CD Model tracking, version control
NLTK/spaCy NLP Tokenization, text preprocessing
OpenCV Computer Vision Image preprocessing, detection
Hugging Face NLP, Transformers Pretrained models, pipelines

About

A comprehensive reference guide mapping the entire AI, Machine Learning, Data Science, and Data Engineering ecosystem—with categorized tools, libraries, workflows, and Python usage.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published