YanCotta/README.md


🌍 About Me

I'm passionate about designing and deploying full-stack AI systems inspired by my unique interdisciplinary background: Biology, Psychology, Philosophy, Computer Science, and AI/ML Architecture and Engineering.

This is not just a job or a career for me; it is the practical application of a lifelong obsession with the nature of intelligence itself, backed by years of formal academic study.

My long-term goal is to use my academic background to bridge Biology and AI and solve real-world biological and biomedical problems!

🛠️ Technical Expertise

  • AI Systems Architecture & MLOps: I design scalable, robust, and intelligent systems from concept to deployment. My expertise covers the full end-to-end ML/AI lifecycle, from data science and feature engineering to production-grade MLOps and LLMOps to ensure models deliver real-world value.

  • Full-Stack & Edge AI: I am proficient across the entire tech stack. This includes backend development (Python, C++, FastAPI), cloud infrastructure (AWS), complex databases (PostgreSQL, TimescaleDB, Neo4j), and core ML frameworks (PyTorch, TensorFlow, Scikit-Learn), extending all the way to hardware and Edge AI development with devices like the ESP32.

  • Agentic AI & LLMs: My specialization lies at the frontier of AI. I develop "intelligent workforces" and have deep, hands-on experience in post-training Large Language Models using advanced techniques like SFT and RLHF.

🏆 Proven Impact & Research

  • Leadership & Award-Winning Projects: I thrive on collaboration and have served as a Project Lead within the international SuperDataScience community. I was also the solo architect and engineer of the winning project for FIAP's 2025 Global Solution Challenge.

  • Interdisciplinary Research & Application: My technical work is informed by a deep research background, from a thesis on foundational cognitive science at GSU to applied Bio-AI research into epigenetic anti-aging. This passion for applying AI to scientific challenges continues with my current role as an R&D Intern at Embrapa, focusing on data & animal genomics.

I'm always open to connecting with fellow builders, researchers, and leaders to discuss the future of intelligent systems.


🕰️ Professional Timeline

🏢 Professional Experience

🧬📊 R&D Intern (Data & Genomics) | Embrapa (Dairy Cattle) | Sep 2025 - Present

Key Areas: Animal Genetics, Genomic Selection, Computational Biology, Data Engineering, Applied Machine Learning & AI, Agricultural Innovation, Bioinformatics, Data Science


🤖 LLMs Trainer (RLHF) | Outlier | Nov 2024 - Sep 2025

Key Areas: RLHF, Model Alignment, AI Safety, Programming Languages, Biological Sciences, Quality Assurance


🌱 Data Analyst (Ecological Impact) | Impaakt | Feb 2022 - Oct 2024

Key Areas: Environmental Science, Sustainability Analysis, Data Analysis, Process Optimization, AI Integration, Impact Assessment


📚 Research Assistant | Georgia State University | Feb 2019 - Feb 2020

Key Areas: Cognitive Sciences, Philosophy of Mind, Psychology, Behavioral Analysis, Research Methodology, Data Analysis, Data Science, Python


🎓 Academic Background

🤖 AI Systems & Machine Learning Technologist | FIAP | 2024 - 2026 (expected)

Key Areas: AI Systems Architecture, Machine Learning Engineering, MLOps, Edge AI, IoT Development, Software Engineering, Data Engineering, Cybersecurity, Cloud Operations

Academic Excellence: GPA 4.0


🧬 Bachelor of Biological Sciences | UniAcademia | 2022 - 2025 (in progress)

Key Areas: Molecular Biology, Genetics, Computational Biology, Research Methodology, Laboratory Management, Scientific Publishing

Academic Excellence: GPA 3.7 | Thesis: Epigenetics Antiaging Health Software Leveraging Machine Learning & Deep Learning Algorithms


🧠 Philosophy (Major) & Psychology (Minor) | Georgia State University | 2017 - 2020 (incomplete)

Key Areas: Cognitive Sciences, Philosophy of Mind, Psychology, Human Behavior, Research Methodology, Academic Leadership

Academic Excellence: GPA 3.8 | Thesis: Differentiating Factual Belief, Imagination & Religious Credence - A Systematic Theory of Cognitive Attitudes

Additional Recognition: Columnist for "The Signal" (GSU's award-winning newspaper), Atlanta Campus Scholarship recipient, Dean's List, Honor Society member


🚀 My Elephants' Graveyard

This is where all my projects come to die, lol, since they are all for my own learning (I do better with Project-Based Learning, or PBL). They demonstrate my skills in building production-grade, scalable, and innovative AI systems end to end across multiple domains.


🤝 Community Projects

These projects were completed as part of the international SuperDataScience data science community, where I collaborated with talented data scientists and ML engineers from around the world. I served as Project Lead for 2 projects and as a Project Member for 2 others.

🩺 GlucoTrack: Diabetes Risk Prediction Platform

🎯 Project Lead | Comprehensive diabetes risk assessment system using the CDC diabetes dataset

Led a diverse team of data scientists and ML engineers to deliver both beginner-friendly and advanced deep learning solutions.

🔧 Key Features: Built traditional ML models (Logistic Regression, Decision Trees) and advanced Feedforward Neural Networks with hyperparameter tuning. Includes model explainability tools and multiple deployment options.

💻 Technologies: Python, Scikit-learn, Deep Learning, Streamlit, Model Explainability, Healthcare AI, Data Science

Live app: glucotrack.streamlit.app


💰 MLPayGrade: ML Salary Prediction System

🎯 Project Lead | End-to-end salary prediction platform analyzing the 2024 machine learning job market

Coordinated a team of data scientists and ML engineers to build comprehensive solutions across multiple skill levels.

🔧 Key Features: Analyzes global salary trends and job feature impacts on compensation. Features both traditional ML pipelines and advanced deep learning on tabular data with embeddings and explainability.

💻 Technologies: Python, Scikit-learn, Deep Learning, Tabular Data, Streamlit, Job Market Analytics, Data Science


💸 EduSpend: Global Education Cost Prediction

🎯 Project Member | End-to-end machine learning platform to predict Total Cost of Attendance for international higher education

🔧 Key Features: Achieved a 96.44% R² score with an XGBoost Regressor, deployed via both a Streamlit web app and a FastAPI service, all containerized with Docker and automated with CI/CD.

💻 Technologies: Scikit-learn, XGBoost, MLflow, Streamlit, FastAPI, Docker, CI/CD, Data Science
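
For readers less familiar with the metric quoted above, R² (the coefficient of determination) measures how much of the variance in the target a model explains: 1.0 is a perfect fit and 0.0 is no better than always predicting the mean. The snippet below is only a minimal plain-Python sketch of the formula; the function name `r2_score` and the sample numbers are illustrative assumptions, not code or data from the EduSpend project.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical costs, not project data).
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 19.0, 29.0]
print(round(r2_score(actual, predicted), 4))
```

A score of 96.44% therefore corresponds to `r2_score(...)` returning roughly 0.9644 on the evaluation data.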


🌿 Smart Leaf: Deep Learning for Crop Disease

🎯 Project Member | Deep learning solution that classifies 14 different crop diseases across four species

🔧 Key Features: A Convolutional Neural Network (CNN) trained locally on over 13,000 images, using only modularized Python scripts (no notebooks), and deployed via a user-friendly Streamlit interface for real-time predictions. Covers corn, potato, rice, and wheat diseases.

💻 Technologies: Deep Learning, Computer Vision, CNN, TensorFlow, PyTorch, Streamlit, Locally Trained Neural Network


🔧 Full-Stack AI Systems & End-to-End Automation

These are my most comprehensive projects: complete AI systems that I architected and built from the ground up, working solo to learn as much as I could and to deliver production-ready solutions to complex, full-stack development challenges.

⚙️ Industrial Smart Maintenance SaaS

🎯 Solo Development | Multi-agent AI platform for industrial IoT that predicts machine failures and automates maintenance scheduling

Built entirely from scratch to ensure maximum performance and control.

🔧 Key Features: Custom-built agentic architecture (no frameworks), more than five ML models tracked with MLflow and trained on real-world industrial datasets, TimescaleDB for high-performance time-series data, and full containerization across multiple Docker microservices.

💻 Technologies: Python, FastAPI, PostgreSQL, TimescaleDB, Redis, MLflow, Docker, Streamlit, IoT


🏆 Guardian System: National Resilience Platform (Award Winner)

🎯 Solo Development | My winning project for FIAP's 2025.1 Global Solution Challenge

A visionary multi-agent platform designed to predict and manage large-scale events in Brazil by fusing Agentic AI with concepts from Brazilian folklore.

🔧 Key Features: Five autonomous "Guardian" agents for different threat domains, with a fully functional MVP for fire risk prediction using real-time IoT sensor data.

💻 Technologies: Agentic AI, Python, FastAPI, Docker, MicroPython, ESP32, IoT


📄 Full-Stack Invoice Automation System

🎯 Solo Development | AI-powered system that automates invoice processing, drastically reducing manual effort

🔧 Key Features: Reduced processing time by over 85% and uses RAG with FAISS for intelligent error classification. Offers multiple frontend (React/Next.js) and deployment options.

💻 Technologies: Next.js, React, TypeScript, AWS, LangChain, Streamlit, RAG


🌱 AgroTech & BioTech Solutions

These projects showcase my work at the intersection of technology and life sciences, developing AI-powered solutions for agriculture, bioinformatics, and environmental monitoring.

🌾 SmartCrops: IoT-ML Agriculture System

🎯 Solo Development | IoT-ML project for smart agriculture featuring dual ESP32 nodes

Features sensor communication via ESP-NOW and gateway connectivity to MQTT/Ubidots for comprehensive crop monitoring.

🔧 Key Features: Real-time collection of temperature, humidity, and soil moisture data. An ML model analyzes crop yield and provides real-time plant health classification.

💻 Technologies: Python, C++, ESP32, IoT, MQTT, Machine Learning, Agriculture AI


🧬 Personalized Anti-Aging Epigenetics ML System

🎯 Solo Development | Thesis project developing a personalized anti-aging recommendation system based on genetics and lifestyle analysis

Analyzes genetic predispositions (SNPs) and lifestyle habits to generate personalized risk assessments and actionable recommendations for healthy aging.

🔧 Key Features: Synthetic genetic data generation with BioPython, model comparison (Random Forest vs Neural Network), explainable AI via SHAP, and MLflow experiment tracking. Fully containerized with secure JWT authentication.

💻 Technologies: Python, FastAPI, React, PyTorch, Scikit-learn, BioPython, MLflow, SHAP, Docker


🧬 Bioinformatics & Genetic Analysis Tools

🎯 Solo Development | Collection of high-performance Python tools for bioinformatics

Includes DNA sequence analysis, gene expression analysis, and a pipeline that uses ML to predict disease risk from genetic variants.

🔧 Key Features: Combines population genetics with ML, features ORF detection, PCA for pattern recognition, and robust data processing.

💻 Technologies: Python, Bioinformatics, Genomics, PyTorch, Scikit-learn
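
To give a sense of what ORF (open reading frame) detection involves, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `find_orfs`, the `min_codons` parameter, and the choice to scan only the forward strand are all assumptions made for illustration. It walks the three forward reading frames and reports every stretch that starts at `ATG` and ends at the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return sorted (start, end, orf) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the exclusive index just
    past the stop codon. min_codons counts the start and stop codons too.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # close this ORF and look for the next ATG
    return sorted(orfs)


# Hypothetical toy sequence: one ORF in the third reading frame.
for start, end, orf in find_orfs("CCATGAAATAGGG"):
    print(start, end, orf)
```

A fuller tool would typically also scan the reverse complement and translate each ORF to protein, which this sketch leaves out for brevity.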


🌍 Climate Risk Assessment Tool

🎯 Solo Development | Advanced climate risk prediction system using ensemble machine learning and deep learning

Delivered via a production-ready REST API.

🔧 Key Features: Combines multiple ML models (XGBoost, LSTM) for robust forecasting and integrates real-time weather data for comprehensive analysis. Fully containerized and CI/CD ready.

💻 Technologies: Python, FastAPI, Ensemble ML, Deep Learning, Docker, CI/CD


📚 Explore More Projects

... and even more projects in my repositories, covering Data Science, Machine Learning, MLOps, LLMOps, IoT, AI engineering, bioinformatics, and more!

View All Repositories

🛠️ Tech Stack & Tools

AI & Machine Learning

Agentic AI & LLMs

Backend & APIs

Databases & Data Engineering

Cloud & MLOps

Frontend & Visualization

Testing & Code Quality

IoT & Edge AI


🌍 Global Communication


"The most flexible element is the one that controls the system."


Pinned

  1. enterprise_challenge_sprint_1_hermes_reply Public

    A production-grade, open-source SaaS platform for predictive maintenance. This project is built on a resilient and scalable stack including FastAPI, PostgreSQL/TimescaleDB, and Redis, all with Doc…

    Jupyter Notebook 4

  2. global_solution_1_fiap Public

    Winner of FIAP's Global Solution 2025.1 Challenge. This repository contains the architecture for a multi-agent system where five autonomous "Guardians" work in synergy to predict, manage, and respo…

    Python 1 1

  3. anti-aging-epigenetics-ml-app Public

    A thesis MVP for a personalized anti-aging system that analyzes genetic SNPs and lifestyle habits using ML models (Random Forest and Neural Networks) to provide risk assessments and actionable reco…

    Jupyter Notebook 1 1

  4. SmartCrops-IoT-ML-System Public

    An IoT-ML project for smart agriculture: dual ESP32 nodes (sensor via ESP-NOW, gateway to MQTT/Ubidots) collect temperature, humidity, and soil moisture data. ML Model analyzes crop yield and real-time plant…

    Jupyter Notebook 2

  5. SDS-CP035-gluco-track Public

    Forked from SuperDataScience-Community-Projects/SDS-CP035-gluco-track

    GlucoTrack is a machine learning and deep learning project focused on predicting a person’s risk level of diabetes

    Jupyter Notebook 1

  6. agentic_invoice_system_final_version Public

    Technical test for Brim's AI Engineer role: implementation of a Multi-Agentic System for Invoice Automation. Due 02/28. Next.js frontend implementation.

    Python 1