Protein Engineer • Machine Learning Scientist • Computational Biologist
Creator of EmbedDiff-ESM 🧬 and EmbedDiff-Dayhoff 🔄
🔬 I’m a hybrid protein engineer and ML scientist with deep experience in both wet-lab experimentation and machine learning for protein design.
I bridge experimental biochemistry with generative AI, building next-gen tools to accelerate biologics discovery.
- 🧠 Currently developing: EmbedDiff-ESM (ESM-2 backbone) and EmbedDiff-Dayhoff (Dayhoff ablation) — exploring how protein LMs affect generative design
- 🧬 Passionate about generative AI in biotech & synthetic biology
- 🧪 Experienced in sequence modeling, folding, and structure–function pipelines
- Languages: Python, PyTorch, R, SQL, Bash
- Tools: Git, Docker, VS Code, Conda, Jupyter, SnapGene, PyMOL, Prism, ELN, Tableau
- ML/BioAI: ESM-2, Dayhoff Atlas, AlphaFold, Transformers, Diffusion Models, t-SNE, BLAST
- Enzyme characterization: Km, Vmax, kcat
- Thermal stability: Prometheus Panta, residual activity assays
- Protein visualization: SDS-PAGE, Western blot
- Molecular biology: PCR, qPCR, SDM, Golden Gate, high-throughput cloning
- PPIs: FRET assays
- Automation: Tecan, Echo, LabChip, ZAG
- Purification: SEC-MALS, IEX, affinity (FPLC/ÄKTA)
- Biophysics: DLS, BLI (Octet® RH96), FT-IR, TGA
- Quantification: MS, analytical SEC (HPLC)
- Microscopy: confocal, SEM, EDS
- Crystallization & genotype screening, Agrobacterium methods
Complementary pipelines for de novo protein design with diffusion models, probing how the choice between ESM-2 and Microsoft's Dayhoff-3B shapes generative outcomes.
👉 EmbedDiff-ESM report
👉 EmbedDiff-Dayhoff report
I developed and compared two parallel latent diffusion pipelines for de novo protein design, each conditioned on a different pretrained embedding backbone: EmbedDiff-ESM, which leverages Meta's ESM-2 protein language model trained at evolutionary scale, and EmbedDiff-Dayhoff, which uses Microsoft's Dayhoff-3B model trained on clustered UniRef with substitution-aware geometry. Both pipelines share the same workflow: embed natural protein sequences into latent space, train a denoising diffusion model to learn biologically meaningful manifolds, and decode embeddings back into amino acid sequences with a Transformer-based decoder, followed by rigorous multi-metric evaluation. Unlike traditional structure-based or template-driven design approaches, EmbedDiff explores protein sequence space without structural supervision, making it possible to test how different embedding backbones influence novelty, plausibility, and functional diversity.

To benchmark generated sequences, I combined perplexity scoring with ProtT5, t-SNE domain clustering, logistic regression probes, entropy vs identity trade-offs, cosine similarity distributions, and domain overlays, providing a holistic view of backbone performance.

The results show that both models produce very high-perplexity sequences, confirming that diffusion pushes into novel sequence space beyond the immediate training manifold, while global plausibility remains comparable between ESM-2 and Dayhoff. At the local level, however, differences emerge: ESM-2 tends to generate more conservative, higher-identity outputs that preserve natural priors, whereas Dayhoff explores higher-entropy, more divergent solutions. Together, these findings demonstrate that embedding choice directly steers generative exploration of protein space: ESM-2 offers stability and conservation, while Dayhoff drives evolutionary exploration. The two are complementary strategies for advancing generative protein engineering.
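To make the embedding stage concrete, here is a minimal sketch of mean-pooled embedding extraction with ESM-2 via Hugging Face `transformers`. The checkpoint name and mean-pooling choice are illustrative assumptions, not necessarily what the pipelines use:

```python
# Minimal sketch: embed protein sequences with ESM-2 (Hugging Face transformers).
# Assumptions: the facebook/esm2_t33_650M_UR50D checkpoint and mean pooling over
# tokens; the actual EmbedDiff pipelines may use a different layer or pooling.
import torch
from transformers import AutoTokenizer, EsmModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = EsmModel.from_pretrained("facebook/esm2_t33_650M_UR50D").eval()

@torch.no_grad()
def embed(sequences: list[str]) -> torch.Tensor:
    """Return one fixed-size latent vector per sequence (mean over tokens)."""
    batch = tokenizer(sequences, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state          # (B, L, D)
    mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)        # (B, D) mean-pooled

z = embed(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"])
print(z.shape)  # torch.Size([1, 1280]) for the 650M ESM-2 checkpoint
```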
Takeaway: ESM-2 embeddings produce slightly tighter domain separation, while Dayhoff preserves broader evolutionary diversity in latent space.
| ESM-2 | Dayhoff |
| :---: | :---: |
| ![]() | ![]() |
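A minimal sketch of the t-SNE projection behind plots like these, assuming precomputed embeddings and domain labels (the file names are hypothetical):

```python
# Minimal sketch: project latent embeddings to 2-D with t-SNE, colored by domain.
# Assumes an (N, D) embedding array and length-N label array saved to disk;
# both file names are illustrative, not the pipeline's actual artifacts.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.load("esm2_embeddings.npy")     # hypothetical (N, D) array
domain_labels = np.load("domain_labels.npy")    # hypothetical length-N labels

xy = TSNE(n_components=2, perplexity=30, init="pca",
          random_state=0).fit_transform(embeddings)

for dom in np.unique(domain_labels):
    pts = xy[domain_labels == dom]
    plt.scatter(pts[:, 0], pts[:, 1], s=5, label=str(dom))
plt.legend(markerscale=3, fontsize=6)
plt.title("t-SNE of latent embeddings by domain")
plt.show()
```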
Takeaway: Both backbones retain strong class separability, validating that embeddings encode sufficient biological signal for downstream classifiers.
| ESM-2 | Dayhoff |
| :---: | :---: |
| ![]() ![]() | ![]() ![]() |
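A minimal sketch of the linear-probe setup: if a plain logistic regression recovers domain labels from frozen embeddings, the latent space preserves class structure. Array file names are hypothetical, as above:

```python
# Minimal sketch of a linear probe on frozen latent embeddings. High held-out
# accuracy indicates the embeddings encode domain-level biological signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

embeddings = np.load("esm2_embeddings.npy")     # hypothetical (N, D) array
domain_labels = np.load("domain_labels.npy")    # hypothetical length-N labels

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, domain_labels, test_size=0.2,
    stratify=domain_labels, random_state=0,
)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print(f"Probe accuracy: {accuracy_score(y_te, probe.predict(X_te)):.3f}")
```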
Takeaway: Training dynamics are comparable across backbones, with both models converging steadily under diffusion noise scheduling.
| ESM-2 | Dayhoff |
| :---: | :---: |
| ![]() | ![]() |
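For reference, a toy sketch of the DDPM-style objective a latent diffusion stage like this optimizes. The linear beta schedule, timestep count, and tiny MLP denoiser are simplifying assumptions, not the actual EmbedDiff models:

```python
# Toy sketch of latent diffusion training: epsilon prediction on embedding
# vectors. Schedule, timestep count, and denoiser are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

T, D = 1000, 1280                                   # timesteps, embedding dim
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)       # cumulative signal fraction

denoiser = nn.Sequential(nn.Linear(D + 1, 512), nn.SiLU(), nn.Linear(512, D))

def diffusion_loss(z0: torch.Tensor) -> torch.Tensor:
    """Noise clean embeddings z0 to a random timestep, predict the noise."""
    t = torch.randint(0, T, (z0.shape[0],))
    eps = torch.randn_like(z0)
    ab = alpha_bar[t].unsqueeze(-1)
    zt = ab.sqrt() * z0 + (1 - ab).sqrt() * eps     # forward (noising) process
    t_feat = (t.float() / T).unsqueeze(-1)          # crude timestep conditioning
    return F.mse_loss(denoiser(torch.cat([zt, t_feat], dim=-1)), eps)

loss = diffusion_loss(torch.randn(8, D))            # one toy training step
loss.backward()
```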
Takeaway: Both backbones show a comparable global entropy–identity distribution.
| ESM-2 | Dayhoff |
| :---: | :---: |
| ![]() | ![]() |
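A minimal sketch of how one entropy-identity point per generated sequence could be computed. The ungapped identity here is a simple proxy, not necessarily the alignment used in the reports:

```python
# Minimal sketch of the entropy-vs-identity diagnostic: per-sequence Shannon
# entropy of amino-acid composition vs identity to the nearest natural sequence.
import math
from collections import Counter

def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the amino-acid composition of one sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter length (ungapped proxy)."""
    m = min(len(a), len(b))
    return sum(x == y for x, y in zip(a[:m], b[:m])) / m

gen = "MKTAYIAKQRQISFVKSHFSRQ"                       # hypothetical generated seq
naturals = ["MKTAYIAKQRQISFVKSHFARQ", "MSTNPKPQRK"]  # hypothetical reference set
best_id = max(identity(gen, nat) for nat in naturals)
print(shannon_entropy(gen), best_id)
```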
Takeaway: Identity and cosine similarity histograms reveal overlapping regimes for both ESM-2 and Dayhoff.
| ESM-2 — Identity | Dayhoff — Identity |
| :---: | :---: |
| ![]() | ![]() |

| ESM-2 — All cosine histograms | Dayhoff — All cosine histograms |
| :---: | :---: |
| ![]() | ![]() |
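A minimal sketch of the cosine-similarity diagnostic: score each generated embedding against the natural set and histogram the nearest-neighbor similarity (array files are hypothetical):

```python
# Minimal sketch: cosine similarity of each generated embedding to its nearest
# natural embedding. Array names/files are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

gen_emb = np.load("generated_embeddings.npy")   # hypothetical (G, D) array
nat_emb = np.load("natural_embeddings.npy")     # hypothetical (N, D) array

sims = cosine_similarity(gen_emb, nat_emb)      # (G, N) similarity matrix
nearest = sims.max(axis=1)                      # best match per generated seq

plt.hist(nearest, bins=50)
plt.xlabel("max cosine similarity to natural set")
plt.ylabel("count")
plt.show()
```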
Takeaway: Generated sequences cluster near real domains but backbone choice shifts how tightly generated points adhere to natural evolutionary space.
| ESM-2 | Dayhoff |
| :---: | :---: |
| ![]() | ![]() |
Takeaway: Although absolute perplexity is high for both backbones, the distributions overlap strongly, suggesting that backbone choice does not dramatically alter global plausibility.
| ESM-2 vs Dayhoff |
| :---: |
| ![]() |
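For intuition, perplexity here is exp(mean per-token negative log-likelihood). The reports score with ProtT5; the stand-in below uses a causal protein language model (ProtGPT2) instead, since autoregressive models make the computation most direct:

```python
# Minimal sketch of sequence perplexity as exp(mean token NLL). Illustrative
# stand-in: nferruz/ProtGPT2 (a causal protein LM), not the ProtT5 scoring
# actually used in the reports.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("nferruz/ProtGPT2")
lm = AutoModelForCausalLM.from_pretrained("nferruz/ProtGPT2").eval()

@torch.no_grad()
def perplexity(seq: str) -> float:
    ids = tok(seq, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss          # mean token cross-entropy
    return float(torch.exp(loss))

print(perplexity("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```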
These side-by-side comparisons reveal how the embedding backbone steers generative design — domain separation, entropy/identity trade-offs, and similarity structure all shift with the latent geometry learned by ESM-2 vs Dayhoff.