A comprehensive Jupyter notebook pipeline for speech-to-text transcription, semantic chunking, and exploratory audio-text analysis.

hitesh-ag1/speech2text-analysis

🧠 Speech-to-Text Analysis & Semantic Chunking

This Jupyter Notebook explores advanced speech-to-text processing through two major tasks:

🎬 Part 1: Semantic Chunking of YouTube Audio

  • Downloaded and extracted audio from a YouTube video using yt-dlp.
  • Transcribed speech to text using OpenAI Whisper.
  • Performed time-aligned transcription and speaker diarization using PyAnnote.
  • Applied semantic chunking based on sentence structure, speaker turns, and conjunctions for meaningful text segmentation.
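The chunking rules listed above (sentence structure, speaker turns, conjunctions) can be sketched in plain Python. This is a minimal illustration and not the notebook's actual implementation: the `Segment` record, the `chunk_segments` function, the `max_words` threshold, and the conjunction list are all hypothetical, and the input is assumed to be transcript segments that already carry timestamps and a diarization speaker label.

```python
from dataclasses import dataclass

# Hypothetical record: one time-aligned transcript segment with a speaker label.
@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str

SENTENCE_END = (".", "?", "!")
CONJUNCTIONS = {"and", "but", "so", "because", "however"}  # illustrative only

def chunk_segments(segments, max_words=40):
    """Group segments into chunks.

    A chunk is closed on a speaker turn, at a sentence boundary, or when it
    already holds max_words words and the next segment opens with a conjunction.
    """
    chunks, current = [], []

    def flush():
        # Emit the accumulated segments as one chunk and reset the buffer.
        if current:
            chunks.append({
                "start": current[0].start,
                "end": current[-1].end,
                "speaker": current[0].speaker,
                "text": " ".join(s.text.strip() for s in current),
            })
            current.clear()

    for seg in segments:
        words = seg.text.split()
        first_word = words[0].lower().strip(",") if words else ""
        words_so_far = sum(len(s.text.split()) for s in current)

        if current and seg.speaker != current[-1].speaker:
            flush()  # speaker turn
        elif current and words_so_far >= max_words and first_word in CONJUNCTIONS:
            flush()  # long chunk and the next segment starts with a conjunction

        current.append(seg)
        if seg.text.strip().endswith(SENTENCE_END):
            flush()  # sentence boundary

    flush()  # emit any trailing unfinished chunk
    return chunks

demo = [
    Segment(0.0, 1.0, "SPEAKER_00", "Hello there."),
    Segment(1.0, 2.5, "SPEAKER_00", "We record audio"),
    Segment(2.5, 3.0, "SPEAKER_01", "Okay."),
]
demo_chunks = chunk_segments(demo)
```

Each chunk keeps the start time of its first segment and the end time of its last, so the chunks stay aligned with the audio.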

📖 Part 2: Exploratory Analysis of New Testament Audio

  • Aligned audio and transcript at the word and phoneme level.
  • Conducted detailed word-level speech analysis, including misalignment, pauses, and anomalies.
  • Detected audio anomalies such as silence and distortion using waveform and spectral analysis.
  • Performed text bias and linguistic analysis to examine potential content skew.
  • Analyzed audio quality, duration trends, and phoneme usage.
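As a rough illustration of the word-level analysis described above, pauses and possible misalignments can be read directly off word timestamps. The sketch below assumes the alignment step yields `(word, start, end)` tuples sorted by start time. The `find_pauses` function and the `min_pause` threshold are hypothetical names chosen for this example, not taken from the notebook.

```python
def find_pauses(words, min_pause=0.5):
    """Scan consecutive aligned words for long gaps and overlaps.

    `words` is a list of (word, start, end) tuples sorted by start time,
    with times in seconds. Returns (pauses, overlaps), each a list of
    (previous_word, next_word, gap_seconds). A negative gap means the two
    words overlap in time, which may point to a misalignment.
    """
    pauses, overlaps = [], []
    for (w1, _, end1), (w2, start2, _) in zip(words, words[1:]):
        gap = start2 - end1
        if gap >= min_pause:
            pauses.append((w1, w2, round(gap, 3)))
        elif gap < 0:
            overlaps.append((w1, w2, round(gap, 3)))
    return pauses, overlaps

aligned = [
    ("In", 0.0, 0.2),
    ("the", 0.25, 0.4),
    ("beginning", 1.2, 1.8),
    ("was", 1.7, 1.9),
]
pauses, overlaps = find_pauses(aligned)
```

The half-second threshold is arbitrary. A suitable value depends on the speaker's pace and on how tightly the aligner places word boundaries.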

🔧 Tech Stack

  • Python, Jupyter
  • OpenAI Whisper, PyAnnote, spaCy, NLTK, Librosa, yt-dlp
  • Matplotlib, Seaborn for visualization

📌 Future Work

  • Automate semantic segmentation for long-form educational videos.
  • Extend phoneme-based alignment to multilingual corpora.
