minyt
Download audio from YouTube videos and use Gemini LLM for cleaner and smarter transcriptions
minyt (WIP) is a Python package that simplifies downloading YouTube audio and generating high-quality transcripts with Google’s Gemini AI. It splits long audio files at natural silence points and transcribes the resulting chunks concurrently to speed up processing.
Features
- YouTube Audio Download: Extract audio from any YouTube video using yt-dlp
- Smart Audio Splitting: Automatically detect silence and split audio at natural break points
- AI-Powered Transcription: Use Google’s Gemini 2.0 Flash for accurate, context-aware transcriptions
- Parallel Processing: Process multiple audio chunks concurrently for faster results
- Customizable: Configure chunk sizes, silence detection, and transcription prompts
- Clean Output: Generate well-formatted transcripts ready for analysis
Quick Start
Installation
pip install minyt
Prerequisites
FFmpeg: Required for audio processing
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
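To confirm that FFmpeg is reachable before running the pipeline, a quick standard-library check is enough. This is a minimal sketch; `ffmpeg_available` is a hypothetical helper, not part of minyt.

```python
import shutil


def ffmpeg_available() -> bool:
    """Hypothetical helper (not part of minyt): True if `ffmpeg` is on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; install it before using minyt")
```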
Google Gemini API Key: Get your API key from Google AI Studio
export GEMINI_API_KEY="your-api-key-here"
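A missing key usually surfaces only when the first transcription request fails. A small check at the start of a script makes the error explicit. This is a sketch; `require_gemini_key` is a hypothetical helper, and it assumes the key is read from the `GEMINI_API_KEY` variable exported above.

```python
import os


def require_gemini_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Hypothetical helper: return the API key, or raise if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f'{env_var} is not set; run: export {env_var}="your-api-key-here"'
        )
    return key
```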
Basic Usage
import asyncio
from pathlib import Path
from minyt.core import *
# Download audio from a YouTube video
video_id = "dQw4w9WgXcQ"  # Replace with your video ID
audio_file = download_audio(video_id, Path("_audio"))
# Detect silence and find optimal split points
_, silence_data = detect_silence(audio_file)
silence_ends = parse_silence_ends(silence_data)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into manageable chunks
chunks = split_audio(audio_file, split_points, dest_dir="_audio_chunks")
# Transcribe all chunks using Gemini AI
async def main():
    transcript = await transcribe_audio(
        chunks_dir="_audio_chunks",
        dest_file="_transcripts/transcript.txt",
        prompt="Please transcribe this audio file verbatim, maintaining speaker clarity and context."
    )
    print("Transcript saved to: _transcripts/transcript.txt")
asyncio.run(main())
Detailed Usage
Step 1: Download YouTube Audio
from minyt.core import download_audio
from pathlib import Path
# Download audio from a YouTube video
video_id = "your-video-id-here"
audio_file = download_audio(video_id, Path("downloads"))
print(f"Audio downloaded to: {audio_file}")
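The examples pass a bare video ID to `download_audio`. If you start from a full YouTube link, a best-effort extractor like the one below can pull the ID out first. This is a sketch: `extract_video_id` is a hypothetical helper, and it only recognises the common `watch?v=`, `youtu.be/`, `shorts/`, `embed/` and `live/` URL shapes.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> str:
    """Hypothetical helper: get a YouTube video ID from a URL, or pass an ID through."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&...
            return parse_qs(parsed.query).get("v", [""])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            # e.g. https://www.youtube.com/shorts/<id>
            return parts[1]
        raise ValueError(f"Unrecognised YouTube URL: {url_or_id}")
    # No YouTube host: assume the input is already a bare video ID
    return url_or_id
```

Usage: `audio_file = download_audio(extract_video_id("https://youtu.be/dQw4w9WgXcQ"), Path("downloads"))`.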
Step 2: Process Audio with Smart Splitting
from minyt.core import detect_silence, parse_silence_ends, get_audio_duration, find_split_points, split_audio
# Detect silence in the audio file
_, silence_data = detect_silence(audio_file)
# Parse silence end points
silence_ends = parse_silence_ends(silence_data)
# Find optimal split points (aiming for 10-minute chunks)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into chunks
chunks = split_audio(audio_file, split_points, dest_dir="audio_chunks")
print(f"Created {len(chunks)} audio chunks")
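The idea behind silence-based splitting is to cut near each `chunk_len` boundary, but only where the speaker has paused. The sketch below shows one greedy way to do that. It is an illustration, not minyt's actual `find_split_points`; the name `pick_split_points` and its fallback rules are assumptions.

```python
from bisect import bisect_right


def pick_split_points(silence_ends: list[float], total_duration: float,
                      chunk_len: float = 600) -> list[float]:
    """Hypothetical greedy splitter (not minyt's implementation).

    For each chunk, cut at the last silence end at or before start + chunk_len.
    If the window has no silence, use the first silence after it; if none is
    left before the end of the audio, make a hard cut at the target.
    """
    ends = sorted(silence_ends)
    cuts: list[float] = []
    start = 0.0
    while total_duration - start > chunk_len:
        target = start + chunk_len
        i = bisect_right(ends, target)  # ends[:i] are <= target
        j = bisect_right(ends, start)   # ends[j:] are > start
        if i > j:
            cut = ends[i - 1]           # latest pause inside the window
        elif i < len(ends) and ends[i] < total_duration:
            cut = ends[i]               # no pause in window: take the next one
        else:
            cut = target                # no pauses left: hard cut
        cuts.append(cut)
        start = cut
    return cuts
```

For example, `pick_split_points([100.0, 550.0, 700.0, 1150.0], 1500, chunk_len=600)` returns `[550.0, 1150.0]`: each cut lands on the last pause before the next 600-second mark, so chunks stay close to, but never much longer than, the target length.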
Step 3: Transcribe with Gemini AI
import asyncio
from minyt.core import transcribe_audio
async def transcribe_video():
    transcript = await transcribe_audio(
        chunks_dir="audio_chunks",
        dest_file="transcripts/final_transcript.txt",
        model="gemini-2.0-flash-001",  # Default model
        max_concurrent=3,  # Process 3 chunks simultaneously
        prompt="Please transcribe this audio accurately, preserving speaker names and technical terms."
    )
    return transcript
# Run transcription
transcript = asyncio.run(transcribe_video())
print("Transcription completed!")
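`max_concurrent` caps how many chunks are transcribed at the same time. A common way to implement such a cap in asyncio is a semaphore around each request, as sketched below. minyt's internals are not documented here, so treat this as an assumption: `transcribe_all` and `fake_transcribe` are hypothetical, and the fake stands in for a real Gemini call.

```python
import asyncio
from typing import Awaitable, Callable


async def transcribe_all(chunks: list[str],
                         transcribe_one: Callable[[str], Awaitable[str]],
                         max_concurrent: int = 3) -> list[str]:
    """Hypothetical sketch: run transcribe_one on every chunk, at most
    max_concurrent at a time, returning results in the order of `chunks`."""
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(chunk: str) -> str:
        async with sem:  # waits while max_concurrent calls are in flight
            return await transcribe_one(chunk)

    # gather preserves input order even if later chunks finish first
    return await asyncio.gather(*(worker(c) for c in chunks))


async def fake_transcribe(chunk: str) -> str:
    # Stand-in for a real Gemini request (hypothetical, for illustration only)
    await asyncio.sleep(0.01)
    return f"[text of {chunk}]"


if __name__ == "__main__":
    parts = asyncio.run(transcribe_all(["c0.mp3", "c1.mp3", "c2.mp3"], fake_transcribe))
    print("\n".join(parts))
```

Keeping results in chunk order matters because the pieces are stitched back into a single transcript; raising the limit speeds things up only until the API's rate limits start rejecting requests.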
Configuration
Environment Variables
# Required
export GEMINI_API_KEY="your-gemini-api-key"
# Optional: Configure logging level
export LOG_LEVEL="INFO"
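The README does not say how minyt consumes `LOG_LEVEL`. If you want your own scripts to honour the same variable, a standard-library sketch follows. `configure_logging` is a hypothetical helper, and the fallback to `INFO` for unknown values is an assumption.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Hypothetical helper: apply LOG_LEVEL (e.g. DEBUG, INFO) to the root logger.

    Unknown values fall back to `default`. Returns the numeric level applied.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        # getLevelName returns the string "Level <name>" for unknown names
        level = logging.getLevelName(default.upper())
    logging.basicConfig(level=level)
    # basicConfig is a no-op if handlers already exist, so set the level explicitly
    logging.getLogger().setLevel(level)
    return level
```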
Customization Options
# Silence detection uses a -30dB threshold and 0.5s duration by default
_, silence_data = detect_silence(audio_file)
# Custom chunk size (in seconds)
split_points = find_split_points(silence_ends, total_duration, chunk_len=300)  # 5-minute chunks
# Custom transcription settings
# (top-level await works in Jupyter; in a script, run this inside an async function)
transcript = await transcribe_audio(
    chunks_dir="chunks",
    dest_file="output.txt",
    model="gemini-2.0-flash-001",  # Replace with another Gemini model name if needed
    max_concurrent=5,  # More parallel processing
    prompt="Custom transcription instructions here..."
)
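The -30dB / 0.5s settings map naturally onto FFmpeg's `silencedetect` audio filter, which logs a `silence_end: <seconds>` line to stderr for each pause it finds. The sketch below shows how such detection can be run and parsed by hand. Whether minyt invokes FFmpeg exactly this way is an assumption, and `silencedetect_cmd`, `parse_silence_end_times` and `run_silencedetect` are hypothetical names.

```python
import re
import subprocess

# Matches e.g. "[silencedetect @ 0x...] silence_end: 12.345 | silence_duration: 0.6"
SILENCE_END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silencedetect_cmd(audio_file: str, noise_db: float = -30,
                      min_silence: float = 0.5) -> list[str]:
    """Hypothetical: ffmpeg command that runs silencedetect and discards the audio."""
    return [
        "ffmpeg", "-hide_banner", "-i", audio_file,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-f", "null", "-",
    ]


def parse_silence_end_times(stderr: str) -> list[float]:
    """Hypothetical: extract every silence_end timestamp (seconds) from ffmpeg's log."""
    return [float(t) for t in SILENCE_END_RE.findall(stderr)]


def run_silencedetect(audio_file: str, noise_db: float = -30,
                      min_silence: float = 0.5) -> list[float]:
    # silencedetect reports on stderr, not stdout
    result = subprocess.run(silencedetect_cmd(audio_file, noise_db, min_silence),
                            capture_output=True, text=True, check=True)
    return parse_silence_end_times(result.stderr)
```

A higher (less negative) `noise` value treats more background sound as silence, and a shorter `d` accepts briefer pauses; both produce more candidate split points.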
Project Structure
minyt/
├── audio/ # Downloaded audio files
├── audio_chunks/ # Split audio chunks
├── transcripts/ # Generated transcripts
├── minyt/
│ ├── __init__.py
│ └── core.py # Main functionality
└── nbs/ # Jupyter notebooks (development)
Development
Install in Development Mode
# Clone the repository
git clone https://github.com/franckalbinet/minyt.git
cd minyt
# Install in development mode
pip install -e .
# Make changes in the nbs/ directory
# ...
# Export notebook changes to the minyt package (nbdev_prepare also runs the tests and cleans the notebooks)
nbdev_prepare
Dependencies
- fastcore: Core utilities
- google-genai: Google Gemini AI client
- yt-dlp: YouTube video downloader
- ffmpeg-python: Audio processing
- tqdm: Progress bars
- rich: Enhanced console output
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- yt-dlp for YouTube video downloading
- Google Gemini for AI-powered transcription
- FFmpeg for audio processing capabilities
Support
If you encounter any issues or have questions:
- Check the documentation
- Open an issue
- Contact the maintainer: franckalbinet@gmail.com