YouTube is arguably the world’s largest repository of technical knowledge. Tutorials, conference talks, deep dives, expert interviews - it’s all there. The catch? You have to watch it. And watching takes forever.
We tried the usual approaches:
- Watch at 2x speed (still too slow)
- Skim through manually (miss important details)
- Take notes while watching (exhausting)
- Save videos “for later” (that queue never shrinks)
Then there’s the second problem: even when you do take notes, they end up scattered. A note here about LangGraph, another there about trading patterns, a third about Docker deployment. No connections. No context. Just isolated fragments.
We built something better.
The Solution: YouTube Markdown Agent
We created a complete pipeline that transforms YouTube videos into structured, interconnected Obsidian notes - automatically.
Drop in a URL. Get back a fully-formatted markdown note with:
- YAML frontmatter (title, tags, description, source)
- Key takeaways with timestamped sections
- Automatic links to related notes in your vault
- Curated tags from a hierarchical taxonomy
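Put together, a generated note might look like this (an illustrative sketch; the exact field names and layout are assumptions based on the description above):

```markdown
---
title: "Building Agents with LangGraph"
tags:
  - ai/agents/langgraph
  - dev/python
description: "Conference talk on building stateful multi-agent workflows."
source: https://www.youtube.com/watch?v=...
---

## Key Takeaways

- [00:02:15] Agents are modeled as nodes in a state graph
- [00:14:40] Checkpointing enables resumable conversations

## Related Notes

- [[LangChain Expression Language]]
```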
Multiple videos? Batch them. We processed 1,000+ notes in under an hour.
Figure 1 - YouTube Markdown Agent dashboard showing the main interface with URL input, processing options, and recent conversions
Architecture Overview
The system has four main components working together to transform raw video content into structured, interconnected knowledge.
Figure 2 - System architecture diagram showing the data flow from YouTube URL input through transcript extraction, AI processing, semantic indexing, and final Obsidian output
1. Transcript Extraction
Using yt-dlp, we extract video metadata and transcripts directly from YouTube. No API keys required for public videos. Works with auto-generated captions or manual transcripts.
2. AI Processing
This is where the magic happens. We use Anthropic’s Claude models to transform raw transcripts into structured notes:
- Single videos: Synchronous API for immediate results
- Batch processing (2+ videos): Anthropic Batch API for 50% cost savings
The processing uses a sophisticated multi-turn conversation that maintains context across the entire transcript, producing notes that feel like they were written by someone who actually watched the video.
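The sync-vs-batch routing can be sketched as follows. The request shape matches Anthropic's Message Batches API (`custom_id` plus `params`); the model name and prompt wording here are illustrative, not our exact prompts:

```python
# Build one Message Batches API request per transcript, keyed by video ID.
def build_requests(transcripts: dict[str, str]) -> list[dict]:
    return [
        {
            "custom_id": video_id,
            "params": {
                "model": "claude-3-5-haiku-latest",  # illustrative model choice
                "max_tokens": 4096,
                "messages": [
                    {"role": "user",
                     "content": f"Turn this transcript into an Obsidian note:\n\n{text}"}
                ],
            },
        }
        for video_id, text in transcripts.items()
    ]

def use_batch_api(n_videos: int) -> bool:
    """Batch API for 2+ videos (50% cheaper); sync API for a single video."""
    return n_videos >= 2
```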
3. Semantic Indexing
Every note gets embedded and stored in Qdrant (a vector database). This enables:
- Finding related notes by semantic similarity
- Auto-linking new notes to existing content
- Building a true knowledge graph
4. Tag Resolution
We maintain a curated taxonomy of 1,040+ hierarchical tags. When a note is created, suggested tags are matched against this taxonomy, ensuring consistency across the entire vault.
The Numbers
| Metric | Value |
|---|---|
| Notes processed | 1,000+ |
| Auto-generated links | 2,757 |
| Curated tags | 1,040 (from 1,280 chaotic originals) |
| API cost savings | 50% (using Batch API) |
| Processing time (1,000 notes) | ~30 minutes |
| Total cost for vault cleanup | ~$1.50 |
Two Workflows, One System
The application serves two distinct needs:
YouTube to Markdown
Convert video transcripts into complete notes. The AI generates both frontmatter AND body content.
Processing modes:
- Summary - Key points with brief explanations
- Detailed - Comprehensive notes for deep learning
Figure 3 - YouTube processing panel showing URL input field, processing mode selector (Summary/Detailed), and batch queue management
Obsidian Notes Processing
Add metadata to existing downloaded articles. The AI generates frontmatter ONLY, preserving original content.
This is perfect for web articles you’ve saved - add proper tags, titles, and descriptions without losing a single word of the original.
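The core guarantee of this mode is easy to state in code (a minimal sketch; the real serializer handles more YAML types than the plain strings and string lists shown here):

```python
# Prepend generated YAML frontmatter to an existing note without
# modifying a single byte of the original body.
def add_frontmatter(body: str, meta: dict) -> str:
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body
```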
Figure 4 - Obsidian notes processing panel showing file selection, metadata generation options, and preview of generated frontmatter
The Knowledge Graph
The real payoff isn’t individual notes - it’s the connections between them.
Every note is embedded using OpenAI’s text-embedding-3-small model and stored in Qdrant. When you save a new note, the system automatically:
- Finds semantically similar notes (threshold: 0.70)
- Adds bidirectional [[wiki links]]
- Updates the Related Notes section
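The linking decision itself is simple. Here is a pure-Python sketch (production queries Qdrant rather than scanning vectors in a loop): link two notes when the cosine similarity of their embeddings clears the 0.70 threshold.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def related_notes(new_vec: list[float], vault: dict[str, list[float]],
                  threshold: float = 0.70) -> list[str]:
    """Titles of existing notes similar enough to auto-link, best match first."""
    scored = [(title, cosine(new_vec, vec)) for title, vec in vault.items()]
    return [t for t, s in sorted(scored, key=lambda p: -p[1]) if s >= threshold]
```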
The result? Your vault becomes a living knowledge graph where related concepts naturally cluster together.
Figure 5 - Knowledge graph visualization showing interconnected notes as nodes with edges representing semantic relationships, clustered by topic
After processing our entire vault, we had 2,757 auto-generated links across 1,024 files. Notes that previously existed in isolation now connect to 3-5 related notes on average.
Tech Stack
This project combines several technologies we’ve battle-tested across other projects:
Backend:
- Python 3.11 with FastAPI
- PostgreSQL for relational data (tags, batches, note metadata)
- Qdrant for vector search (self-hosted)
- Anthropic Claude API (Haiku 3.5 for cost efficiency)
- OpenAI API (embeddings only)
Frontend:
- React 18 + TypeScript + Vite
- Tailwind CSS v4 + shadcn/ui
- React Query for API state
- Dark mode by default (because we have standards)
Infrastructure:
- Self-hosted Qdrant on Proxmox
- PostgreSQL on local network
- Hot-reload development with Vite
Figure 6 - Tech stack overview showing the layered architecture: Frontend (React), API (FastAPI), Services (Claude, Qdrant), and Data (PostgreSQL)
What Makes This Different
There are plenty of “YouTube summarizer” tools out there. Here’s what sets this apart:
1. Integration, Not Isolation
Notes don’t just get created - they get connected. Every note is automatically linked to related content in your vault.
2. Tag Consistency
Instead of random AI-generated tags, we use a curated taxonomy. Every note gets tags that fit into a coherent hierarchy.
3. Cost Efficiency
By using Anthropic’s Batch API for bulk processing, we cut API costs by 50%. Processing 1,000 notes costs about $1.50.
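As a back-of-the-envelope model (the rates and token counts below are illustrative assumptions chosen to be consistent with the ~$1.50 figure, not current Anthropic pricing; check the pricing page before budgeting):

```python
# Rough cost model for batch processing: the Batch API halves per-token cost.
INPUT_PER_MTOK = 0.80    # assumed Haiku-class input rate, USD per million tokens
OUTPUT_PER_MTOK = 4.00   # assumed output rate, USD per million tokens
BATCH_DISCOUNT = 0.5     # Message Batches API: 50% off

def batch_cost(n_notes: int, in_tok: int = 3_000, out_tok: int = 150) -> float:
    """Estimated USD cost; defaults model a short frontmatter-only job."""
    per_note = (in_tok * INPUT_PER_MTOK + out_tok * OUTPUT_PER_MTOK) / 1_000_000
    return n_notes * per_note * BATCH_DISCOUNT
```

Under these assumptions, 1,000 frontmatter-only notes come out around $1.50; longer transcripts and Detailed mode cost proportionally more.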
4. Two Processing Modes
Summary mode for quick overviews. Detailed mode for deep learning. Choose based on how much you need to retain.
5. Obsidian-Native
Output is designed specifically for Obsidian: proper YAML frontmatter, wiki-link syntax, callouts, and formatting that just works.
The Development Journey
This started as a simple script and evolved over 12 phases of development:
- Phases 1-4: Backend cleanup and n8n prompt integration
- Phase 6: Modern web UI (FastAPI + React)
- Phase 7: Tag curation system with semantic matching
- Phase 8: Vault batch processing via Anthropic Batch API
- Phase 9: Note similarity and auto-linking
- Phase 10: Fabric pattern integration
- Phase 11: Workflow architecture fixes
- Phase 12: Complete Anthropic API migration
Each phase built on the previous, guided by real usage and actual pain points.
Deep Dives
This article gives you the overview. For the technical details, see:
- Anthropic Batch API in Production - Dual-API architecture for cost and speed
- Building a Semantic Note Network - Vector search and auto-linking
- Obsidian Vault Curation at Scale - Case study: 1,000+ notes cleaned up
- Modern Python Web Stack 2025 - FastAPI + React patterns
What’s Next
The foundation is solid. Coming soon:
Phase 13: RAG Chatbot
Imagine asking your notes questions:
- “What did that video say about RAG architectures?”
- “Which notes mention LangGraph?”
- “Summarize everything I know about trading psychology”
The pieces are already in place - the notes are indexed, the embeddings are stored, the infrastructure is running. We just need to add the chat interface and retrieval logic.
Stay tuned.
This article is part of our series on building AI-powered knowledge management tools. Written with assistance from Claude Code.