YouTube is arguably the world’s largest repository of technical knowledge. Tutorials, conference talks, deep dives, expert interviews - it’s all there. The catch? You have to watch it. And watching takes forever.
We tried the usual approaches:
- Watch at 2x speed (still too slow)
- Skim through manually (miss important details)
- Take notes while watching (exhausting)
- Save videos “for later” (that queue never shrinks)
Then there’s the second problem: even when you do take notes, they end up scattered. A note here about LangGraph, another there about trading patterns, a third about Docker deployment. No connections. No context. Just isolated fragments.
We built something better.
The Solution: YouTube Markdown Agent
We created a complete pipeline that transforms YouTube videos into structured, interconnected Obsidian notes - automatically.
Drop in a URL. Get back a fully-formatted markdown note with:
- YAML frontmatter (title, tags, description, source)
- Key takeaways with timestamped sections
- Automatic links to related notes in your vault
- Curated tags from a hierarchical taxonomy
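Put together, a generated note might look like this (an illustrative sketch; the exact field names and layout are assumptions based on the description above):

```markdown
---
title: "Building Agents with LangGraph"
tags:
  - ai/agents/langgraph
  - dev/python
description: "Conference talk on building stateful multi-agent workflows."
source: https://www.youtube.com/watch?v=...
---

## Key Takeaways

- [00:02:15] Agents are modeled as nodes in a state graph
- [00:14:40] Checkpointing enables resumable conversations

## Related Notes

- [[LangChain Expression Language]]
```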
Multiple videos? Batch them. We processed 1,000+ notes in under an hour.
Figure 1 - YouTube Markdown Agent dashboard showing the main interface with URL input, processing options, and recent conversions
Architecture Overview
The system has four main components working together to transform raw video content into structured, interconnected knowledge.
Figure 2 - System architecture diagram showing the data flow from YouTube URL input through transcript extraction, AI processing, semantic indexing, and final Obsidian output
1. Transcript Extraction
Using yt-dlp, we extract video metadata and transcripts directly from YouTube. No API keys required for public videos. Works with auto-generated captions or manual transcripts.
2. AI Processing
This is where the magic happens. We use Anthropic’s Claude models to transform raw transcripts into structured notes:
- Single videos: Synchronous API for immediate results
- Batch processing (2+ videos): Anthropic Batch API for 50% cost savings
The processing uses a sophisticated multi-turn conversation that maintains context across the entire transcript, producing notes that feel like they were written by someone who actually watched the video.
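The sync-vs-batch routing can be sketched as follows. The request shape matches Anthropic's Message Batches API (`custom_id` plus `params`); the model name and prompt wording here are illustrative, not our exact prompts:

```python
# Build one Message Batches API request per transcript, keyed by video ID.
def build_requests(transcripts: dict[str, str]) -> list[dict]:
    return [
        {
            "custom_id": video_id,
            "params": {
                "model": "claude-3-5-haiku-latest",  # illustrative model choice
                "max_tokens": 4096,
                "messages": [
                    {"role": "user",
                     "content": f"Turn this transcript into an Obsidian note:\n\n{text}"}
                ],
            },
        }
        for video_id, text in transcripts.items()
    ]

def use_batch_api(n_videos: int) -> bool:
    """Batch API for 2+ videos (50% cheaper); sync API for a single video."""
    return n_videos >= 2
```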
3. Semantic Indexing
Every note gets embedded and stored in Qdrant (a vector database). This enables:
- Finding related notes by semantic similarity
- Auto-linking new notes to existing content
- Building a true knowledge graph
4. Tag Resolution
We maintain a curated taxonomy of 1,040+ hierarchical tags. When a note is created, suggested tags are matched against this taxonomy, ensuring consistency across the entire vault.
The Numbers
| Metric | Value |
|---|---|
| Notes processed | 1,000+ |
| Auto-generated links | 2,757 |
| Curated tags | 1,040 (from 1,280 chaotic originals) |
| API cost savings | 50% (using Batch API) |
| Processing time (1,000 notes) | ~30 minutes |
| Total cost for vault cleanup | ~$1.50 |
Two Workflows, One System
The application serves two distinct needs:
YouTube to Markdown
Convert video transcripts into complete notes. The AI generates both frontmatter AND body content.
Processing modes:
- Summary - Key points with brief explanations
- Detailed - Comprehensive notes for deep learning
Figure 3 - YouTube processing panel showing URL input field, processing mode selector (Summary/Detailed), and batch queue management
Obsidian Notes Processing
Add metadata to existing downloaded articles. The AI generates frontmatter ONLY, preserving original content.
This is perfect for web articles you’ve saved - add proper tags, titles, and descriptions without losing a single word of the original.
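The core guarantee of this mode is easy to state in code (a minimal sketch; the real serializer handles more YAML types than the plain strings and string lists shown here):

```python
# Prepend generated YAML frontmatter to an existing note without
# modifying a single byte of the original body.
def add_frontmatter(body: str, meta: dict) -> str:
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body
```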
Figure 4 - Obsidian notes processing panel showing file selection, metadata generation options, and preview of generated frontmatter
The Knowledge Graph
The real payoff isn’t individual notes - it’s the connections between them.
Every note is embedded using OpenAI’s text-embedding-3-small model and stored in Qdrant. When you save a new note, the system automatically:
- Finds semantically similar notes (threshold: 0.70)
- Adds bidirectional [[wiki links]]
- Updates the Related Notes section
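The linking decision itself is simple. Here is a pure-Python sketch (production queries Qdrant rather than scanning vectors in a loop): link two notes when the cosine similarity of their embeddings clears the 0.70 threshold.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def related_notes(new_vec: list[float], vault: dict[str, list[float]],
                  threshold: float = 0.70) -> list[str]:
    """Titles of existing notes similar enough to auto-link, best match first."""
    scored = [(title, cosine(new_vec, vec)) for title, vec in vault.items()]
    return [t for t, s in sorted(scored, key=lambda p: -p[1]) if s >= threshold]
```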
The result? Your vault becomes a living knowledge graph where related concepts naturally cluster together.
Figure 5 - Knowledge graph visualization showing interconnected notes as nodes with edges representing semantic relationships, clustered by topic
After processing our entire vault, we had 2,757 auto-generated links across 1,024 files. Notes that previously existed in isolation now connect to 3-5 related notes on average.
Tech Stack
This project combines several technologies we’ve battle-tested across other projects:
Backend:
- Python 3.11 with FastAPI
- PostgreSQL for relational data (tags, batches, note metadata)
- Qdrant for vector search (self-hosted)
- Anthropic Claude API (Haiku 3.5 for cost efficiency)
- OpenAI API (embeddings only)
Frontend:
- React 18 + TypeScript + Vite
- Tailwind CSS v4 + shadcn/ui
- React Query for API state
- Dark mode by default (because we have standards)
Infrastructure:
- Self-hosted Qdrant on Proxmox
- PostgreSQL on local network
- Hot-reload development with Vite
Figure 6 - Tech stack overview showing the layered architecture: Frontend (React), API (FastAPI), Services (Claude, Qdrant), and Data (PostgreSQL)
What Makes This Different
There are plenty of “YouTube summarizer” tools out there. Here’s what sets this apart:
1. Integration, Not Isolation
Notes don’t just get created - they get connected. Every note is automatically linked to related content in your vault.
2. Tag Consistency
Instead of random AI-generated tags, we use a curated taxonomy. Every note gets tags that fit into a coherent hierarchy.
3. Cost Efficiency
By using Anthropic’s Batch API for bulk processing, we cut API costs by 50%. Processing 1,000 notes costs about $1.50.
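As a back-of-the-envelope model (the rates and token counts below are illustrative assumptions chosen to be consistent with the ~$1.50 figure, not current Anthropic pricing; check the pricing page before budgeting):

```python
# Rough cost model for batch processing: the Batch API halves per-token cost.
INPUT_PER_MTOK = 0.80    # assumed Haiku-class input rate, USD per million tokens
OUTPUT_PER_MTOK = 4.00   # assumed output rate, USD per million tokens
BATCH_DISCOUNT = 0.5     # Message Batches API: 50% off

def batch_cost(n_notes: int, in_tok: int = 3_000, out_tok: int = 150) -> float:
    """Estimated USD cost; defaults model a short frontmatter-only job."""
    per_note = (in_tok * INPUT_PER_MTOK + out_tok * OUTPUT_PER_MTOK) / 1_000_000
    return n_notes * per_note * BATCH_DISCOUNT
```

Under these assumptions, 1,000 frontmatter-only notes come out around $1.50; longer transcripts and Detailed mode cost proportionally more.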
4. Two Processing Modes
Summary mode for quick overviews. Detailed mode for deep learning. Choose based on how much you need to retain.
5. Obsidian-Native
Output is designed specifically for Obsidian: proper YAML frontmatter, wiki-link syntax, callouts, and formatting that just works.
The Development Journey
This started as a simple script and evolved over 12 phases of development:
- Phases 1-4: Backend cleanup and n8n prompt integration
- Phase 6: Modern web UI (FastAPI + React)
- Phase 7: Tag curation system with semantic matching
- Phase 8: Vault batch processing via Anthropic Batch API
- Phase 9: Note similarity and auto-linking
- Phase 10: Fabric pattern integration
- Phase 11: Workflow architecture fixes
- Phase 12: Complete Anthropic API migration
Each phase built on the previous, guided by real usage and actual pain points.
Deep Dives
This article gives you the overview. For the technical details, see:
- Anthropic Batch API in Production - Dual-API architecture for cost and speed
- Building a Semantic Note Network - Vector search and auto-linking
- Obsidian Vault Curation at Scale - Case study: 1,000+ notes cleaned up
- Modern Python Web Stack 2025 - FastAPI + React patterns
What’s Next
The foundation is solid. Coming soon:
Phase 13: RAG Chatbot
Imagine asking your notes questions:
- “What did that video say about RAG architectures?”
- “Which notes mention LangGraph?”
- “Summarize everything I know about trading psychology”
The pieces are already in place - the notes are indexed, the embeddings are stored, the infrastructure is running. We just need to add the chat interface and retrieval logic.
Stay tuned.
This article is part of our series on building AI-powered knowledge management tools. Written with assistance from Claude Code.