From YouTube to Knowledge Graph: Building an AI-Powered Content Pipeline

YouTube is arguably the world’s largest repository of technical knowledge. Tutorials, conference talks, deep dives, expert interviews - it’s all there. The catch? You have to watch it. And watching takes forever.

We tried the usual approaches:

  • Watch at 2x speed (still too slow)
  • Skim through manually (miss important details)
  • Take notes while watching (exhausting)
  • Save videos “for later” (that queue never shrinks)

Then there’s the second problem: even when you do take notes, they end up scattered. A note here about LangGraph, another there about trading patterns, a third about Docker deployment. No connections. No context. Just isolated fragments.

We built something better.


The Solution: YouTube Markdown Agent#

We created a complete pipeline that transforms YouTube videos into structured, interconnected Obsidian notes - automatically.

Drop in a URL. Get back a fully-formatted markdown note with:

  • YAML frontmatter (title, tags, description, source)
  • Key takeaways with timestamped sections
  • Automatic links to related notes in your vault
  • Curated tags from a hierarchical taxonomy

Multiple videos? Batch them. We processed 1,000+ notes in under an hour.

Figure 1 - YouTube Markdown Agent dashboard showing the main interface with URL input, processing options, and recent conversions


Architecture Overview#

The system has four main components working together to transform raw video content into structured, interconnected knowledge.

Figure 2 - System architecture diagram showing the data flow from YouTube URL input through transcript extraction, AI processing, semantic indexing, and final Obsidian output

1. Transcript Extraction#

Using yt-dlp, we extract video metadata and transcripts directly from YouTube. No API keys required for public videos. Works with auto-generated captions or manual transcripts.
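To make the "manual transcripts preferred, auto-captions as fallback" behavior concrete, here is a minimal sketch. The info-dict shape mirrors what `yt_dlp.YoutubeDL.extract_info()` returns for public videos; the actual yt-dlp call is shown commented out so the helper stays runnable offline.

```python
# Sketch of transcript-track selection. The info-dict shape mirrors what
# yt_dlp.YoutubeDL.extract_info() returns; the network call itself is
# commented out so this helper runs without yt-dlp installed.

def pick_transcript(info: dict, lang: str = "en"):
    """Prefer a manual transcript; fall back to auto-generated captions.

    Returns (caption URL, is_auto) or None if no track exists.
    """
    manual = info.get("subtitles", {}).get(lang)
    if manual:                      # human-written captions win
        return manual[0]["url"], False
    auto = info.get("automatic_captions", {}).get(lang)
    if auto:                        # YouTube's speech-to-text fallback
        return auto[0]["url"], True
    return None

# Real extraction (requires yt-dlp and network access):
# import yt_dlp
# with yt_dlp.YoutubeDL({"skip_download": True}) as ydl:
#     info = ydl.extract_info(url, download=False)
# track = pick_transcript(info)
```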

2. AI Processing#

This is where the magic happens. We use Anthropic’s Claude models to transform raw transcripts into structured notes:

  • Single videos: Synchronous API for immediate results
  • Batch processing (2+ videos): Anthropic Batch API for 50% cost savings

The processing uses a sophisticated multi-turn conversation that maintains context across the entire transcript, producing notes that feel like they were written by someone who actually watched the video.
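The sync-vs-batch routing can be sketched as follows. The request shape follows Anthropic's Message Batches API; the model id and prompt wording are illustrative assumptions, not the project's actual values.

```python
# Sketch of the sync-vs-batch routing described above. Request entries follow
# the Message Batches API shape; MODEL and the prompt are assumed placeholders.

MODEL = "claude-3-5-haiku-latest"   # assumed; the article says "Haiku 3.5"

def build_requests(transcripts: dict) -> list:
    """Turn {video_id: transcript} into Batch API request entries."""
    return [
        {
            "custom_id": video_id,
            "params": {
                "model": MODEL,
                "max_tokens": 4096,
                "messages": [
                    {"role": "user",
                     "content": f"Turn this transcript into an Obsidian note:\n\n{text}"}
                ],
            },
        }
        for video_id, text in transcripts.items()
    ]

def process(transcripts: dict, client):
    """Single video -> synchronous call; 2+ -> Batch API (50% cheaper)."""
    reqs = build_requests(transcripts)
    if len(reqs) == 1:
        return client.messages.create(**reqs[0]["params"])
    return client.messages.batches.create(requests=reqs)
```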

3. Semantic Indexing#

Every note gets embedded and stored in Qdrant (a vector database). This enables:

  • Finding related notes by semantic similarity
  • Auto-linking new notes to existing content
  • Building a true knowledge graph
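Conceptually, the Qdrant lookup reduces to cosine similarity between a new note's embedding and every stored vector, keeping matches above a threshold. A pure-Python sketch of that logic (in the real system, Qdrant performs this server-side over an index):

```python
import math

# What the Qdrant lookup reduces to: cosine similarity between a new note's
# embedding and stored vectors, keeping matches above a threshold.

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def related_notes(query, index: dict, threshold: float = 0.70):
    """Return (note_title, score) pairs above the similarity threshold."""
    hits = [(title, cosine(query, vec)) for title, vec in index.items()]
    return sorted([h for h in hits if h[1] >= threshold],
                  key=lambda h: h[1], reverse=True)
```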

4. Tag Resolution#

We maintain a curated taxonomy of 1,040+ hierarchical tags. When a note is created, suggested tags are matched against this taxonomy, ensuring consistency across the entire vault.
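A hedged sketch of what tag resolution might look like: suggestions are normalized and matched against the curated hierarchy. The taxonomy entries and matching rules here are illustrative assumptions (the real system also uses semantic matching, not shown).

```python
# Illustrative tag resolution: normalize AI-suggested tags and match them
# against a curated hierarchical taxonomy. Entries below are dummy examples.

TAXONOMY = {"ai/llm/rag", "ai/llm/agents", "dev/docker", "finance/trading"}

def resolve_tags(suggested: list, taxonomy: set = TAXONOMY) -> list:
    """Map free-form suggestions onto curated hierarchical tags."""
    leaf_index = {t.rsplit("/", 1)[-1]: t for t in taxonomy}
    resolved = []
    for raw in suggested:
        tag = raw.strip().lower().replace(" ", "-")
        if tag in taxonomy:                 # exact hierarchical match
            resolved.append(tag)
        elif tag in leaf_index:             # match by leaf name, e.g. "rag"
            resolved.append(leaf_index[tag])
    return sorted(set(resolved))            # unknown tags are dropped
```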


The Numbers#

| Metric | Value |
| --- | --- |
| Notes processed | 1,000+ |
| Auto-generated links | 2,757 |
| Curated tags | 1,040 (from 1,280 chaotic originals) |
| API cost savings | 50% (using Batch API) |
| Processing time (1,000 notes) | ~30 minutes |
| Total cost for vault cleanup | ~$1.50 |

Two Workflows, One System#

The application serves two distinct needs:

YouTube to Markdown#

Convert video transcripts into complete notes. The AI generates both frontmatter AND body content.

Processing modes:

  • Summary - Key points with brief explanations
  • Detailed - Comprehensive notes for deep learning

Figure 3 - YouTube processing panel showing URL input field, processing mode selector (Summary/Detailed), and batch queue management

Obsidian Notes Processing#

Add metadata to existing downloaded articles. The AI generates frontmatter ONLY, preserving original content.

This is perfect for web articles you’ve saved - add proper tags, titles, and descriptions without losing a single word of the original.
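The frontmatter-only workflow boils down to prepending a YAML block and passing the body through untouched. A minimal sketch, with field names mirroring the list earlier in the article and dummy values:

```python
# Minimal sketch of frontmatter-only processing: generated metadata is
# prepended as a YAML block; the article body passes through byte-for-byte.

def add_frontmatter(body: str, meta: dict) -> str:
    """Prepend YAML frontmatter, preserving the original content exactly."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):         # tags render as a YAML list
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body
```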

Figure 4 - Obsidian notes processing panel showing file selection, metadata generation options, and preview of generated frontmatter


The Knowledge Graph#

The real payoff isn’t individual notes - it’s the connections between them.

Every note is embedded using OpenAI’s text-embedding-3-small model and stored in Qdrant. When you save a new note, the system automatically:

  1. Finds semantically similar notes (threshold: 0.70)
  2. Adds bidirectional [[wiki links]]
  3. Updates the Related Notes section
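Steps 2 and 3 above can be sketched as a single rewrite of the note's Related Notes section. The section heading and link format are assumptions about the vault's conventions:

```python
# Sketch of auto-linking: given similar notes found via vector search, append
# (or refresh) a "Related Notes" section of [[wiki links]]. The heading name
# is an assumed convention.

def update_related_notes(note: str, related: list) -> str:
    """Rewrite the note's Related Notes section with [[wiki links]]."""
    marker = "## Related Notes"
    body = note.split(marker)[0].rstrip()   # drop any existing section
    links = "\n".join(f"- [[{title}]]" for title in related)
    return f"{body}\n\n{marker}\n\n{links}\n"
```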

The result? Your vault becomes a living knowledge graph where related concepts naturally cluster together.

Figure 5 - Knowledge graph visualization showing interconnected notes as nodes with edges representing semantic relationships, clustered by topic

After processing our entire vault, we had 2,757 auto-generated links across 1,024 files. Notes that previously existed in isolation now connect to 3-5 related notes on average.


Tech Stack#

This project combines several technologies we’ve battle-tested across other projects:

Backend:

  • Python 3.11 with FastAPI
  • PostgreSQL for relational data (tags, batches, note metadata)
  • Qdrant for vector search (self-hosted)
  • Anthropic Claude API (Haiku 3.5 for cost efficiency)
  • OpenAI API (embeddings only)

Frontend:

  • React 18 + TypeScript + Vite
  • Tailwind CSS v4 + shadcn/ui
  • React Query for API state
  • Dark mode by default (because we have standards)

Infrastructure:

  • Self-hosted Qdrant on Proxmox
  • PostgreSQL on local network
  • Hot-reload development with Vite

Figure 6 - Tech stack overview showing the layered architecture: Frontend (React), API (FastAPI), Services (Claude, Qdrant), and Data (PostgreSQL)


What Makes This Different#

There are plenty of “YouTube summarizer” tools out there. Here’s what sets this apart:

1. Integration, Not Isolation#

Notes don’t just get created - they get connected. Every note is automatically linked to related content in your vault.

2. Tag Consistency#

Instead of random AI-generated tags, we use a curated taxonomy. Every note gets tags that fit into a coherent hierarchy.

3. Cost Efficiency#

By using Anthropic’s Batch API for bulk processing, we cut API costs by 50%. Processing 1,000 notes costs about $1.50.

4. Two Processing Modes#

Summary mode for quick overviews. Detailed mode for deep learning. Choose based on how much you need to retain.

5. Obsidian-Native#

Output is designed specifically for Obsidian: proper YAML frontmatter, wiki-link syntax, callouts, and formatting that just works.


The Development Journey#

This started as a simple script and evolved over 12 phases of development:

  • Phases 1-4: Backend cleanup and n8n prompt integration
  • Phase 6: Modern web UI (FastAPI + React)
  • Phase 7: Tag curation system with semantic matching
  • Phase 8: Vault batch processing via Anthropic Batch API
  • Phase 9: Note similarity and auto-linking
  • Phase 10: Fabric pattern integration
  • Phase 11: Workflow architecture fixes
  • Phase 12: Complete Anthropic API migration

Each phase built on the previous, guided by real usage and actual pain points.


Deep Dives#

This article gives you the overview; the companion deep-dive posts cover the technical details.


What’s Next#

The foundation is solid. Coming soon:

Phase 13: RAG Chatbot

Imagine asking your notes questions:

  • “What did that video say about RAG architectures?”
  • “Which notes mention LangGraph?”
  • “Summarize everything I know about trading psychology”

The pieces are already in place - the notes are indexed, the embeddings are stored, the infrastructure is running. We just need to add the chat interface and retrieval logic.

Stay tuned.


This article is part of our series on building AI-powered knowledge management tools. Written with assistance from Claude Code.

URL: https://fuwari.vercel.app/articles/youtube-to-knowledge-graph/
Authors: Katrina Dotzlaw, Ryan Dotzlaw
Published: 2025-12-15
License: CC BY-NC-SA 4.0