Tuesday, January 20, 2026

Like X-Ray Vision For YouTube Content

While browsing the YouTube API, I came across a section on captions and it hit me: if I could slurp down a video's subtitles, I could feed them to an LLM like Gemini and get insights about the video without ever pressing play. I wondered if I could get answers to questions like: what's novel about this video (besides a clever title and thumbnail)? Or: does this video show signs of bias?

Digging into this, I quickly learned that the captions endpoint doesn't help solve this problem. It's designed for content creators to download the captions that they've created for their own videos. I wanted a way to grab subtitles for public videos.

But all is not lost: with additional research, I discovered that you can use yt-dlp to grab a video's subtitles. yt-dlp is a command line video downloader, and one of its bonus features is downloading caption files. With this tool installed, my idea was trivial to put together.
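For the curious, here's a rough sketch of the kind of yt-dlp invocation involved. This is not youtubeassist's actual code: the function names and the output template are made up for illustration, and it assumes yt-dlp is installed and on your PATH.

```python
import subprocess


def subtitle_command(video, lang="en", out_template="%(id)s"):
    """Build a yt-dlp command that fetches captions only (hypothetical helper)."""
    return [
        "yt-dlp",
        "--skip-download",    # captions only, don't fetch the video itself
        "--write-subs",       # creator-uploaded subtitles, if any
        "--write-auto-subs",  # auto-generated captions as well
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", out_template,
        "--",                 # video IDs can start with '-', so end option parsing
        video,
    ]


def fetch_subtitles(video, lang="en"):
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(subtitle_command(video, lang), check=True)
```

Calling fetch_subtitles("dQw4w9WgXcQ") should leave a .vtt caption file in the current directory. The "--" separator matters for IDs that begin with a dash, which would otherwise be read as an option.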

First off, I added a -a subtitles option to youtubeassist, my command line tool for working with YouTube. I can now quickly grab the captions for any video I want. Here's this functionality in action:

  $ youtubeassist -a subtitles -i https://www.youtube.com/watch?v=dQw4w9WgXcQ | grep ^Never | head -4
  Never<00:00:43.440><c> going</c><00:00:43.520><c> to</c><00:00:43.760><c> give</c><00:00:44.079><c> you</c><00:00:44.480><c> up.</c><00:00:45.360><c> I'm</c><00:00:45.520><c> going</c><00:00:45.600><c> to</c>
  Never going to give you up. I'm going to
  Never going to give you up. I'm going to
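As the output shows, auto-generated captions come with inline timing tags and plenty of repeated lines. An LLM copes with that fine, but if you want a cleaner transcript, something like the following sketch could work. It assumes WebVTT input, and the vtt_to_text name is my own invention; it strips the tags, skips headers and cue timings, and collapses consecutive duplicate lines.

```python
import re

# Matches inline markup such as <00:00:43.440> and <c>...</c>
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt):
    """Reduce WebVTT caption data to plain, de-duplicated text lines."""
    lines = []
    for raw in vtt.splitlines():
        line = TAG.sub("", raw).strip()
        if not line or "-->" in line:
            continue  # blank line or cue timing line
        if line == "WEBVTT" or line.startswith(("Kind:", "Language:")):
            continue  # file header
        if lines and lines[-1] == line:
            continue  # rolling captions repeat the previous line
        lines.append(line)
    return "\n".join(lines)
```

Run over the three lines above, this would return a single "Never going to give you up. I'm going to".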

Next, I tweaked geminiassist, my command line tool for working with Gemini. I added a -y video-id-or-url option. When -y is provided, the corresponding caption data is downloaded and included as an attachment with the prompt that's sent to Gemini. I was amazed how quickly all this came together.
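Accepting either a video ID or a URL means normalizing the argument first. Here's a minimal sketch of how that could look, along with one simple way to combine the captions with a prompt. Both helpers are hypothetical, and build_prompt just inlines the captions as text rather than reproducing how geminiassist attaches them.

```python
from urllib.parse import parse_qs, urlparse


def video_id(value):
    """Return the YouTube video ID for a bare ID or a watch/short URL."""
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/")
    if host == "youtube.com" or host.endswith(".youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    return value  # assume it's already a bare ID


def build_prompt(question, captions):
    """Append caption text to the question (a simplification, see above)."""
    return f"{question}\n\n--- Captions ---\n{captions}"
```

Bare IDs pass through untouched, including ones with a leading dash like -7sDn_B2TbE.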

Consider this video: This Journalling System Solved my Shower Thought Problem. Is it worth my time, or just more of the same? Now I can ask Gemini.

  $ geminiassist -a ask -y ABSV0GzSnS0 -p "Can you please look over the subtitles of this video. Can you summarize the video in a few bullet points. Give me a few bullet points why I should watch the video and a few bullet points why I should skip it. Generate results in HTML so I can embed them in a blog post."

Video Summary

  • The video outlines the creator's journaling system for turning ideas into reality, emphasizing the importance of tools and systems that reduce friction and encourage consistent creative action.
  • The creator recounts how losing an initial idea (a toothbrush with a flossing compartment) spurred the development of his current system.
  • Key components include a pocket journal (Moleskine) for capturing on-the-go ideas and a handmade pencil case filled with useful tools (markers, scissors, tape, ruler, etc.).
  • Digital tools are also key. He has a dedicated notebook, and mini printer to integrate photos and images into his journals, and a digital app (Notion) for organizing and refining ideas and creating a production pipeline.
  • The system prioritizes intentionality and visual representation over convenience, aiming for a harmonious process between idea and outcome.

Why You Should Watch This Video

  • Inspiration for creative systems: The video offers a detailed look into a creative professional's workflow.
  • Practical tool recommendations: Discover specific tools and applications (e.g., Moleskine, Jetream, Notion, mini printer) that enhance the journaling and creative process.
  • Emphasis on consistency: Learn the importance of building a routine around creative work, even if it's not always convenient.
  • Visual learning: The creator's emphasis on visual journaling is beneficial for visual learners and anyone seeking alternative methods of idea capture and development.

Why You Might Skip This Video

  • Not focused on digital tools only: The video has a large focus on a more hand-written and craft styled system. If you prefer pure digital solutions, you might find the physical tools section less relevant.
  • Individualized System: This is _one_ system, it does not mean that it will work for you and may require extra effort and experimentation to get your own system up.
  • Detailed Equipment List: The video goes into specific detail about all the tools which can be tedious to hear about if you're not interested in purchasing some of the tools the creator uses.
  • Personal Preference: The creator is very passionate about his current system, while passion can be great, the heavy emphasis on his personal system may not translate to value for all viewers.

Or, here's a provocative title that I'd normally skip as clickbait: America Just Entered Early-Stage Revolution. But now I can ask Gemini for its opinion before I waste my time and muddy up my feed.

  $ geminiassist -a ask -y '-7sDn_B2TbE' -p "Please review the subtitles for this video. Can you summarize it? Does the video show signs of bias? Is this clickbait or reasoned analysis? Provide your response in HTML bullet form so I can embed this easily in a blog post."

  • Summary: The video analyzes an incident in Minneapolis where Jake Lang, a known January 6th rioter and right-wing agitator, encountered resistance at a protest and then falsely claimed to be stabbed. The video argues that Lang's experience signifies a shift in power dynamics, where right-wing intimidation tactics are becoming less effective as people are physically pushing back against bigotry. It contrasts this with the Charlottesville events, when that intimidation was much more prevalent. The video concludes with cautious optimism, suggesting this could be the beginning of a real resistance movement, but warns that sustained, national, and habitual action is needed to bring on the revolution.
  • Signs of Bias: The video exhibits a clear bias against Jake Lang and the right-wing movement. The language used to describe Lang ("clown," "professional agitator," "racist white grievance movement") is strongly negative. The video frames the confrontation as a positive instance of physical resistance against "fascism". There are other instances of loaded language, such as calling Trump supporters "fascist" and "toxic."
  • Clickbait vs. Reasoned Analysis: This leans more towards reasoned analysis than clickbait, though elements of both are present. The video presents a specific argument, supports it with examples and comparisons (Minneapolis vs. Charlottesville), and draws a conclusion about broader social trends. However, the framing of the title and the language used likely aim to attract a specific audience and provoke an emotional response.

As with all things LLM, a better prompt would likely get better responses. Still, even these hastily prepared queries show promise.

I'm not sure how I'll integrate this into my YouTube workflow, but I'm definitely psyched to add this tool to my toolbox.
