Tuesday, December 01, 2020

Auto Packaging of Videos | ffmpeg To The Rescue

The challenge: find a way to programmatically add a title screen, watermark, and credit screen to a video. My first thought was to turn to ffmpeg; it's gotten me out of every other video processing jam before, surely it wouldn't fail me now.

My initial plan was to generate content using ImageMagick and then use ffmpeg to combine the various images and source video into a finished product. However, the more I learned about ffmpeg's filters, the more I realized that I could do the content generation directly in ffmpeg.

To do this, I needed to grok two ffmpeg concepts. First, ffmpeg's -filter_complex option (and its simpler sibling -vf, used in the examples below) allows chaining filters together to create interesting effects using a technique reminiscent of Unix pipes. One source of confusion: ffmpeg allows for writing the same expression in various ways, from verbose to cryptically terse. Consider these examples:

ffmpeg -i input -vf "[in]yadif=0:0:0[middle];[middle]scale=iw/2:-1[out]" output # 2 chains form, one filter per chain, chains linked by the [middle] pad
ffmpeg -i input -vf "[in]yadif=0:0:0,scale=iw/2:-1[out]" output                 # 1 chain form, with 2 filters in the chain, linking implied
ffmpeg -i input -vf "yadif=0:0:0,scale=iw/2:-1"  output                         # the input and output are implied without ambiguity

These all do the same thing: they combine the yadif and scale filters. The first one explicitly names each stream ([in], [middle] and [out]) while the last example relies on implicit naming and behavior. This tersification extends to the filter definitions as well. These are equivalent expressions:

  scale=width=100:height=50
  scale=w=100:h=50
  scale=100:50

This ability to compose expressions with varying degrees of verbosity means that many one-liners I found on the web were initially hard to understand, and it was even harder to imagine how to combine them. Once I started thinking of filters and their arguments as chains of streams, and working with them using verbose notation, the problem was vastly simplified.
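To make the verbose-versus-terse point concrete, here's a small sketch that spells out the yadif and scale example both ways. It only prints the two commands; input.mp4 and output.mp4 are placeholder names, and the [deint] label is one I made up for illustration.

```shell
#!/bin/sh
# The same two-filter graph written two ways. Nothing is executed;
# the commands are only printed. input.mp4 and output.mp4 are placeholders.

# Terse: positional arguments, implied input and output, one chain.
terse='yadif=0:0:0,scale=iw/2:-1'

# Verbose: named arguments, one filter per chain, an explicit [0:v] input
# and a [deint] label linking the chains. The last output is left
# unlabeled so ffmpeg sends it to the output file.
verbose='[0:v]yadif=mode=0:parity=0:deint=0[deint];[deint]scale=width=iw/2:height=-1'

echo "ffmpeg -i input.mp4 -vf '$terse' output.mp4"
echo "ffmpeg -i input.mp4 -filter_complex '$verbose' output.mp4"
```

Reading a cryptic one-liner usually gets easier after rewriting it in the second form first.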

The other concept I needed to wrap my head around was that before I could filter a stream, I needed to have a corresponding input. In some cases, the input was obvious. For example, adding a watermark image and text to a video stream was relatively straightforward. One solution has the following shape:

  ffmpeg -i source.mp4 -i logo.png  \           [1] These are my inputs, 0 and 1 respectively
         -filter_complex "\
           [0:v][1:v]  overlay=.... [main]; \   [2] Overlay the first two streams, write the output to [main]
           [main] drawtext=... \                [3] Add text to the main stream; the unlabeled result goes to output.mp4
         " output.mp4
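For a filled-in version of that shape, the sketch below builds the filtergraph string with a small helper and prints the resulting command. The helper name watermark_graph, the [marked] label, the margin argument and the styling values are all my own choices for illustration. It also assumes an ffmpeg build that can find a default font; otherwise drawtext needs a fontfile= option.

```shell
#!/bin/sh
# watermark_graph CAPTION MARGIN
# Print a two-chain filtergraph: overlay input 1 (the logo) on input 0
# near the bottom-right corner, then draw CAPTION near the bottom-left.
# No escaping is attempted, so keep CAPTION to plain words.
watermark_graph() {
  caption=$1
  margin=$2
  printf '%s' \
    "[0:v][1:v]overlay=x=main_w-overlay_w-${margin}:y=main_h-overlay_h-${margin}[marked];" \
    "[marked]drawtext=text='${caption}':x=${margin}:y=h-text_h-${margin}:fontsize=48:fontcolor=white"
}

# Print the command rather than run it; source.mp4 and logo.png are placeholders.
echo "ffmpeg -i source.mp4 -i logo.png -filter_complex \"$(watermark_graph 'Ben Simon' 20)\" output.mp4"
```

Building the graph in a function keeps the quoting in one place, which matters once the graph grows past a couple of filters.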

But what about adding a title screen before the main video? drawtext and overlay would let me add text and images to the stream, but what's the source of the stream in the first place? One solution: generate a stream using the oddly named virtual device lavfi. With this in mind, adding a title screen looks roughly like so:

  ffmpeg -i source.mp4 \        [1] Main video
         -f lavfi \
         -i "color=color=0xf2e4f2:\ [2] Generated input source
            duration=5:\                for our title screen
            size=1024x1024:\
            rate=30" \
         -filter_complex "\
          [0:v] ... [main]; \           [3] Do something with the source video and send it to [main]
          [1:v] drawtext=...[pre]; \    [4] Draw on the virtual screen, send it to [pre]
          [pre][main] concat [final]\   [5] Combine our [pre] and [main] streams for a final result
         " -map "[final]" output.mp4
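Here's a hedged, filled-in sketch of the title screen idea. The helper names title_source and title_graph are hypothetical, as are the size, color and font settings. Two assumptions are worth calling out: the generated clip's size should match the source video, since concat expects matching segments, and the printed command passes -an to drop audio because this sketch only concatenates video.

```shell
#!/bin/sh
# title_source COLOR WIDTHxHEIGHT SECONDS FPS
# Print a description of a solid-color clip for the lavfi virtual device.
title_source() {
  printf 'color=color=%s:size=%s:duration=%s:rate=%s' "$1" "$2" "$3" "$4"
}

# title_graph TEXT
# Print a filtergraph that centers TEXT on input 1 (the generated clip)
# and then plays input 0 (the main video) right after it.
title_graph() {
  printf '%s' \
    "[1:v]drawtext=text='$1':x=(w-text_w)/2:y=(h-text_h)/2:fontsize=48:fontcolor=white[pre];" \
    "[pre][0:v]concat=n=2:v=1:a=0"
}

# Print the command rather than run it; source.mp4 is a placeholder.
echo "ffmpeg -i source.mp4 -f lavfi -i \"$(title_source 0xf2e4f2 1280x720 5 30)\"" \
     "-filter_complex \"$(title_graph 'Title Text')\" -an output.mp4"
```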
        

With a solid'ish grasp of filters and lavfi generated streams, I was ready to tackle the original problem. I created a shell script for packaging video and the final result was looking acceptable. But then I ran into another issue: what's the best way to parameterize the script?

If I passed in all the arguments on the command line, about 40 in total, using the script would be a pain. If I stored all the options in a config file, then scripting the command would become tricky. With a bit of reflection, I realized there was a low effort, high value solution to this problem. At the top of the script, I define sane defaults for all the variables. I then process command line arguments like so:

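config=$HOME/.pkgvid.last
cp /dev/null "$config"

while [ -n "$1" ] ; do
  arg=$1 ; shift
  if [ -f "$arg" ] ; then                [1]
    echo "# $arg" >> "$config"
    cat "$arg" >> "$config"
  fi
  
  name=$(echo "$arg" | cut -d= -f1)
  if [ "$name" != "$arg" ]; then         [2]
     # the shell strips quotes from the command line, so re-quote the
     # value; otherwise values containing spaces break when sourced
     value=${arg#*=}
     printf '%s="%s"\n' "$name" "$value" >> "$config"
  fi

  echo >> "$config"
done

. "$config"  [3]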

For each command line argument, I first check if the argument is an existing file. If it is, I add the contents of that file to the end of ~/.pkgvid.last. I then check if the argument has the shape variable=value; if it does, I add this expression to the end of ~/.pkgvid.last. Once I've processed all the command line arguments, the magic happens at [3] by sourcing ~/.pkgvid.last. This reads the variable definitions that were defined in config files and on the command line, and has them override the defaults.
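The layering is easier to see in a toy version. The sketch below is not pkg.sh; build_config, the two variables and the temporary directory are stand-ins I made up, and it assumes mktemp -d is available. It writes each name=value pair wrapped in double quotes so values with spaces survive being sourced.

```shell
#!/bin/sh
# Toy version of the layering: defaults first, then config files,
# then name=value pairs. Later layers win because they are sourced last.

# build_config OUT ARG...
# Append each existing file, or each name=value pair, to OUT in order.
build_config() {
  out=$1 ; shift
  : > "$out"
  for arg in "$@" ; do
    if [ -f "$arg" ] ; then
      cat "$arg" >> "$out"
    else
      case $arg in
        *=*) printf '%s="%s"\n' "${arg%%=*}" "${arg#*=}" >> "$out" ;;
      esac
    fi
  done
}

# Layer 1: defaults
title="Title Text"
bg_color=0x666666

# Layer 2: a config file (created here only for the demo)
demo_dir=$(mktemp -d)
printf 'bg_color=0x1d518a\ntitle="From the file"\n' > "$demo_dir/blog.config"

# Layer 3: a command line override
build_config "$demo_dir/last" "$demo_dir/blog.config" title="From the command line"
. "$demo_dir/last"

echo "title=$title"         # prints: title=From the command line
echo "bg_color=$bg_color"   # prints: bg_color=0x1d518a
rm -rf "$demo_dir"
```

The file sets both variables, the command line overrides only title, and the untouched bg_color keeps the file's value rather than the default.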

This sounds confusing, but in practice, it's delightful to use. Consider blog.config, which has been set up to override defaults for blog-related videos:

main_video=$base_dir/water.short.mp4
main_caption_text="Ben Simon"

pre_title_text="A BlogByBen Video"
pre_bg_color=0x1d518a
pre_title_color=white
pre_image_w=250

main_caption_color=0x1d518a
main_image_w=75

post_title_text="blogbyben.com"
post_bg_color=$pre_bg_color
post_title_color=$pre_title_color

ffmpeg=/usr/local/bin/ffmpeg
logo=$base_dir/ben_logo.jpg

I can then override these settings by using command line options. For example:

 for part in $(seq 1 10); do
   pkg.sh blog.config pre_title_text="Blogging Secrets. Part $part of 10"
 done

This strategy lets me organize variables into a config file and judiciously overwrite them on the command line.

OK, that's more than enough theory. Let's see this in action. Below is a simple test video, followed by this same test video packaged up with the command: pkg.sh blog.config.

Below is the script that created this video. Hopefully this gives you a fresh appreciation for ffmpeg and an interesting example to play with.

#!/bin/sh

##
## This script is used for packaging up a video
##


# Default Variables
base_dir=$(dirname "$0")
font_dir=$base_dir/fonts
debug=off

ffmpeg=ffmpeg
ffprobe=ffprobe

logo=logo.png
output=output.mp4

main_video=input.mp4
main_image_x="main_w-overlay_w-20"
main_image_y="(main_h-overlay_h-40)"
main_image_w=500
main_image_h=-1
main_caption_text="Thanks for watching"
main_caption_x=20
main_caption_y="(h-text_h)-40"
main_caption_font_size=48
main_caption_font=Lato-Heavy.ttf
main_caption_color=0x222222
main_caption_start=0
main_caption_end=5

pre_duration=3
pre_image_w=500
pre_image_h=-1
pre_image_x="(main_w-overlay_w)/2"
pre_image_y="(main_h-overlay_h)*.80"
pre_bg_color=0x666666
pre_title_text="Title Text"
pre_title_font='Lato-Heavy.ttf'
pre_title_x="(w-text_w)/2"
pre_title_y="(h-text_h)/2"
pre_title_font_size=48
pre_title_color=white

post_duration=5
post_bg_color=0xE4E4E4
post_title_text="Thanks for watching"
post_title_font='Lato-Thin.ttf'
post_title_font_size=65
post_title_x="(w-text_w)/2"
post_title_y="(h-text_h)/2"
post_title_color=0x222222

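config=$HOME/.pkgvid.last
cp /dev/null "$config"

while [ -n "$1" ] ; do
  arg=$1 ; shift
  if [ -f "$arg" ] ; then
    echo "# $arg" >> "$config"
    cat "$arg" >> "$config"
  fi
  
  name=$(echo "$arg" | cut -d= -f1)
  if [ "$name" != "$arg" ]; then
     # the shell strips quotes from the command line, so re-quote the
     # value; otherwise values containing spaces break when sourced
     value=${arg#*=}
     printf '%s="%s"\n' "$name" "$value" >> "$config"
  fi

  echo >> "$config"
done

. "$config"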

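# Only look at the first video stream so a file with several streams yields one value
main_video_width=$($ffprobe -select_streams v:0 -show_streams "$main_video" 2> /dev/null | grep ^width= | cut -d = -f 2)
main_video_height=$($ffprobe -select_streams v:0 -show_streams "$main_video" 2> /dev/null | grep ^height= | cut -d = -f 2)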

if [ "$debug" = "on" ]  ; then
   ffmpeg="echo $ffmpeg"
fi

$ffmpeg -y \
  -i "$main_video" \
  -i "$logo" \
  -f lavfi -i "color=color=$pre_bg_color:size=${main_video_width}x${main_video_height}:duration=$pre_duration" \
  -f lavfi -i "color=color=$post_bg_color:size=${main_video_width}x${main_video_height}:duration=$post_duration" \
  -filter_complex "\
    [1:v] split [logo_a][logo_b] ; \
    [logo_a]scale=w=$main_image_w:h=$main_image_h[logo_a] ; \
    [logo_b]scale=w=$pre_image_w:h=$pre_image_h[logo_b] ; \
    [0:v][logo_a] overlay=${main_image_x}:${main_image_y} [main] ; \
    \
    [main]drawtext=fontfile=$font_dir/$main_caption_font: \
          text='$main_caption_text': \
          x=$main_caption_x: y=$main_caption_y: \
          fontsize=$main_caption_font_size: \
          fontcolor=$main_caption_color: \
          shadowcolor=black@.8: shadowx=4: shadowy=4: \
          enable='between(t,$main_caption_start,$main_caption_end)'[main]; \
    \
    [2:v]drawtext=fontfile=$font_dir/$pre_title_font: \
         text='$pre_title_text': \
         x='$pre_title_x': y='$pre_title_y': \
         fontsize=$pre_title_font_size: \
         fontcolor=$pre_title_color [pre]; \
    \
    [pre][logo_b] overlay=x='$pre_image_x':y='$pre_image_y' [pre] ; \
    \
    [3:v]drawtext=fontfile=$font_dir/$post_title_font: \
         text='$post_title_text':\
         x='$post_title_x': y='$post_title_y': \
        fontsize=$post_title_font_size: \
        fontcolor=$post_title_color [post]; \
   \
    [pre][main][post]concat=n=3 \
  " \
  -vsync 2 \
  "$output"
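One part of the script that can be tightened is the dimension lookup. As a sketch, ffprobe can be asked for just the width and height of the first video stream in a single call. The function names video_size and split_size are hypothetical, and I'd treat the exact output as something to check against your own files, since unusual streams can produce extra fields.

```shell
#!/bin/sh
# video_size FILE
# Print WIDTHxHEIGHT for the first video stream of FILE, e.g. 1280x720.
video_size() {
  ffprobe -v error -select_streams v:0 \
          -show_entries stream=width,height \
          -of csv=s=x:p=0 "$1"
}

# split_size WIDTHxHEIGHT
# Set the shell variables width and height from a WIDTHxHEIGHT string.
split_size() {
  width=${1%x*}
  height=${1#*x}
}

# Usage (not run here, since it needs ffprobe and a real file):
#   split_size "$(video_size input.mp4)"
#   echo "$width by $height"
```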
