Skipping a show's intro using chapters


Ever been watching a series and had to skip past each episode's intro far too many times? Usually I would drag the video scrubber forwards and wiggle it a bit until it settled at the perfect spot. Surely there has to be a better way to do this!


(and context)


As most developers would do, I built an automated technical solution that took me about 2 hours to complete, when I could have resolved the issue manually in minutes. This was my strategy:

Insert chapter data into the video


On some DVDs with compatible players, the player has “next chapter” and “previous chapter” controls. In theory, if I could insert chapter data that marks out the intro section, I could then skip past it by pressing “next chapter” in VLC (Shift+N by default). This would first require me to know exactly when the intro begins and ends.

Identify the timestamp of the intro clip


The intro clip always uses the same audio track, so its position can be identified by audio fingerprinting. This process transforms sound files into the frequency domain, and compares the spectrogram against a set of known signatures. The same technique is used by companies like Shazam to identify currently playing music. This writeup goes deeper on the nitty-gritty of how it works.

In my case, the intro clip is always 52 seconds long, so I only need to know the timestamp where the intro begins playing, to find out when the intro is done playing.
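Since the intro length is fixed, the end timestamp is just the start plus 52 seconds. A minimal sketch of that arithmetic follows; the helper names `intro_window` and `format_timestamp` are made up for illustration and are not taken from the original script.

```python
INTRO_LENGTH_SECONDS = 52.0  # fixed intro length mentioned above


def intro_window(start_seconds, length_seconds=INTRO_LENGTH_SECONDS):
    """Return (start, end) of the intro, both in seconds."""
    if start_seconds < 0:
        raise ValueError("start_seconds must be non-negative")
    return start_seconds, start_seconds + length_seconds


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    start, end = intro_window(83.5)  # 83.5 s is an example value
    print(format_timestamp(start), format_timestamp(end))
```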


Create a fingerprint of the intro soundtrack

The audio track has to be extracted from any of the episodes, so that the intro music can be cut out. FFmpeg does a wonderful job here:

ffmpeg -i input.avi -ss 00:00:00 -t 00:15:00.0 -vn -acodec pcm_s16le -ar 44100 -ac 1 output.wav

This creates a .wav file with the first 15 minutes of the episode’s audio. Audacity is then used to locate and cut out about 15 seconds of the intro music that will be used as a fingerprint.
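To repeat the extraction over several episodes, the same FFmpeg invocation can be wrapped in a small helper. This is only a sketch: it assumes `ffmpeg` is on the PATH, and the function and file names are illustrative rather than part of the original script.

```python
import subprocess


def build_extract_cmd(episode_path, wav_path, duration="00:15:00"):
    """Build the FFmpeg argument list that dumps the first part of an
    episode's audio as 44.1 kHz mono 16-bit PCM."""
    return [
        "ffmpeg",
        "-i", episode_path,     # source episode (video container)
        "-ss", "00:00:00",      # start at the beginning
        "-t", duration,         # keep only the first `duration`
        "-vn",                  # drop the video stream
        "-acodec", "pcm_s16le",
        "-ar", "44100",
        "-ac", "1",
        wav_path,
    ]


def extract_audio(episode_path, wav_path):
    """Run FFmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_extract_cmd(episode_path, wav_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.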

Process the input episodes to find the timestamp of the intro

Using the very nifty, open source SoundFingerprinting library, I could then load the fingerprint wave file as a signature. Using the same AVI-to-WAV conversion method, I converted the episodes into a bunch of .wav files and ran the fingerprinting process on them.


Each input file produced exactly one match, along with the timestamp (in seconds) of the matched position. It’s amazing how the SoundFingerprinting library accomplished the hard parts of this endeavour without much fuss.
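SoundFingerprinting does its matching on spectrogram-based fingerprints. To illustrate only the general idea of locating a short clip inside a longer recording and turning the match into a timestamp, here is a much simpler stand-in: brute-force normalized cross-correlation on raw samples. It is not what the library does, it would be far too slow for full episodes, and all names are hypothetical.

```python
import math


def find_offset(haystack, needle):
    """Return (index, score) of the window in `haystack` that best matches
    `needle`, scored by normalized cross-correlation (1.0 = same shape)."""
    n = len(needle)
    if n == 0 or n > len(haystack):
        raise ValueError("needle must be non-empty and no longer than haystack")
    needle_norm = math.sqrt(sum(x * x for x in needle))
    if needle_norm == 0:
        raise ValueError("needle must not be all zeros")

    best_index, best_score = 0, float("-inf")
    for i in range(len(haystack) - n + 1):
        window = haystack[i:i + n]
        window_norm = math.sqrt(sum(x * x for x in window))
        if window_norm == 0:
            continue  # silent window, nothing to compare
        dot = sum(a * b for a, b in zip(window, needle))
        score = dot / (window_norm * needle_norm)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def samples_to_seconds(sample_index, sample_rate=44100):
    """Convert a sample index into a timestamp in seconds."""
    return sample_index / sample_rate
```

With audio sampled at 44100 Hz, the returned index divided by the sample rate gives the position of the intro in seconds.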

Mark out the intro as a chapter

DVD chapter files are relatively straightforward text files that look like this:
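As a rough illustration, assuming the simple OGM-style chapter format that mkvmerge accepts (`CHAPTERnn=HH:MM:SS.mmm` followed by `CHAPTERnnNAME=...`), a chapter file for one episode could be generated along these lines. The chapter names and the function name are placeholders, and the exact layout the original script used may differ.

```python
def _timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_chapter_file(intro_start, intro_length=52.0):
    """Return chapter file text marking the intro and the rest of the episode.

    Assumption: simple chapter format, one CHAPTERnn / CHAPTERnnNAME pair
    per chapter, numbered from 01.
    """
    marks = []
    if intro_start > 0:
        marks.append((0.0, "Cold open"))  # anything before the intro
    marks.append((intro_start, "Intro"))
    marks.append((intro_start + intro_length, "Episode"))

    lines = []
    for number, (start, name) in enumerate(marks, start=1):
        lines.append(f"CHAPTER{number:02d}={_timestamp(start)}")
        lines.append(f"CHAPTER{number:02d}NAME={name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_chapter_file(83.5), end="")  # 83.5 s is an example value
```

Pressing “next chapter” while the intro chapter is playing then jumps to the start of the chapter that follows it.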


The chapter files could be generated without issues. However, I hit a roadblock when I discovered that .avi files do not seem to have a readily-accessible method to include the chapter files.

Detour: converting the AVI files to MKV

With the video container format issue, I looked into other formats that had more documentation on inserting chapter files. The .mkv format looked very promising, with cross-platform tooling (mkvtoolnix) to achieve that.

To continue, I had to convert the .avi files without losing any fidelity; ideally, neither the video nor the audio should be re-encoded. FFmpeg manages to do that wonderfully (and quickly) with this command:

ffmpeg -fflags +genpts -i input.avi -c:v copy -c:a copy output.mkv

Merging the chapter data into the video file

With a bunch of .mkv files, I could then use mkvtoolnix’s mkvmerge to splice the chapter data into the file.

mkvmerge --chapters chapter.txt -o output.mkv input.mkv

Again, the content is not re-encoded so the process is very quick. After this was completed programmatically, I then had a bunch of .mkv files with chapter data!
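The batch step could look roughly like the sketch below, which plans the remux and the chapter merge for each episode. It assumes `ffmpeg` and `mkvmerge` are on the PATH and that a chapter file has already been written for the episode; the file naming scheme and function names are invented for this example.

```python
import subprocess
from pathlib import Path


def build_remux_cmd(avi_path, mkv_path):
    """FFmpeg command that copies the streams from AVI into MKV as-is."""
    return ["ffmpeg", "-fflags", "+genpts", "-i", str(avi_path),
            "-c:v", "copy", "-c:a", "copy", str(mkv_path)]


def build_merge_cmd(mkv_path, chapter_path, out_path):
    """mkvmerge command that adds the chapter file to an existing MKV."""
    return ["mkvmerge", "--chapters", str(chapter_path),
            "-o", str(out_path), str(mkv_path)]


def plan_episode(avi_path, out_dir):
    """Return the two commands needed for one episode, in order."""
    avi_path, out_dir = Path(avi_path), Path(out_dir)
    plain_mkv = out_dir / f"{avi_path.stem}.plain.mkv"    # remuxed, no chapters
    chapters = out_dir / f"{avi_path.stem}.chapters.txt"  # assumed to exist
    final_mkv = out_dir / f"{avi_path.stem}.mkv"
    return [
        build_remux_cmd(avi_path, plain_mkv),
        build_merge_cmd(plain_mkv, chapters, final_mkv),
    ]


def run_plan(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print the plan and check it before touching any files.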


Seeking with the newly-added chapter data is so satisfying.


The script is available here. It will likely need a few changes before it works for other use cases, but it is mostly complete.