Hacking O’Reilly’s “Fluent Conference 2013” video download page

The Fluent Conference 2013 videos were out and O’Reilly was offering a nice 50% discount to early purchasers. I went to their website and made the purchase. When I looked at the download page I realized that no videos had been added to my Dropbox account (which is what O’Reilly does when you purchase a book from them). I wanted a local copy of all the videos so I could watch them offline, but O’Reilly only provides a single download link per video. There isn’t a “Download All” link and there isn’t any download manager available. My first reaction was to say “screw this, I’ll download them one by one”. As I downloaded the first one I realized that not only was the download speed sub-par (1.5 megabytes per second), but I was also being disconnected very often and the download had to start all over again (it wasn’t resuming). Not good.

Disclaimer: This post doesn’t describe any illegal activities but rather uses the term hacker in the way described in this Wikipedia article.

Plan B. Let the hacking begin.

I viewed the page source and realized that they were using jQuery. I also saw that the download URLs for all the videos were right there; there were no redirects or obfuscation of any kind. So after a couple of tries in Firebug’s console, I came up with this jQuery:

// Print the href of every matching download link to the console.
$(".format").find("a:nth-child(2)").each(function () {
    console.log(this.href);
});

This basically says “find all nodes with the format class, then find every link inside them that is the second child of its parent (on this page, the download link), and print the href of each one to the console”.

Great, now I had a list of the URLs for all the videos. I saved them to a plain text file. I now needed a method to download them all without me having to click, click, click…

wget

wget is very flexible when it comes to downloading stuff (which is a good thing since it was written to do that!). So this is what I tried:

wget -i list.txt

The -i option makes wget download every URL listed in list.txt. This worked great, except that sometimes the download would stall exactly like it did when I was using the browser. Now, wget was designed to work well with crappy connections, so it would retry a few times and start downloading again. But… sometimes it would just give up and leave a half-downloaded file. Running it again would start that file over from scratch, so I added the -c (continue) option to prevent this, and wget would pick up where it left off.
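For reference, here is a minimal sketch of that command with the retry behaviour spelled out instead of left to wget’s defaults. The function name fetch_list and the numeric values are my own illustrative assumptions, not something I tuned against O’Reilly’s servers; the long options themselves (--continue, --tries, --waitretry, --read-timeout, --input-file) are standard wget options.

```shell
# Download every URL listed in a file, resuming partial files.
# The numbers below are illustrative guesses, not tuned settings.
fetch_list() {
    # --continue      resume half-downloaded files (same as -c)
    # --tries         how many times to retry each file
    # --waitretry     upper bound, in seconds, on the pause between retries
    # --read-timeout  treat a connection as stalled after this many idle seconds
    # --input-file    read the URLs from the given file (same as -i)
    wget --continue \
         --tries=20 \
         --waitretry=10 \
         --read-timeout=30 \
         --input-file="$1"
}

# Usage (not run automatically):
#   fetch_list list.txt
```

A shorter read timeout should make wget notice a stalled connection sooner and move on to the next retry, though the right value depends on the connection.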

So far it was a great improvement: now the downloads would retry if they failed and resume from where they left off. I only needed to solve the problem where wget would sometimes just give up. This happened very rarely so it wasn’t a big deal, but still…

Bash

So I wrote this simple bash script that just runs wget over and over (up to 50 times), so that it gets restarted whenever it stops.

#!/bin/bash
for i in {1..50}
do
    wget -c -i list.txt
done

Even if the wget cycle restarted after all the downloads had finished, no significant bandwidth would be wasted, as wget was smart enough not to download stuff that it had already downloaded.
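A slightly more polite variation is to stop looping as soon as wget reports success rather than always running the full 50 rounds. The sketch below is only that, a sketch: retry_until_ok and RETRY_DELAY are hypothetical names of mine, and it assumes wget exits with status 0 only when every file in the list was retrieved without problems.

```shell
# Run a command repeatedly until it exits with status 0.
# $1 is the maximum number of attempts; the rest is the command to run.
# RETRY_DELAY (seconds between attempts) is a hypothetical knob, default 5.
retry_until_ok() {
    max_attempts=$1
    shift
    attempt=1
    while :; do
        # Success: stop looping right away.
        "$@" && return 0
        # Out of attempts: report failure to the caller.
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Usage (not run automatically):
#   retry_until_ok 50 wget -c -i list.txt
```

The trade-off is that a single file that can never be downloaded keeps the loop going until the attempts run out, which is the same worst case as the fixed loop above.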

After the downloads finished I saw that the file names were horrible: ea4b8c913c44e_5813709.mp4?title=42_secrets-of-awesome-javascript-api-design-brandon-satrom.mp4. Yikes.

Enter PHP

I wrote this little script that does some quick and dirty string manipulation and renames all the videos it finds in a directory.

<?php
// Rename every downloaded .mp4 in the current directory to a readable title.
$handle = opendir('.');

if (!$handle) {
    echo "Can't open dir\n";
    die();
}

while (false !== ($entry = readdir($handle))) {

    $pathinfo = pathinfo($entry);
    if (!isset($pathinfo['extension']) || $pathinfo['extension'] != 'mp4') {
        continue;
    }

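    // Keep only the part after "title="; skip files that don't have one.
    $tokens = explode('=', $pathinfo['filename'], 2);
    if (count($tokens) < 2) {
        continue;
    }
    $filename = $tokens[1];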
    $filename = str_replace('-', ' ', $filename);
    $filename = str_replace('_', ' - ', $filename);
    $filename = ucwords($filename);
    echo $filename . "\n";

    rename($pathinfo['basename'], $filename . '.' . $pathinfo['extension']);
}

closedir($handle);

The script turned this:

ea4b8c913c44e_5813709.mp4?title=42_secrets-of-awesome-javascript-api-design-brandon-satrom.mp4

… into this:

42 - Secrets Of Awesome Javascript Api Design Brandon Satrom.mp4

Much better!
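For anyone who would rather stay in the shell, the same string manipulation can be sketched with sed and awk. This is not the script I actually ran; pretty_name is a hypothetical helper, and it assumes every file name ends in .mp4 and carries the title after the first “=”, like the example above.

```shell
# Turn "…mp4?title=42_some-talk-name.mp4" into "42 - Some Talk Name.mp4".
# A name without "=" is not stripped, only reformatted.
pretty_name() {
    raw=$1
    title=${raw#*=}        # drop everything up to and including the first "="
    title=${title%.mp4}    # drop the extension; it is re-added below
    printf '%s\n' "$title" |
        sed -e 's/-/ /g' -e 's/_/ - /g' |
        awk '{
            # Upper-case the first letter of every word.
            for (i = 1; i <= NF; i++)
                $i = toupper(substr($i, 1, 1)) substr($i, 2)
            print $0 ".mp4"
        }'
}

# Usage (not run automatically):
#   for f in *title=*.mp4; do mv -- "$f" "$(pretty_name "$f")"; done
```

As in the PHP version, hyphens are replaced before underscores so that the “ - ” separator inserted for the underscore is left alone.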