Two years ago I raced the gruelling 25 Hours of Thunderhill organized by NASA. It’s the longest road race in North America, if not the world, and pits sixty-something teams and hundreds of crew members against each other for gargantuan trophies in one of five classes.
I drove for Team Bimmerworld in an early 1990s BMW M3 and was in the car at night when the race was temporarily stopped due to thick valley fog. The changing visibility from lap to lap was challenging but fun and all of the drama was captured on a ChaseCam. Unfortunately the memory card disappeared with BMW E30 rally-star Bill Caswell and we never got more than halfway in exchanging the data. The main problem? Bill is a non-stop motorsport animal always on the go and the file was 13GB.
At one point Bill managed to upload the file to getdropbox.com, but even after installing their desktop client I couldn’t get more than about 500MB at 5KB/sec before timeouts or other network shenanigans restarted the process. Frustrating!
That was twelve months ago. This week I was cleaning out some old emails and came across the link to the file and, for fun, tried clicking it. Hey, it’s still there! And still 13GB.
On a whim I logged in to my web server to see what I could do with the venerable command-line utility wget. It turns out wget has a few tricks up its sleeve, and some 60 hours later I’ve managed to download 5GB of the video. Here’s the magic on the command line:
wget --continue --progress=dot:mega --tries=0 <URL>
I hadn’t known about the continue option, which tells wget to resume a partial download where it left off instead of starting over. The progress option switches the dot display to the mega style, where each line of dots represents 3MB (64K per dot, 48 dots per line) rather than the 50K of the default style or the 384K of the binary style; appropriate for a file of this size. And finally, tries=0 means keep retrying forever, regardless of how many times the connection fails. Here’s what a failure looks like:
2010-11-26 10:25:48 (38.4 KB/s) - Connection closed at byte 5314183343. Retrying.
--2010-11-26 10:25:58-- (try:20) http://dl.dropbox.com/u/PRIVATE-URL
Connecting to dl.dropbox.com|75.101.129.115|:80... connected.
HTTP request sent, awaiting response... 206 PARTIAL CONTENT
Length: 14425877949 (13G), 9111694606 (8.5G) remaining [video/quicktime]
Saving to: `2009 25 hours of thunderhill reel 10.mov'
[ skipping 5188608K ]
5188608K ,,,,,,,, ,,,,,,,, ........ ........ ........ ........ 36% 61.7K 2d0h
5191680K ........ ........ ........ ........ ........ ........ 36% 38.1K 2d2h
Success! This is the 20th restart for 5GB, which averages out to roughly 250MB between connection failures. Thanks to these few little tweaks it should keep trying until it finishes, and then I’ll finally have my footage!
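As a sanity check on the numbers in that log, here is a small sketch in plain sh arithmetic. The byte counts and the try number are copied from the log; the variable names are my own, and treating "(try:20)" as twenty roughly equal attempts is an assumption, so the average is only a ballpark.

```shell
#!/bin/sh
# Figures copied from the wget log output.
total_bytes=14425877949      # "Length:" reported by the server
remaining_bytes=9111694606   # "remaining" on this attempt
attempt=20                   # from "(try:20)"

# Bytes already on disk when this attempt resumed.
resumed_at=$((total_bytes - remaining_bytes))

# Whole-number percentage complete (integer division rounds down).
percent_done=$((resumed_at * 100 / total_bytes))

# Rough average per attempt in MiB (1 MiB = 1048576 bytes).
# Assumption: the data arrived evenly across the attempts so far.
avg_mib_per_attempt=$((resumed_at / attempt / 1048576))

# dot:mega prints 48 dots per line at 64K per dot.
mega_line_kib=$((64 * 48))

echo "resumed at byte:     $resumed_at"
echo "percent complete:    $percent_done"
echo "avg MiB per attempt: $avg_mib_per_attempt"
echo "KiB per dot line:    $mega_line_kib"
```

The resume offset matches the "Connection closed at byte" figure in the log, and the per-line size matches the step between the two progress rows (5188608K to 5191680K).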
Brian said:
on November 30, 2010 at 9:24 am
It took some 6 days but I got all 13GB! I need to run by the colo so I can copy it to a USB drive rather than try to download it, but the file looks complete.
Brian said:
on December 6, 2010 at 3:16 pm
Here’s one more bit of (unrelated) wget kung-fu; this one spiders a website and turns it into a directory full of html, graphics and css:
wget --random-wait -p -r -e robots=off -U mozilla http://foo.com/
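For anyone who finds the short flags cryptic, here is the same command spelled out with wget's long option names. This is only a dry-run sketch: it builds and prints the command rather than fetching anything, and the variable names site_url and mirror_cmd are made up for illustration.

```shell
#!/bin/sh
# Dry run: prints the command instead of running it.
site_url="http://foo.com/"   # placeholder URL from the comment above

# --random-wait          vary the delay between requests (relative to --wait)
# --page-requisites      (-p) also fetch the images, CSS, etc. a page needs
# --recursive            (-r) follow links and download what they point to
# --execute robots=off   (-e) run a .wgetrc-style command; here, ignore robots.txt
# --user-agent=mozilla   (-U) send "mozilla" as the User-Agent string
mirror_cmd="wget --random-wait --page-requisites --recursive --execute robots=off --user-agent=mozilla $site_url"

echo "$mirror_cmd"
```

To actually run it, paste the printed line into a shell. Turning off robots.txt handling ignores the site owner's crawling rules, so it is best kept to sites you control or have permission to mirror.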