Tuesday, July 05, 2011

Taming 17,297 Photos

If there's one thing I got, it's photos of our little one. They are everywhere - from random laptops and cell phones, to stray memory cards. Shira informed me it was time to get my act together and organize them so she could actually do the difficult work of sorting through them and picking out the best ones.

Here's the script / strategy I came up with. In the end, it helped me reorganize and rename 17,297 photos, a task that would be dang near impossible to do by hand.

The main tools I used were Cygwin and ImageMagick. I didn't write these scripts with the intention of having others use them - so if they are helpful, great. But don't expect them to run without some coaching on your end.

Step 1: Build up an image library

The trickiest part of the process was finding all the devices and locations where photos of our little guy could be hiding. I had a fairly large external drive lying around, so I carved out a space in e:\Incoming to store all the files I found. Within the incoming directory, I made a directory for each device name, and then I copied in the hodgepodge of files.

Step 2: Build up a primary index

The code below was run at the command line. It crawls through the incoming folder and attempts to extract the date the photo was shot for every file it finds. Junk in the folders, such as non-image files, is more or less ignored.

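cd e:/incoming
# Emit path@@timestamp for every file found; identify's complaints
# about non-images go to stderr, so they stay out of the index.
find . -type f |
  while IFS= read -r x
  do
    identify.exe -format '%d/%f@@%[exif:DateTimeDigitized]' "$x"
  done | tee /cygdrive/c/Users/foo/Desktop/photos.index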
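
To make the index format concrete, here's a minimal sketch of how a single line splits at the @@ separator. The device folder, file name and timestamp below are made up for illustration; the YYYY:MM:DD HH:MM:SS layout is the standard EXIF date format.

```shell
# Hypothetical index line: the path and timestamp are invented for this example.
line='./camera/IMG_0001.JPG@@2011:07:04 18:30:12'

path=${line%%@@*}    # everything before the first @@
stamp=${line#*@@}    # everything after the first @@

echo "path:  $path"
echo "stamp: $stamp"
```

A file with no EXIF date would leave the part after @@ empty, which is what lets the next step spot and skip it.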

Step 3: Generate a copying script

Next up, I took the index, photos.index, and ran it through an awk program to create a new shell script. The awk script below takes care of parsing the index, choosing a sane and consistent name for each file, and arranging for it to be copied under that name.

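BEGIN {
  FS = "@@";                   # index fields are separated by @@
}

# Skip files with no EXIF timestamp (non-images and the like).
$2 == "" { next; }

# Skip Picasa's bookkeeping files.
/.picasa/ { next; }

{
  # Split "YYYY:MM:DD HH:MM:SS" into its six numeric parts.
  split($2, timestamp, "[ :]");
  dir = timestamp[1] "-" timestamp[2];
  file = timestamp[1] "." timestamp[2] "." timestamp[3] "-" timestamp[4] "." timestamp[5] "."  timestamp[6] ".jpg";
  printf("mkdir -p $OUTGOING_ROOT/%s\n", dir);
  printf("cp \"$INCOMING_ROOT/%s\" $OUTGOING_ROOT/%s/%s\n", $1, dir, file);
}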

The above script was run as:

  awk -f mkscript.awk photos.index > photos.sh
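
As a rough illustration of the naming scheme, here's the same timestamp-to-name mapping sketched as a small shell function. The function name name_for_stamp and the sample timestamp are my own inventions for this example; the real work is done by the awk script.

```shell
# name_for_stamp STAMP: print the month directory and file name for an
# EXIF timestamp of the form "YYYY:MM:DD HH:MM:SS".
# The body runs in a subshell, so the IFS change doesn't leak out.
name_for_stamp() (
  IFS=' :'      # split on spaces and colons, like awk's split(..., "[ :]")
  set -f        # no filename globbing while splitting
  set -- $1     # $1..$6 become year, month, day, hour, minute, second
  printf '%s-%s/%s.%s.%s-%s.%s.%s.jpg\n' "$1" "$2" "$1" "$2" "$3" "$4" "$5" "$6"
)

name_for_stamp '2011:07:04 18:30:12'
# prints 2011-07/2011.07.04-18.30.12.jpg
```

Because the name is built from the year down to the second, a plain alphabetical listing of a month's directory is also a chronological one.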

Step 4: Run the copy script

All that was left to do was to run the script generated in step 3. For the script to run, I needed to set the two environment variables it refers to. The following took care of this:

  $ export OUTGOING_ROOT=e:/outgoing
  $ export INCOMING_ROOT=e:/incoming
  $ sh -x photos.sh
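
Before turning a generated script loose on thousands of files, it can be worth counting how many copies it plans to make. Here's a minimal sketch; the count_copies helper and the stand-in script contents are hypothetical, not part of what I actually ran.

```shell
# count_copies FILE: print how many cp commands a generated script contains.
count_copies() {
  grep -c '^cp ' "$1"
}

# Tiny stand-in for photos.sh, using an invented device folder and file name.
demo=$(mktemp)
cat > "$demo" <<'EOF'
mkdir -p $OUTGOING_ROOT/2011-07
cp "$INCOMING_ROOT/./camera/IMG_0001.JPG" $OUTGOING_ROOT/2011-07/2011.07.04-18.30.12.jpg
EOF

count_copies "$demo"   # prints 1
rm -f "$demo"
```

Against the real thing, that would be count_copies photos.sh, and the number should line up with the photos in the index that carry a timestamp.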

After a couple hours, I had a cleanly organized directory, where each file was named by the date and time it was created. Photos from different cameras at the same event were neatly placed right next to each other.

With that out of the way, Shira just needs to look through the 17,000 photos and decide which ones to print. Too bad there's no command line utility to take care of that task.
