Monday, November 12, 2018

Jedediah Hotchkiss' Sketchbook | From Library of Congress image gallery to mobile-friendly PDF

I believe that Jedediah Hotchkiss' Civil War sketchbook would make for interesting reading. While this work is published on the Library of Congress's (LoC) website, at 224 pages, I wanted a more convenient way of reading the document than looking through an online image gallery.

Here's how I arrived at a single PDF file that contained all 225 pages of Jed's personal sketchbook.

Step 1: I viewed the source of the LoC page and noted a rel="alternate" link tag.
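If you'd rather not dig through the page source by hand, the link can be pulled out with grep and sed. This is a minimal sketch, not a real HTML parser: it assumes the whole `<link ...>` tag sits on one line, and `alternate_href` is a hypothetical helper name.

```shell
# Hypothetical helper: print the href of the first <link ... rel="alternate" ...>
# tag found in HTML read from stdin. Assumes each tag fits on one line.
alternate_href() {
  # Grab the first complete <link> tag that carries rel="alternate"
  grep -o '<link[^>]*rel="alternate"[^>]*>' \
    | head -n 1 \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p'   # keep only the href value
}
```

Usage would look like `curl -s "$page_url" | alternate_href`, where `$page_url` stands in for the gallery page's address.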

Step 2: running curl against this URL returned a wealth of interesting information:

$ curl -s  '' | jq .
  "articles_and_essays": [
      "site": [
      "contributor": [
        "potter, abbey"
      "original-format": [
        "web page"
      "partof": [

Step 3: rather than reading the details of this JSON format, I poked around until I found this critical block:

    "resources": [
        "files": 117,
        "captions": "",
        "image": "",
        "url": ""

Step 4: using curl and my browser, I was able to write the following script, which pulls down all 117 images associated with this LoC entry:


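#!/bin/bash

## Grab content from the Library of Congress
## For example:
##  locget '' 

usage() {
  echo "$(basename "$0") {gallery-url}"
  exit 1
}

if [ -z "$1" ]; then
  usage
fi
resource_url=$1

# Fetch the item's JSON once, then pull out the captions list and image URL
json=$(curl -s "$resource_url")
captions_url=$(echo "$json" | jq -r '.resources[0].captions')
image_url=$(echo "$json" | jq -r '.resources[0].image')

# Drop the URL prefix from the image's directory, then turn the
# remaining slash-separated path into a colon-separated identifier
path=$(dirname "$image_url" | sed -e 's|||' \
                                  -e 's|/|:|g')

# Treat the third space-separated field of each captions row as an
# image file name and download that image at full size
curl -s "$captions_url" | while read -r row ; do
  file=$(echo "$row" | cut -f 3 -d ' ')
  if [ -n "$file" ] ; then
    curl -s "$path:$file/full/pct:100/0/default.jpg" > "$file.jpg"
  fi
done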

Note the final curl call in the script, which picks up the image files. By setting pct:100, I'm able to request full-size images. It's also possible to provide a value like pct:50 to pick up half-size images.
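The request path `full/pct:100/0/default.jpg` appears to follow the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`: region `full`, size `pct:100`, rotation `0`, quality `default`, format `jpg`. The sketch below rebuilds such a URL from its parts so the size is easy to change. The `iiif_url` helper, the base URL, and the sample path are all hypothetical; substitute the image server prefix your own item uses.

```shell
# Hypothetical helper that assembles a IIIF-style image URL (bash).
#   $1  image service base URL (an assumption; use your item's prefix)
#   $2  image directory path with slashes, e.g. "a/b/c"
#   $3  image file name, e.g. "0001"
#   $4  size as a percentage: 100 = full size, 50 = half size (default 100)
iiif_url() {
  local base=$1 dir=$2 file=$3 pct=${4:-100}
  # Replace every "/" with ":", like the script's sed 's|/|:|g'
  local id=${dir//\//:}
  # {identifier}/{region}/{size}/{rotation}/{quality}.{format}
  printf '%s/%s:%s/full/pct:%s/0/default.jpg\n' "$base" "$id" "$file" "$pct"
}
```

For example, `iiif_url https://iiif.example.org/iiif a/b/c 0001 50` prints `https://iiif.example.org/iiif/a:b:c:0001/full/pct:50/0/default.jpg`, a half-size request.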

Step 5: with step 4 complete, I had a full set of images locally. However, each image contains both a left-hand and a right-hand page. To split the pages into separate files, I used my good friend ImageMagick:

$ mkdir pages
$ cd pages
$ for f in ../*.jpg ; \
   do echo "$f" ; convert "$f" -crop 50%x100% +repage "$(basename "$f")" ; \
   done
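When ImageMagick writes several images to one JPEG file name, it appends a scene number, so a spread such as `0001.jpg` becomes `0001-0.jpg` (left half) and `0001-1.jpg` (right half). A quick sanity check is to confirm both halves exist for every spread. This is a minimal sketch assuming that naming and bash; `missing_halves` and the `0001` file names are hypothetical.

```shell
# Hypothetical helper: print the base name of every spread in $1 whose
# two halves (NAME-0.jpg and NAME-1.jpg) are not both present in $2.
missing_halves() {
  local spread_dir=$1 page_dir=$2 f name
  for f in "$spread_dir"/*.jpg; do
    [ -e "$f" ] || continue          # the glob matched no files
    name=$(basename "$f" .jpg)
    if [ ! -e "$page_dir/$name-0.jpg" ] || [ ! -e "$page_dir/$name-1.jpg" ]; then
      echo "$name"
    fi
  done
}
```

Run from the `pages` directory as `missing_halves .. .`; no output means every spread was split.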

Step 6: Finally, I created a single (massive) PDF file by running the command:

$ convert *.jpg master.pdf
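`convert *.jpg master.pdf` orders the pages by the shell's alphabetical glob expansion, which matches reading order only when the file names sort that way (zero-padded numbers, for instance). With unpadded names, `10-0.jpg` would land before `2-0.jpg`. A minimal sketch for that case, assuming GNU `sort -V` (version sort) and the `NAME-0.jpg` / `NAME-1.jpg` names from Step 5; `ordered_pages` is a hypothetical helper.

```shell
# Hypothetical helper: print split page files in numeric reading order.
# Assumes names like 1-0.jpg, 1-1.jpg, ..., 10-0.jpg (left = -0, right = -1)
# and a sort that supports -V (version sort), as GNU sort does.
ordered_pages() {
  local dir=${1:-.}
  printf '%s\n' "$dir"/*-[01].jpg | sort -V
}
```

From the `pages` directory, `convert $(ordered_pages .) master.pdf` would then build the PDF in numeric order, provided no file name contains spaces.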

You can download the generated PDF here.

And here are a few screenshots of me scrolling through Jed's sketchbook on my Galaxy S9+:

The formatting isn't perfect, and the PDF file is massive. But still, I'm able to scroll through the pages with ease, and I can view detail by simply zooming in.

If I had a horse, I could peruse the content from the same perspective Jedediah had when he created it. Though even I admit that's probably excessive.

