I believe that Jedediah Hotchkiss' Civil War sketchbook would make for interesting reading. While this work is published on the Library of Congress's (LoC) website, at 224 pages I wanted a more convenient way of reading the document than looking through an online image gallery.
Here's how I arrived at a single PDF file that contains every page of Jed's personal sketchbook.
Step 1: I viewed the source of the LoC page and noted a rel="alternate" link tag.
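For anyone who would rather not read the page source by eye, here is a minimal sketch of pulling that link out with grep and sed. The function name alternate_links is my own invention, and the sketch assumes each link tag sits on a single line with double-quoted attributes, which may not hold for every page.

```shell
#!/bin/bash
# Print the href of every <link ... rel="alternate" ...> tag read from stdin.
# Assumption: each tag fits on one line and uses double-quoted attributes.
alternate_links() {
  grep -o '<link[^>]*rel="alternate"[^>]*>' \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# Hypothetical usage against the live page (not run here):
#   curl -s 'https://www.loc.gov/item/2005625258/' | alternate_links
```

Nothing here is specific to the LoC; it is a plain text match, so a real HTML parser would be the sturdier choice for anything beyond a one-off.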
Step 2: curling that tag's URL returned a wealth of interesting information:
$ curl -s 'https://www.loc.gov/item/2005625258/?fo=json' | jq .
{
"articles_and_essays": [
{
"site": [
"lcweb"
],
"contributor": [
"potter, abbey"
],
"original-format": [
"web page"
],
"partof": [
...
Step 3: rather than reading the details of this JSON format, I poked around until I found this critical block:
"resources": [
{
"files": 117,
"captions": "http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/captions.txt",
"image": "http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/ca000001.gif",
"url": "http://www.loc.gov/resource/g3880m.gcwh0001/"
}
],
Step 4: between curl and my browser, I was able to write the following script, which pulls down all 117 images associated with this LoC entry:
#!/bin/bash
##
## Grab content from the Library of Congress
##
## For example:
## locget 'https://www.loc.gov/resource/g3880m.gcwh0001/?c=200&fo=json&st=slideshow'
##
usage() {
  echo "$(basename "$0") {gallery-url}"
  exit 1
}
if [ -z "$1" ]; then
  usage
fi
resource_url="$1"
# Fetch the resource JSON once, then pull out the captions and first-image URLs
resource_json=$(curl -s "$resource_url")
captions_url=$(printf '%s\n' "$resource_json" | jq -r '.resources[0].captions')
image_url=$(printf '%s\n' "$resource_json" | jq -r '.resources[0].image')
# Turn the CDN directory path into the colon-separated prefix used by tile.loc.gov
path=$(dirname "$image_url" | sed -e 's|http://cdn.loc.gov/||' \
  -e 's|/|:|g')
# Use the third space-separated field of each captions row as the image name
curl -s "$captions_url" | while read -r row ; do
  file=$(echo "$row" | cut -f 3 -d ' ')
  if [ -n "$file" ] ; then
    curl -s "http://tile.loc.gov/image-services/iiif/$path:$file/full/pct:100/0/default.jpg" > "$file.jpg"
  fi
done
Note the call to tile.loc.gov to pick up the image files. By setting pct:100, I'm able to request full-size images. It's also possible to provide a value like pct:50 to pick up images that are half size.
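To make the size knob easier to play with, here is a small sketch that builds the tile.loc.gov URL with the percentage as a parameter. The helper name iiif_url is hypothetical, the URL layout simply mirrors the script above, and the page name ca000001 is my assumption about what the captions file yields for the first image.

```shell
#!/bin/bash
# Build a tile.loc.gov image URL from a CDN image URL, a page name, and a scale.
# iiif_url is a hypothetical helper; the scale defaults to 100 (full size).
iiif_url() {
  local image_url="$1" file="$2" pct="${3:-100}"
  local path
  # Strip the CDN host, then swap "/" for ":" to form the identifier prefix
  path=$(dirname "$image_url" | sed -e 's|http://cdn.loc.gov/||' -e 's|/|:|g')
  echo "http://tile.loc.gov/image-services/iiif/$path:$file/full/pct:$pct/0/default.jpg"
}

# Print the half-size URL for one page (the page name is an assumed example)
iiif_url 'http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/ca000001.gif' ca000001 50
```

Printing the URL instead of fetching it makes it easy to paste into a browser and compare sizes before committing to a full download.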
Step 5: with step 4 complete, I had a full set of images locally. However, each image contains both a left-hand and a right-hand page. To split the pages into separate files, I used my good friend ImageMagick:
$ mkdir pages
$ cd pages
$ for f in ../*.jpg ; \
    do echo $f ; convert -crop 50%x100% +repage $f `basename $f` ; \
  done
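Before building the PDF, a quick count can confirm the split behaved. This is only a sketch: check_split and count_jpgs are hypothetical helpers, and the check assumes every scan should yield exactly two .jpg halves in the pages directory.

```shell
#!/bin/bash
# Count the .jpg files directly inside a directory (subdirectories are ignored)
count_jpgs() {
  find "$1" -maxdepth 1 -type f -name '*.jpg' | wc -l | tr -d ' '
}

# Compare the scan count in $1 with the page count in $2.
# Assumption: each scan should have produced exactly two page files.
check_split() {
  local scans pages
  scans=$(count_jpgs "$1")
  pages=$(count_jpgs "$2")
  if [ "$pages" -eq $((scans * 2)) ]; then
    echo "ok: $scans scans, $pages pages"
  else
    echo "mismatch: $scans scans, $pages pages"
    return 1
  fi
}
```

From inside the pages directory, this would be run as check_split .. . and a mismatch is a hint to look for a scan that did not split cleanly.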
Step 6: Finally, I created a single (massive) PDF file by running the command:
$ convert *.jpg master.pdf
You can download the generated PDF here.
And here are a few screenshots of me scrolling through Jed's sketchbook on my Galaxy S9+:




The formatting isn't perfect, and the PDF file is massive. But still, I'm able to scroll through the pages with ease, and I can view detail by simply zooming in.
If I had a horse, I could peruse the content from the same perspective Jedediah created it. Though, even I admit that's probably excessive.