Thursday, February 02, 2023

OCGs meet CLI | GeoPDF Layer Management from the Command Line

As soon as I learned that USGS GeoPDF maps contained multiple layers, I wanted to find a way to toggle these layers from the command line. Combined with my USGS download script, I figured I could retrieve and prepare maps in bulk. I found many PDF command-line tools but, surprisingly, none that worked with layers.

My luck finally turned when I learned that PDF Layers are also known as 'Optional Content Groups', or OCGs. Using this term, I found a number of promising libraries and tools.

The most obvious one was the Python-based pdflayers script. But alas, it failed to identify any layers when I pointed it at a topo PDF.

Next up, I investigated the impressive PyMuPDF library. It offers functions like get_layers() and get_ocgs(), which made it look quite promising. I installed the library with the command:

$ python3 -m pip install --upgrade pymupdf

and kicked off a REPL to experiment with these functions. My Python skills are weak, but I did confirm that PyMuPDF identifies layers within a USGS GeoPDF.
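
For anyone who wants to repeat that experiment, here is a minimal sketch of it. It relies on PyMuPDF's get_ocgs(), which returns a dictionary keyed by each OCG's xref number, with 'name' and 'on' among the details. The function names format_layers and list_layers are made up for illustration, and the state:number:name output format is simply one convenient way to print the result.

```python
def format_layers(ocgs):
    """Render the dict returned by Document.get_ocgs() as state:xref:name lines."""
    lines = []
    for xref in sorted(ocgs):
        info = ocgs[xref]
        state = "on" if info["on"] else "off"
        lines.append(f"{state}:{xref}:{info['name']}")
    return lines


def list_layers(pdf_path):
    """Open a PDF with PyMuPDF and describe its optional content groups."""
    # Imported lazily so format_layers() works even without PyMuPDF installed.
    import fitz  # PyMuPDF; newer releases also accept "import pymupdf"

    doc = fitz.open(pdf_path)
    try:
        return format_layers(doc.get_ocgs())
    finally:
        doc.close()
```

Calling list_layers() on a downloaded topo PDF and printing the returned lines should show one line per layer. A PDF with no layers yields an empty list.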

I hacked together my own version of pdflayers* and before I knew it, I had just the tool I was after. Let's see it in action:

# Look at the existing layers of a freshly downloaded USGS topo map
$ pdflayers -l VA_Alexandria_20220927_TM_geo.pdf
off:236:Labels
on:237:Map Collar
on:238:Map Elements
on:239:Map Frame
on:240:Boundaries
on:241:Federal Administrated Lands
on:242:Department of Defense
on:243:National Park Service
on:244:National Cemetery
on:245:Jurisdictional Boundaries
on:246:County or Equivalent
on:247:State or Territory
on:248:Woodland
on:249:Terrain
off:250:Shaded Relief
on:251:Contours
on:252:Hydrography
on:253:Wetlands
on:254:Transportation
on:255:Airports
on:256:Railroads
on:257:Trails
on:258:Road Features
on:259:Road Names and Shields
on:260:Structures
on:261:Geographic Names
on:262:Projection and Grids
off:263:Images
on:264:Orthoimage
on:265:Barcode

# 'Enable' specific layers. When enabling, all other layers are
# implicitly disabled
$ pdflayers -e 237,238,239,249,251,252,253,262 \
   -i VA_Alexandria_20220927_TM_geo.pdf \
   -o t.VA_Alexandria_20220927_TM_geo.pdf

# Check my work: success!
$ pdflayers -l t.VA_Alexandria_20220927_TM_geo.pdf
off:236:Labels
on:237:Map Collar
on:238:Map Elements
on:239:Map Frame
off:240:Boundaries
off:241:Federal Administrated Lands
off:242:Department of Defense
off:243:National Park Service
off:244:National Cemetery
off:245:Jurisdictional Boundaries
off:246:County or Equivalent
off:247:State or Territory
off:248:Woodland
on:249:Terrain
off:250:Shaded Relief
on:251:Contours
on:252:Hydrography
on:253:Wetlands
off:254:Transportation
off:255:Airports
off:256:Railroads
off:257:Trails
off:258:Road Features
off:259:Road Names and Shields
off:260:Structures
off:261:Geographic Names
on:262:Projection and Grids
off:263:Images
off:264:Orthoimage
off:265:Barcode
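
The 'enable' behavior shown above (listed layers on, everything else off) can be approximated with PyMuPDF's set_layer(). What follows is a sketch under assumptions, not the actual pdflayers source: it assumes the layer numbers are the OCG xref numbers that get_ocgs() reports, and plan_visibility and enable_layers are hypothetical names.

```python
def plan_visibility(ocgs, enable):
    """Split OCG xrefs into (on, off) lists; anything not enabled goes to off."""
    wanted = set(enable)
    unknown = wanted - set(ocgs)
    if unknown:
        raise ValueError(f"no such layer(s): {sorted(unknown)}")
    on = sorted(wanted)
    off = sorted(set(ocgs) - wanted)
    return on, off


def enable_layers(src_path, dst_path, enable):
    """Write a copy of src_path in which only the given layers are on by default."""
    import fitz  # PyMuPDF, imported lazily

    doc = fitz.open(src_path)
    try:
        on, off = plan_visibility(doc.get_ocgs(), enable)
        # Config -1 addresses the PDF's default layer configuration.
        doc.set_layer(-1, on=on, off=off)
        doc.save(dst_path)
    finally:
        doc.close()
```

Rejecting unknown layer numbers up front means a typo in the list fails loudly instead of silently producing a map with the wrong layers.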

While these results were encouraging, the question remained: would the updated files continue to behave like proper GeoPDFs? That is, would messing with the layers remove the geographic metadata that lets these PDF files work as interactive maps? I turned to Avenza Maps, an Android app, to answer this question.

Avenza is a map viewer that is powered by PDF files. When loaded with a GeoPDF, Avenza becomes location aware and can display a blue dot on the document at your current position. It can also report the precise coordinates of any spot on the map.

The moment of truth came when I loaded my modified PDF file. Would the blue dot show up? Would it remain geo aware?

It does! The two screenshots above are of the same map, the only difference being the visible layers. The second map has many layers turned off, optimizing it for viewing terrain. You can see contour lines in the first map too, but the labels and other features make them harder to pick out.

One limitation of Avenza Maps is that it can't toggle PDF layers. Using my command line tool, I can now sidestep this issue. I can prepare maps using pdflayers and import them into Avenza for optimized use.
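
For preparing maps in bulk, it may be easier to pick layers by name than by number, on the assumption (not verified here) that layer numbers can differ from one map to the next. The sketch below wraps the pdflayers command using only the -l, -e, -i and -o flags demonstrated earlier. The helper names are hypothetical, and it assumes layer names are unique within a map.

```python
import subprocess


def parse_listing(text):
    """Parse `pdflayers -l` output (state:number:name per line) into {name: number}."""
    layers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only, in case a layer name contains one.
        _state, number, name = line.split(":", 2)
        layers[name] = int(number)
    return layers


def enable_command(pdf_path, out_path, names, listing):
    """Build the pdflayers invocation that enables the named layers."""
    layers = parse_listing(listing)
    numbers = ",".join(str(layers[name]) for name in names)
    return ["pdflayers", "-e", numbers, "-i", pdf_path, "-o", out_path]


def prepare(pdf_path, out_path, names):
    """List a map's layers, then write a copy with only the named layers on."""
    listing = subprocess.run(
        ["pdflayers", "-l", pdf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    subprocess.run(enable_command(pdf_path, out_path, names, listing), check=True)
```

A terrain-focused set of names might be Map Collar, Map Elements, Map Frame, Terrain, Contours, Hydrography, Wetlands, and Projection and Grids, matching the layers enabled in the example earlier.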


*My pdflayers command should really be packaged so it can be installed with pip. If this would be helpful, let me know in the comments.
