Thursday, January 26, 2017

Packt Publishing's Free Book of the Day, Command Line Edition

My friend Nick recommended I check out Packt Publishing's free eBook of the day. To my surprise, the books offered there are the real deal. Today's book, for example, is Learning Penetration Testing with Python, a book I find interesting but couldn't justify forking over actual cash for.

Now I suppose most people would simply bookmark the page and move on with their lives. But for some reason, perhaps the technical nature of the site, I figured I could use this as a hacking opportunity (a hackertunity, if you will). Specifically, I wanted a command line tool that would echo the current free offer back to me.

I give you: packtfree:

#!/bin/bash

##
## command line utility to retrieve the current free
## book from packt publishing
##

url=https://www.packtpub.com/packt/offers/free-learning

clip_url () {
  echo -n "$url" | clip
}

while getopts ":uc" opt ; do
  case $opt in
    u) echo "$url" ; exit ;;
    c) clip_url ; exit ;;
    \?) echo "Usage: $(basename "$0") [-uc]" ; exit ;;
  esac
done

clip_url
curl -s "$url" | pup '.dotd-main-book-summary h2, .dotd-main-book-summary div:nth-child(4)'

And running it produces:

$ packtfree
<h2>
 Learning Penetration Testing with Python
</h2>
<div>
 Utilize Python scripting to execute effective and efficient penetration tests
</div>
$ packtfree -u
https://www.packtpub.com/packt/offers/free-learning

One non-obvious feature is that running packtfree places the URL to the free book on the system clipboard. This makes it easy to switch to a browser of my choice and paste it if I want to redeem said book.
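
One gotcha: clip isn't a standard command on most Linux boxes. If you don't have one handy, a minimal stand-in, assuming xclip is installed on Linux or pbcopy on macOS, could look something like this:

#!/bin/bash

##
## minimal 'clip' stand-in: copy stdin to the system clipboard
## (assumes xclip on Linux, pbcopy on macOS)
##

if command -v pbcopy > /dev/null ; then
  pbcopy
elif command -v xclip > /dev/null ; then
  xclip -selection clipboard
else
  echo "$(basename "$0"): no clipboard tool found" >&2
  exit 1
fi

Drop something like that in your PATH as clip and the script above works unchanged.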

While this is a pretty obvious case of over-engineering a problem, it does address a challenge I've never really had a good answer to: what's the best way to work with HTML data on the command line? For plain text, you've got sed, awk, and countless other old-school Unix utilities. For JSON, the clear winner is jq. But what about HTML and XML?
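
jq, for reference, makes this kind of extraction trivial when the data is JSON (the GitHub endpoint below is just an arbitrary example):

curl -s https://api.github.com/repos/ericchiang/pup | jq -r '.description'

I wanted that same one-liner feel for HTML.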

In the past, I've used tools like w3m to convert HTML to text and hack away from there. But that's always a compromise.
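
The hacking away looked something like this (the grep pattern is just a placeholder for whatever text I'd end up fishing for):

# render the page to plain text, then scrape with grep/sed from there
curl -s https://www.packtpub.com/packt/offers/free-learning \
  | w3m -dump -T text/html \
  | grep -A 2 'Free Learning'

Workable, but brittle.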

Doing a fresh search for this challenge turned up pup, a fantastic command line tool for extracting and manipulating HTML. It's exactly what I was searching for, and unlike some XML tools, it had no problem working with the real-life content over at packtpub.com. Nearly the entire script above is setup; the main functionality is this sweet one-liner:

curl -s "$url" | pup '.dotd-main-book-summary h2, .dotd-main-book-summary div:nth-child(4)'
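
And if I ever want just the text without the markup, pup's display filters (text{} and friends) should do the trick, something like:

curl -s "$url" | pup '.dotd-main-book-summary h2 text{}'

(The output may still want a little whitespace trimming, but it beats parsing HTML by hand.)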

pup is definitely going in my command line toolbox.

Look at that, Packt books are already teaching me something, and I didn't even have to read one.
