Monday, February 05, 2024

Wanted Lofty AI Automation; Got Epic AI Distraction

Lately, I've been experiencing a case of AI-inspired automation-FOMO. I've got a sense that ChatGPT and other AI tools can help me automate a number of tedious tasks, yet I haven't had a chance to experiment with this. To put my mind at ease, I started digging into OpenAI's ChatGPT API docs. My intent was to find a practical scenario that would benefit from the magic of AI. Of course, I quickly chucked this noble goal, opting instead to focus on one of ChatGPT's sexier features: image generation.

Using the ChatGPT API, you can send a description to the /images/generations endpoint and, a few seconds later, be rewarded with a freshly generated picture that matches it. Here's an example:

$ curl -s -H 'Content-Type: application/json' \
       -H 'Authorization: Bearer <API Key>' \
       -d '{ "model": "dall-e-3", "prompt": "Create an image of a mystical machine that you feed in a description of an image and the machine produces that image. The image should be created in the style of a page out of Leonardo Divinci'\''s famous journals.", "n": 1, "size": "1024x1024" } ' \
       https://api.openai.com/v1/images/generations
{
  "created": 1706768428,
  "data": [
    {
      "revised_prompt": "Imagine an enchanting apparatus with a truly mystical function: input a worded depiction of a scene and behold as it crafts the described image. This uncanny but fascinating contraption appears to be constructed with old, worn-out gears, levers, and dials. It is set within the yellowed pages of an ancient journal. The illustrations surrounding the machine reflect detailed observations, studies, and annotations reminiscent of the kind found in Leonardo da Vinci's legendary chronicles. Note his signature use of mirroring handwriting, intricate sketches, and the feel of the Renaissance era.",
      "url": ""
    }
  ]
}

Amazing, right? No, this image isn't going to win any awards. And no, this use of ChatGPT isn't going to streamline my business or coding life. But come on, this is simply too cool not to play with.
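The response comes back as JSON, so the image URL has to be fished out before the picture can be downloaded. Here's a minimal sketch of one way to do that with sed. The function name extract_image_url and the example.com URL are made up for illustration, and the pattern assumes the URL contains no escaped double quotes; a JSON-aware tool would be sturdier.

```shell
# Hypothetical helper: print the first "url" value found in an
# images/generations response read from stdin.
# Assumption: the URL itself contains no escaped double quotes.
extract_image_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response with a placeholder URL, for illustration only.
sample='{
  "created": 1706768428,
  "data": [
    {
      "revised_prompt": "(revised prompt text)",
      "url": "https://example.com/generated.png"
    }
  ]
}'

printf '%s\n' "$sample" | extract_image_url
```

From there, the extracted URL can be handed to curl to save the image locally.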

I've folded this functionality into the chatgptassist shell script for easy execution. Now, image generation is a Linux one-liner:

$ url=$(chatgptassist -a generate-image -p "Generate an image of a machine, that has a slot to feed in sheets of paper and a small window. If you look in the window, you see a haggard individual hunched over a drafting table, hard at work drawing the images that are being fed in from the slot. Show a conveyor belt where the finished images are ejected from the machine. The outside of the machine should look as a futueristic devic as imagined by an individual in the 1950's.")
$ curl -s "$url" > machine2.png
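For the curious, here's a rough sketch of what a generate-image wrapper might do under the hood. This is not the actual chatgptassist code: the function names json_escape, build_image_payload and generate_image are hypothetical, and I'm assuming the API key lives in an OPENAI_API_KEY environment variable. The fiddly part is escaping the prompt so that quotes in the description don't break the JSON body.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumption: the prompt is a single line with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for the images/generations endpoint.
build_image_payload() {
  printf '{ "model": "dall-e-3", "prompt": "%s", "n": 1, "size": "1024x1024" }' \
    "$(json_escape "$1")"
}

# Hypothetical stand-in for the generate-image action.
# Assumes OPENAI_API_KEY is set; prints the raw JSON response.
generate_image() {
  curl -s https://api.openai.com/v1/images/generations \
       -H 'Content-Type: application/json' \
       -H "Authorization: Bearer $OPENAI_API_KEY" \
       -d "$(build_image_payload "$1")"
}

# Safe to run offline: shows the body that would be sent.
build_image_payload 'A "haggard" individual at a drafting table'
```

Keeping the payload builder separate from the network call makes it easy to inspect the request body before spending any API credits.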

Again, can we take a moment and appreciate how remarkable it is that I could conjure this image using little more than a vague description? I love how the request included the phrase "haggard individual," and sure enough, the subject in the final image has a deeply furrowed brow. That's some impressive attention to detail.

Somewhat surprisingly, I've found that image generation does have some practical uses. For example, I was debugging a checkout page of an eCommerce platform and needed to create a product with multiple variations. I could have solved this a number of ways, but asking ChatGPT for red and blue versions of a ninja, pirate, and astronaut-themed chess set was both easy to do and made for a delightfully realistic test product.

$ for theme in ninja pirate astronaut ; do \
  for color in red blue; do \
     prompt="Generate an image of a chess set that is suitable for use on an \
             ecommerce product page. The chess pieces should have the theme \
             of $theme. One player's chess pieces \
             should be $color and the others be black." ; \
      echo "$theme-$color" ; \
      url=$(chatgptassist -a generate-image -p "$prompt") ; \
      curl -s "$url" > "$theme-$color.png" ; \
  done ; \
done
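Since every generated image costs an API call, it can be handy to preview the prompts first. Here's a small dry-run sketch of the same theme and color loop; chess_prompt and dry_run are hypothetical helper names, and nothing here touches the API.

```shell
# Hypothetical helper: build the chess-set prompt for one theme/color pair.
chess_prompt() {
  printf "Generate an image of a chess set that is suitable for use on an ecommerce product page. The chess pieces should have the theme of %s. One player's chess pieces should be %s and the others be black." "$1" "$2"
}

# Dry run: print each output file name and its prompt without calling the API.
dry_run() {
  for theme in ninja pirate astronaut; do
    for color in red blue; do
      printf '%s-%s.png: %s\n' "$theme" "$color" "$(chess_prompt "$theme" "$color")"
    done
  done
}

dry_run
```

Once the prompts look right, the printf line can be swapped for the real image generation and download commands.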

While ChatGPT's image generation is amazing, it does appear to have significant limitations. For example, I tried to have it draw a basic LAMP stack network diagram. Compared to a pirate-themed chess set, this task should have been a piece of cake. I described what I was after:

$ url=$(chatgptassist -a generate-image -p "You are a network engineer with an advanced understanding of Amazon's Web Services. Draw a network diagram of a LAMP stack as implemented by AWS services.")
$ curl -s "$url" > network1.png

The resulting image, while visually interesting, completely missed the mark.

I tried again using a different description:

$ url=$(chatgptassist -a generate-image -p "Draw a network diagram of a typical load balanced web app. Include a load balancer, web servers, a redis caching server and database.")
$ curl -s "$url" > network2.png

Again, ChatGPT proudly presented me with an interesting image, yet one that's equally useless.

This suggests to me that ChatGPT's image generation isn't appropriate when details matter. With that said, it's excellent for broad concepts, placeholder images, and sparking ideas.

Now that I've built my first AI-based tool, you might be wondering how I'm feeling. Here, let me show you:

$ chatgptassist -a generate-image -p "A programmer sits at Linux terminal, with a cup of tea nearby. He's got emacs and bash running on screen, as well as an image viewier. Inside the image viewer is an image for rainbows and butterflies coming out of a laptop. The programmer is celebrating his success." | tee ~/dl/url.out | clipit
