Monday, August 13, 2007

Rethinking Return Values

The Problem

The other day I wanted to write a function in Scheme that did the following:

Function url-info takes in a URL and does the following. First it follows redirects until it either reaches a real page (response code 200) or receives an error response, such as a 404 or 500. If a real page is found, the value of the title tag is extracted. Both the final URL and the value of the title are returned by this function.

For example, suppose I called url-info on http://tinyurl.com/39n5jb. The result would be two values: http://benjisimon.blogspot.com and Ben Simon's Journal | ....

Writing this function in SISC is fairly trivial because you can leverage existing libraries like HTTPClient and HtmlParser that will do the heavy lifting. The question I was left with was simple - what should the interface to this function be?

The Java Solution

If I were coding this problem in Java, the answer to the interface question would probably be:

  public UrlInfo urlInfo(URL url) throws UrlInfoException {
      // magic goes here
  }

The UrlInfo object would be a POJO with get methods for accessing the resolved URL and the title of the page. UrlInfoException would be a typical exception with methods to get the HTTP error code and the reason the lookup failed.

It's possible to write the Scheme function using the exact same pattern (return an info object, throw an exception). Before I did so, however, I wanted to resolve two nagging issues.

  • Most of the time, when I get back the info object, the first thing I'm going to do is take it apart.
  • The function is biased towards dealing with the successful case. You can definitely try/catch the exception and handle invalid URLs, but the code to do so is clunkier than the code that deals with a successful return value.
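To make those two issues concrete, here's a rough Java sketch of the info object/exception pattern and a typical call site. It's only an illustration under some loud assumptions: the lookup is faked with canned data rather than real HTTP, the URL is passed as a String instead of a java.net.URL to keep the sketch short, and the example.com addresses, the UrlInfoDemo class and the describe method are all names I made up for the example.

```java
// Sketch only: urlInfo is stubbed with canned data, no network access.
class UrlInfoDemo {

    // Plain value object holding both results of a successful lookup.
    static final class UrlInfo {
        private final String resolvedUrl;
        private final String title;

        UrlInfo(String resolvedUrl, String title) {
            this.resolvedUrl = resolvedUrl;
            this.title = title;
        }

        String getResolvedUrl() { return resolvedUrl; }
        String getTitle() { return title; }
    }

    // Checked exception carrying the HTTP status code and a reason.
    static final class UrlInfoException extends Exception {
        private final int code;
        private final String reason;

        UrlInfoException(int code, String reason) {
            super(reason + " (" + code + ")");
            this.code = code;
            this.reason = reason;
        }

        int getCode() { return code; }
        String getReason() { return reason; }
    }

    // Hypothetical stand-in for the real function: one known URL
    // "resolves", every other URL fails with a 404.
    static UrlInfo urlInfo(String url) throws UrlInfoException {
        if (url.equals("http://example.com/short")) {
            return new UrlInfo("http://example.com/", "Example Domain");
        }
        throw new UrlInfoException(404, "Not Found");
    }

    // A typical call site (hypothetical helper).
    static String describe(String url) {
        try {
            UrlInfo info = urlInfo(url);
            // Issue 1: the first thing we do is take the object apart.
            String resolved = info.getResolvedUrl();
            String title = info.getTitle();
            return title + " <" + resolved + ">";
        } catch (UrlInfoException e) {
            // Issue 2: the failure case lives off to the side in a catch block.
            return "failed: " + e.getReason() + " (" + e.getCode() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.com/short"));
        System.out.println(describe("http://example.com/missing"));
    }
}
```

The success path reads top to bottom, while the failure path needs the try/catch scaffolding around it, which is the bias I mentioned above.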

The Scheme Solution

After a bit of puzzling over this, I decided I would try a different approach in Scheme:

 (url-info url success-handler failure-handler)

That is, you invoke the url-info function not only with a URL to process, but also with two different functions. One function, the success-handler, is invoked when the URL is processed correctly. The other, the failure-handler, is invoked when the URL can't be resolved.

What does url-info return? It returns whatever the handlers return. Here are some example uses of this function (quick Scheme lesson: (lambda (...) ...) is just like function(...) {...} in JavaScript):

;; Success: Just return the title of the page, upper cased, 
;;          throwing away resolved-url.
;; Failure: Throw an exception saying the call failed
(url-info url (lambda (resolved-url title)
                 (string-upcase title))
              (error/f "Unable to resolve URL, reason: ~a (~a)"))


;; Success: make an info object for use later
;; Failure: make an error object for use later
(url-info url make-url-info make-url-error)

;; Success: raise an error
;; Failure: store the requested URL in a database
(url-info url (lambda (resolved-url title)
               (error (format "Sorry, ~a already exists. Choose another." 
                               resolved-url)))
              (lambda (code reason)
               (store-new-url! url)))

Here's what I like about this approach:

  • The programmer is explicitly required to handle both the success and failure case. It's all too easy in Java to swallow, print out, or mis-handle an error case.
  • Destructuring happens automagically for you: when the handlers are called, the results arrive already taken apart as separate arguments.
  • The url-info code doesn't need to worry about how it's going to be used. It's just as easy to use the failure case to get work done, as it is to use the success case.
  • It is possible to create a library of handlers that are defined independently of the url-info package. One use for this might be to standardize error handling on a per-project basis.
  • Converting this generic handler approach to the Java info object/exception approach is fairly painless, and other interesting patterns remain available as well.
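As a rough illustration of the points above, here's how the handler style and its conversion back to the info object/exception style might look in Java. This is a sketch, not the actual implementation: it assumes Java 8 or later for lambdas, fakes the lookup with canned data, and makes UrlInfoException unchecked so a lambda can throw it. The names HandlerStyleDemo, shoutTitle and urlInfoOrThrow, and the example.com URLs, are invented for the example.

```java
import java.util.Locale;
import java.util.function.BiFunction;

// Sketch only: the lookup is canned, no network access.
class HandlerStyleDemo {

    static final class UrlInfo {
        private final String resolvedUrl;
        private final String title;

        UrlInfo(String resolvedUrl, String title) {
            this.resolvedUrl = resolvedUrl;
            this.title = title;
        }

        String getResolvedUrl() { return resolvedUrl; }
        String getTitle() { return title; }
    }

    // Unchecked here (an assumption) so that a handler lambda may throw it.
    static final class UrlInfoException extends RuntimeException {
        private final int code;

        UrlInfoException(int code, String reason) {
            super(reason);
            this.code = code;
        }

        int getCode() { return code; }
    }

    // Handler-style entry point: returns whatever the invoked handler returns.
    static <R> R urlInfo(String url,
                         BiFunction<String, String, R> onSuccess,
                         BiFunction<Integer, String, R> onFailure) {
        if (url.equals("http://example.com/short")) {
            return onSuccess.apply("http://example.com/", "Example Domain");
        }
        return onFailure.apply(404, "Not Found");
    }

    // Success: upper-cased title. Failure: a message. Both cases are handled
    // at the same place, and the arguments arrive already taken apart.
    static String shoutTitle(String url) {
        return HandlerStyleDemo.<String>urlInfo(url,
            (resolved, title) -> title.toUpperCase(Locale.ROOT),
            (code, reason) -> "failed: " + reason + " (" + code + ")");
    }

    // Conversion back to the info object/exception style.
    static UrlInfo urlInfoOrThrow(String url) {
        return HandlerStyleDemo.<UrlInfo>urlInfo(url,
            UrlInfo::new,
            (code, reason) -> { throw new UrlInfoException(code, reason); });
    }

    public static void main(String[] args) {
        System.out.println(shoutTitle("http://example.com/short"));
        System.out.println(urlInfoOrThrow("http://example.com/short").getTitle());
        try {
            urlInfoOrThrow("http://example.com/missing");
        } catch (UrlInfoException e) {
            System.out.println(e.getCode() + " " + e.getMessage());
        }
    }
}
```

The adapter is a single call, which is what makes the conversion painless: the success handler is the info object's constructor and the failure handler throws.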

What's the downside? The major one is that the nesting of different calls can get somewhat confusing, and this all breaks away from the standard C-like control flow so many developers are used to. However, given how powerful the above approach can be, I think the initial readability issue is a reasonable trade-off.

I ended up implementing the handler style approach. So far, I'm liking how it works. However, we'll see in a few months if I still love it.

If you were writing this same function, what would you make the API?
