Wednesday, August 22, 2007

Adding Generators - A Language Comparison

Generators - Kinda like Iterators, but Not

Recently, the topic of adding Python style generators to both Java and Scheme has come up. It's interesting to compare the two approaches taken.

Before we jump into the details - what the heck is a generator anyway?

Generators are an approach to encapsulated iteration, much like a java.util.Iterator object is. Here's a trivial Python example:

from __future__ import generators

def foo():
  print 'hello'
  yield 1
  print 'world'
  yield 2

Calling foo above will return a generator back. You can then call next() on that generator (much like an Iterator in Java) to return the values that are associated with the yield statement. Think of yield as being the same as return, with the benefit that calling next() will resume there again.

Why add generators to your language, especially if it already has iterators? Generators simplify your life: rather than manually tracking which value to return next, you have the interpreter handle all the details. This is much like garbage collection - you could manage memory manually, but isn't it nice to have the interpreter do all the book keeping for you. It is.

The Approaches

Here's a description of technique to add generators to Java used by infomancers:

It works by using Java 1.5's facility for hooking user-defined class inspectors/transformers into the classloader, and rewriting the bytecode of a yieldNextCore() method containing calls to yieldReturn(foo) so that its local variables are promoted to members of an anonymous inner class, which also maintains the state of the iterator/generator.

Wow that's slick. Who would have thought to use user-defined class inspectors/transformers to re-write bytecode on the fly? here's a sample of how to use it:

public class PredicateIterable
    implements Iterable {
  private final Collection coll;
  private final Predicate pred;

  public PredicateIterable(Collection coll,
      Predicate pred) {
    this.coll = coll;
    this.pred = pred;
  }

  public Iterator iterator() {
    return new Yielder() {
      public void yieldNextCore() {
        for (Object nextItem : coll) {
          if (pred.evaluate(nextItem)) {
            yieldReturn(nextItem);
          }}}}}}

The Scheme approach was written iteratively, starting off by using the primitive call/cc operator and finishing up by using control.ssand a syntax tweaking macro. Here's the final implementation:

(require (lib "control.ss"))

(define-syntax (define/y stx)
  (syntax-case stx ()
    [(_ (name arg ...) body ...)
     (with-syntax 
         ((yield-name (datum->syntax-object stx 'yield)))
       (syntax
        (define (name arg ...)
          (define (yield-name x)
            (control resume-here
             (set! name 
                   (lambda ()
                     (prompt (resume-here 'dummy))))
             x))
          (prompt body ...))))]))

Here's a sample Scheme usage

(define/y (step) 
  (display "hello\n")
  (yield 1)
  (display "world\n")
  (yield 2)
  'finished)

Lessons Learned

Here's the lessons I took away from this...

In Java, where there's a will, there's a way. It's pretty dang impressive that Java allows you to re-write bytecode on the fly. It may be a relatively crude way to extend the language, but it works.

In Scheme, mere mortals can add experimental concepts to the language. In about the same space it takes to describe the Java solution in English, you can actually write it in Scheme. The implementation doesn't require any low level hacking - just understanding a particular library of control operators, or, if you prefer the basics of how call/cc works. This lowers the bar for adding new language constructs and means that you can bend the language to your project's needs, instead of the other way around.

Scheme's macros allow you to make the new concept blend seamlessly with the language. Comparing the Python and Scheme samples, there isn't a whole lot of difference. Not only can you add new concepts to your language, but you can make them convenient for the programmers who will be using them. Again, the language bends to project, not vice versa.

What concept do you wish your programming language had? Have you tried adding it in 20 lines of Scheme code?

No comments:

Post a Comment