Tuesday, September 06, 2011

Some Ramblings on Racket Places

I finally got around to reading up on Racket Places, one of the mechanisms the Racket team now offers to support parallelism. It seems they followed Erlang's share-nothing, communicate-by-message-passing model. This is a good thing, because, as Erlang has demonstrated, that model provides a sane way of programming massively parallel systems.

After reading the Places paper, though, I was initially a bit disappointed. I would have preferred a Scheme-ier solution: one based on thunks rather than modules, and one that allowed for lexical scope and the passing of arbitrary types, not just immutable ones. On the other hand, these constraints do dramatically simplify matters for the programmer. One essentially gets to think of a Place as a clean virtual machine running a specific module, which makes reasoning about it easier.
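For readers who haven't seen Places, a minimal sketch may help. It uses the `place` form from Racket's `racket/place` library; current Racket also offers `dynamic-place`, which takes a module path and a function name and so matches the module-based design described above more literally. The name `start-summer` is purely illustrative. Note that the place's body can refer only to its channel and to this module's top-level bindings, which is the lexical-scope restriction mentioned above, and everything it exchanges with its creator goes through `place-channel-put` and `place-channel-get`. As I understand it, the code has to be saved to a file and run with `racket`, because the new place loads its code from the enclosing module.

```racket
#lang racket/base
;; Minimal sketch of a place: a separate Racket instance that shares
;; nothing with its creator and communicates only over a place channel.
(require racket/place)

;; Illustrative helper. The place body can see `ch` and this module's
;; top-level bindings, but not the local variables of its caller.
(define (start-summer)
  (place ch
    ;; Receive one list of numbers, send back its sum, then finish.
    (let ([nums (place-channel-get ch)])
      (place-channel-put ch (apply + nums)))))

(module+ main
  (define p (start-summer))
  (place-channel-put p '(1 2 3 4))   ; the list travels as a message
  (displayln (place-channel-get p))  ; prints 10
  (void (place-wait p)))             ; block until the place has finished
```

Procedures are not allowed as messages (`place-message-allowed?` returns `#f` for them), which is why the body cannot simply be an arbitrary closure handed over at run time.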

It also seems to me that the limitations imposed by Places mean that implementing a Remote Places facility would be relatively easy. That is, instead of just providing a module and a starting function, one could provide a hostname as well. The system would then set up the Place on the remote machine and create a communication channel between the hosts. In an Amazon EC2 context, where bringing up new servers is trivial, this sort of functionality could be absolutely key. I could even imagine a facility where Places are automatically distributed among hosts depending on each machine's load.

I suppose the biggest gotcha is that the code to be run on the remote machine would need to be deployed ahead of time. In theory, though, this could be worked around too, by having the system copy itself to the remote host.

If I get some free time, I may have to give this concept a try, since the Racket team seems to have done all the heavy lifting.

Finally, I'm not quite sure what to make of this statement from the paper's comparison section:

Erlang’s typical programming model has many more processes than CPU cores and extensive message exchange, while places are designed to be used one place per CPU core and with less message-passing traffic.

In a relatively small, single-use program (say, running a benchmark), I can see how you would follow this recommendation. But how would one do this once the program gets complex? Suppose I'm writing a module to query Amazon SimpleDB. Rather than say:

  (for/list ([q queries])
    (sdb-query q))

I'd like to say something like:

  (for/list/parallel ([q queries])
    (sdb-query q))

In my imaginary scenario, each sdb-query call gets run in its own Place. Depending on how many queries I'm processing and how often I'm processing them, I'd almost certainly violate the one-place-per-core recommendation. I suppose I'll have to wait and see on this; perhaps some real-world examples of Places will become available to play with.
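One way to reconcile a `for/list/parallel`-style loop with the one-place-per-core advice might be to start a fixed pool of places, sized with `processor-count`, and queue the work onto them rather than giving each query its own place. The sketch below assumes a hypothetical CPU-bound `work` function standing in for `sdb-query` (here a naive Fibonacci); `start-worker` and `parallel-work` are likewise illustrative names, not part of any library. Because procedures can't be sent between places, the worker function is fixed at module level instead of being passed in as a loop body.

```racket
#lang racket/base
;; Sketch: map a fixed worker function over a list using a pool of
;; places (one per core by default), returning results in input order.
(require racket/place
         racket/future) ; provides processor-count

;; Hypothetical stand-in for sdb-query: any CPU-bound function whose
;; argument and result are valid place messages.
(define (work n)
  (if (< n 2) n (+ (work (- n 1)) (work (- n 2)))))

;; Start one worker place. Its body may refer only to `ch` and to this
;; module's top-level bindings (such as `work`).
(define (start-worker)
  (place ch
    (let loop ()
      (define msg (place-channel-get ch))
      (when msg                          ; #f is the shutdown signal
        (place-channel-put ch (work msg))
        (loop)))))

;; Apply `work` to every input using a pool of `n` places.
(define (parallel-work inputs [n (processor-count)])
  (define workers (for/vector ([i (in-range n)]) (start-worker)))
  ;; Hand out inputs round-robin; each place answers in FIFO order.
  (for ([x (in-list inputs)] [i (in-naturals)])
    (place-channel-put (vector-ref workers (modulo i n)) x))
  ;; Collect replies in the same round-robin order, which restores
  ;; the original input order.
  (define results
    (for/list ([x (in-list inputs)] [i (in-naturals)])
      (place-channel-get (vector-ref workers (modulo i n)))))
  ;; Shut the pool down and wait for every place to exit.
  (for ([w (in-vector workers)])
    (place-channel-put w #f)
    (place-wait w))
  results)

(module+ main
  ;; prints (6765 10946 17711 28657 46368 75025)
  (displayln (parallel-work '(20 21 22 23 24 25))))
```

Round-robin assignment keeps the bookkeeping trivial, but one slow item stalls everything queued behind it on the same place; a scheme where idle workers request the next item would balance load better. For network-bound calls like a real SimpleDB query, where the time is spent waiting rather than computing, Racket's lightweight threads are arguably a better fit than places.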

If there's one thing I took away from the Places effort, it's that adding parallelism is tricky business. I suspect this is an evolving topic, and I'm excited to see where they take it. This is certainly a step in the right direction, and, as usual, the Racket team really delivers.

2 comments:

  1. Look forward to your results.

  2. (a) I think that there were some recent additions to make it easier to start a place; (b) the reason that it's a little difficult is that a new place is close to starting a new process; (c) did you look at futures? -- They're much more lightweight.