Thursday, June 18, 2009

Scripting Excel With PLT-Scheme - Goodbye Visual Basic, Hello Scheme

For some time, I've known about MysterX - a Scheme facility that allows you to talk to Windows COM objects. Not being a Windows programmer, I've never really had a strong appreciation for what this would buy me - but from my latest playing, I think it could be huge.

Excel allows you to script nearly every bit of functionality that it offers (examples includes: creating workbooks, adding worksheets, updating cells, changing formatting, etc.). Typically, you do this via Visual Basic (or VBA - Visual Basic for Applications). However, it turns out, VBA is just using a COM interface - which MysterX gives you access to. In other words, by using COM, you should be able to get all the power of Excel (and other Windows apps) with the elegance of Scheme. Like I said, I think it could be huge.

Note - this functionality isn't unique to Scheme - perl, ruby and most other languages offer this too. So even if you're not a Scheme person, you might want to pay attention here and apply what I discuss below to your language of choice.

The Gotchas

MysterX offers a simple API for dealing with COM objects. Effectively you create objects and then get/set properties and call methods on them. From looking at the manual, you can see there's not much to it.

By using various Visual Basic and perl examples I was able to figure out the sequence of calls I wanted to make. Creating an Excel instance was easy enough:

(require mysterx)
(define excel
 (com-get-active-object-from-coclass "Microsoft Office Excel Application"))

So was creating a Workbook:

;; In English: access the Workbooks property on excel
;; and then invoke the Add method
(define book (com-invoke (com-get-property excel "Workbooks") "Add"))

And then I got stuck. I needed to run code such as the following:

  Workbook.Worksheets("Sheet1")

What the heck is Worksheets above? Is it a property or a method? It's described as a property in the docs, but simply accessing:

 (get-com-property workbook "Worksheets")

Didn't work. I knew I wanted to index into this property, but how?

Attempting to treat Worksheets as a method that took in the index didn't work either.

This stumped me for quite some time.

The other big gotcha was understanding the Excel object model. For the life of me, I couldn't find the docs that described all the details. This left me groping around for methods and properties, not quite sure what I could depend on.

The Solutions

The docs, it turned out, were easy to find once I knew where to look. By going into Excel, and then selecting the Developer Tab (which you must enable first), and then clicking on the Visual Basic icon, you'll be brought into the VB editor. You can then click help here, and you'll see the following options:

Click on the Excel Object Model Reference topic and you'll be able to get all the nitty gritty details about the properties and methods for each object.

As for the other issue - that took experimentation and guess work. Turns out, you can do:

;; (1) The docs say there's a property named Sheets on the workbook
(define sheets (com-get-property workbook "Sheets"))
;; (2) The Sheets object has a property named Item that takes in 1 argument
(define sheet (com-get-properties sheets '("Item" "MySheet")))

Apparently, the notion that a property is indexed is implemented in com-get-property. You access it by handing in a list representing the property and arguments you want to index by.

Addtionally, com-get-property allows you to chain calls together so that the following is possible:

VB: 
  Worksheet.Sheets("My Sheet").Range("A3:D19").Value

Scheme:
(com-get-property worksheet "Sheets" 
    '("Item" "My Sheet") '("Range" "A3:D19") "Value")

While not as compact as the VB code, it's reasonably close.

Pleasantly, the return types from the VB code are quite sane. Consider this spreadsheet:

Here are the Scheme values when extracting ranges:

;; Access B3:B9
#(#("foo" "bar" "baz" "zap" "zing" "zoop" "zlog") 
  #(102.0 83.0 76.0 54.0 34.0 10.0 2.0))

;; Access B10 (empty)
#<void>

;; Access C11
361.0

As you can see, not only are string and numeric cells returned as expected. But ranges are converted to vectors of vectors, and empty cells are returned as void?.

A Quick and Dirty API

Using my new found knowledge, I was able to cobble together the following API. It's woefully incomplete, but at least you can get an idea of how you can package up these Excel calls:

#lang scheme
;;
;; Use COM & mysterx to integrate with excel
;;
(require mysterx)

;; Some types - for all I know, a cell can hold more than just a 
;; string, number of empty. But this is what I've seen so far.
(define cell/c (or/c string? number? void?))
(define cells/c (vectorof (vectorof cell/c)))

(provide/contract
 [open (-> path-string? com-object?)]
 [close (-> com-object? boolean? any)]
 [new (-> com-object?)]
 [get-value (-> com-object? string? string? cell/c)]
 [set-value! (-> com-object? string? string? cell/c any)]
 [get-values (-> com-object? string? string? cells/c)]
 [set-values! (-> com-object? string? string? cells/c any)])

;; Try to be smart about getting ahold of excel. If it's running, use
;; that instance. If not, create a new instance.
(define (excel-instance)
  (define class-name "Microsoft Office Excel Application")
  (with-handlers ([exn? (lambda (ex)
                          (cocreate-instance-from-coclass class-name))])
    (com-get-active-object-from-coclass class-name)))

;; call the open method to access an existing workbook.
(define (open path)
  (let* ([excel (excel-instance)]
         [workbooks (com-get-property excel "WorkBooks")])
    (com-invoke workbooks "Open" (if (path? path)
                                     (path->string path) path))))

;; Create a fresh workbook
(define (new)
  (let* ([excel (excel-instance)]
         [workbooks (com-get-property excel "WorkBooks")])
    (com-invoke workbooks "Add")))

;; Close a workbook
(define (close doc save?)
  (com-invoke doc "Save" save?))

;; A private function to access a range. You provide
;; the document, the sheet of interest and the range.
;; Note the range can be any of the following:
;;  A3:B6 - a group of cells
;;  C17   - a single cell
;;  Foo   - a named range that's defined in the document already
;; There may be more options too - the range is just passed to excel to interpet.
(define (get-range doc sheet-name range)
  (com-get-property doc "Sheets" `("Item" ,sheet-name) `("Range" ,range)))

;; There's really no difference between getting and setting
;; one value over multiple values. We have a contract on these to 
;; provide some type checking and structure - but really, we leave Excel
;; in charge of figuring out when we mean 1 cell versus a block of cells.
(define (get-value doc sheet-name range)
  (com-get-property (get-range doc sheet-name range) "Value"))
(define (get-values doc sheet-name range)
  (com-get-property (get-range doc sheet-name range) "Value"))

;; Why Value2 below instead of Value? I have no idea. It works.
(define (set-values! doc sheet-name range value)
  (com-set-property! (get-range doc sheet-name range) "Value2" value))
(define (set-value! doc sheet-name range value)
  (com-set-property! (get-range doc sheet-name range) "Value2" value))

What Next?

I think this approach opens up all sorts of interesting avenues. One area that I'm considering using this approach on is project that would treat Excel as a user interface and would use Scheme to do the computational processing. This gives the user a well known UI to work with, yet allows me to avoid having to implement clunky code in VB.

I can also see some interesting opportunities for automation. I'll have to see if I can dig up the COM object models for other apps - like Word, PowerPoint and Internet Explorer.

I always love it when I get happy surprises from Windows, and I think I'll mark this COM stuff down as one of those.

8 comments:

  1. Isn't COM on its way out, being replaced by .Net? Of course, as long as it works, things like these are very nice.

    ReplyDelete
  2. Mathias -

    With my luck, the answer is yes. Yeah, this post is about 5 years too late.

    Though, Microsoft certainly has a reputation for keeping products alive for way longer than they should - so perhaps this will work for some time.

    -Ben

    ReplyDelete
  3. Ah, no. COM is still alive and kicking. Your COM code will continue working until Microsoft stops shipping Office.

    ReplyDelete
  4. Ben: Great post

    Mathias: From what I have read; COM will never go away as the heart of Windows and .NET is apparently Win32, and COM is the only way for .NET to access it.

    ReplyDelete
  5. Ben -- that's great, thanks for laying it out. I've used MysterX and COM intermittently for years; I even created my job talk with it! For another example (MysterX talking to iTunes), see my post on the plt-scheme list.

    ReplyDelete
  6. Thanks for the post Ben, it's a great introduction. You may also be interested in my win-control library for Gambit that has a very basic COM interface but also interacts with Windows apps by sending GUI messages so you can also automate applications that don't have an embedded COM server

    http://code.google.com/p/win-control

    It would probably port quite easily to PLT.

    Andrew

    ReplyDelete
  7. Thanks all for the comments and the links.

    I knew Microsoft wouldn't let me down by actually phasing out products ;-).

    ReplyDelete
  8. Anonymous6:24 AM

    Hi Ben, I found your article really interesting. Had a question to ask you. Is it possible to modify array formulas using MysterX without having to press the CTRL-SHFT-ENTER keys .
    Would appreciate any advice !

    ReplyDelete