Tuesday, June 19, 2007

Dealing with new docx format - converting it to text

If memory serves, a few years back you couldn't send a person an MS Word attachment without getting a message back saying they couldn't open it. There were simply too many versions of Word, and the formats were just slightly incompatible. In a lot of respects, this allowed the PDF format to really take off - it was one of the few reliable ways of sending a rich document to someone and knowing they could open it.

Then, over time, things settled down. In recent memory, I can't think of a time when I had to wrestle with Word's format. Heck, even Google documents allows me to reliably save / open Word documents.

But I think those days of easy interchange are numbered (though they'll come back again when things settle down). I wrote up a document in the latest version of Word, Word 2007 (which I'm mostly impressed with, though it's no LaTeX), and sent it off as an attachment. I quickly got back the dreaded "Sorry, can't open this attachment" message.

D'oh. The latest version of MS Word wants to save things as .docx instead of .doc. And of course, older versions of Word are clueless about this format.

The easy thing to do would have been to re-save the document as .doc and resend it. But, I wasn't at my computer at the time. So, I poked around via Google, and found docx-converter.com. The site did a fine job of extracting my short document as text, so I could resend it.

I'm sure there are better options than docx-converter, as it just extracts text. But, in a pinch, it worked. I also tried the compatibility pack for my older version of MS Word, but that didn't seem to take. The .docx file opened as a bunch of gibberish.
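For the curious, pulling plain text out of a .docx by hand isn't much work either. A .docx file is a ZIP archive, and the body of the document lives in an XML file inside it named word/document.xml. Here's a minimal Python sketch of the same "just extract the text" idea. To be clear, this is my own illustration and not how docx-converter works (I have no idea what they do under the hood); the function name docx_to_text is made up, and it ignores tables layout, headers, footers, and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration: `path` may be a filename or a
    binary file-like object. Formatting, headers, and footers are ignored.
    """
    # A .docx is a ZIP archive; the main body is stored in this member.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its text is split across <w:t> runs.
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

Something like `print(docx_to_text("report.docx"))` (the file name is just an example) would then dump the text to the terminal, ready to paste into an email.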

Just when you thought we were done with these format games...here we go again.

Other sites which look promising for document conversion are:


  1. Anonymous 10:31 AM

    This is why I prefer to save everything as .txt. And why I still use pine to read my email. And why my Beastie Boys tattoo doesn't look nearly as cool now as it did 10 years ago.

    Wait...that last one doesn't belong on the "why I don't like proprietary formats" list. It belongs on the "why my chest hates me" list. Wife must've been messing with my index cards again...

  2. I guess file formats is yet another advantage I get from reading my e-mail in emacs.

    That's good to know about the index cards, as my new system could seriously get hosed if this kind of mix-up happened ;-).

  3. Anonymous 11:56 AM

    This just inspired me to make up a new nickname for myself.
    I will henceforth be known as:
    Ben "Ifitaintasciiitaintworthreading" Simon.

    I love it. It makes me sound all rebellious and ornery.