Tuesday, February 15, 2011

Gotcha of the Day: SimpleDB Fails to Deliver the Goods

I have to admit, I was psyched about Amazon's SimpleDB. There was so much to love:

  • You get High Availability for free
  • Data is always indexed and ready for fast querying
  • Amazon would take care of all the infrastructure worries
  • Interesting data scaling options are available, like horizontal scaling and parallelizing queries
  • The whole thing runs over HTTP so it's accessible anywhere and everywhere
  • In general, the system is drop dead simple - making it an ideal platform to build clever solutions on top of

And you know, all of the above may in fact be true. But, there's one gotcha that I ran into that was a show stopper. One word says it all: Latency.

I found that a basic PutAttributes or GetAttributes call would normally take between 30 and 100 milliseconds. And as I added more servers to the mix, the latency got even worse.

And yes, the client that's making requests of SimpleDB is located on an EC2 box, theoretically within Amazon's network.

Perhaps I shouldn't be surprised by the poor performance. But still, SimpleDB is just so dang simple - shouldn't quick response time, even over HTTP, be possible?

The test case I used was implementing HTTP session storage, which was nice and easy to implement. But, it was also nice and easy to implement in MySQL, and ran quite a bit faster (13 millis instead of the variable 30-100 millis).

I'd love to say I was doing something wrong. But, the time I was measuring was in the curl_exec(...) call, something I don't have a whole lot of control over.
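For anyone who wants to check numbers like these on their own setup, here's a minimal sketch of the kind of timing harness involved: wrap a single call in a wall-clock timer, repeat it, and look at the spread rather than one sample. It's in Python and is not the code behind the measurements above. The names time_calls, summarize, and fake_request are hypothetical, and fake_request is only a stand-in that sleeps for a couple of milliseconds; you'd swap in a real SimpleDB or MySQL request.

```python
import statistics
import time


def time_calls(operation, repeats=50):
    """Call `operation` repeatedly; return each call's latency in milliseconds."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Collapse a list of latencies into min / median / max."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }


def fake_request():
    # Assumption: placeholder for a real network call (e.g. a GetAttributes
    # request or a MySQL query). It only sleeps for roughly 2 ms.
    time.sleep(0.002)


if __name__ == "__main__":
    stats = summarize(time_calls(fake_request, repeats=20))
    print("min %.1f ms, median %.1f ms, max %.1f ms"
          % (stats["min"], stats["median"], stats["max"]))
```

Reporting the minimum and maximum alongside the median matters here, since the complaint is as much about how variable the latency is as about its typical value.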

OK, it's official: my new database to ogle is CouchDB. Oh, and I can't help keeping an eye on SimpleDB too, as maybe they'll improve its performance.

Anybody have any experience that differs from this? Did I have some coding gotcha which gummed up the works?

4 comments:

  1. You may also be interested in MongoDB if you're looking at document-oriented systems. I haven't used it personally, but I did listen to a podcast about it a while back, and it sounded worth investigating at the time.

  2. Thanks Joseph - I've heard about MongoDB. I'm not quite sure what caused me to gravitate more so to CouchDB (maybe it's my love of all things Erlang?).

    Regardless, I'll keep it in mind.

    -Ben

  3. The Other Ben Simon, 12:03 PM

    My experience with MongoDB is a mixed bag. Unknown issues with the community-supported driver cropped up during a DB upgrade and forced a fallback. Latency over the network, as you mentioned, will be an issue for some people.

    But my main concern (aside from the technology feeling like it is NOT production ready...yes, I'm ready for the flames) is people trying to use it just because it saves them development effort (no sql maps etc) or because they want to scale it out in some mythical future when they are really really successful. But they don't stop and consider "wait a minute...my data IS relational!"

    So as a "smart" file storage (with metadata built in) or a big bucket to dump logging, I get it. Even that may be an issue if/when you really need to start mining your logs to look for patterns...then you find the query syntax to be a lot less expressive (or at least less efficient) versus SQL.

    Another case of right tool for the right job. Hmmm...just thought of a new phrase, "Right tool sweet, right cool feet." Okay, it needs work...

  4. Ben - so well explained. Thanks.
