Thursday, March 03, 2011

Gotcha of the Day: MySQL indexing gotchas

So I'm debugging some MySQL queries that shouldn't have been that slow. But were. They had a shape kinda like the following:

 SELECT u.*
  FROM users u
  WHERE
    u.active                        AND
    find_in_set('trial', u.props)
  ORDER BY u.created

One of things that I like about SimpleDB is that it simply won't allow you to put data into it that it can't cleanly index. Same goes for the database behind the Google App engine. MySQL, of course, isn't so picky.

After much poking around, I realized that the above query has at least 3 potential pitfalls:

  • u.active - assuming that active is an INT field and that a value of 1 means that the record is active, this clause makes complete logical sense. However, MySQL will refuse to use it as an index. Instead, you need to say:
    u.active = 1
    
    I could guess why this is, and I could complain that u.active = 1 should be the exact same thing as u.active alone. But, I won't bother.
  • find_in_set('trial', u.props) - assuming that u.props is a SET data type, MySQL won't use this value as an index. SETs are actually a slick little feature in MySQL - they are perfect for tracking little bits of information in a single column. But, because of the way they are implemented, there's no way to use them as an index. Instead, you'd need to create a dedicated field named u.is_trial and specifically set and query against that.

    The SimpleDB philosophy would say that MySQL shouldn't offer SET as a data type because you can't efficiently query by it. And the MySQL philosophy would say, hey, use it or not, it's up to the programmer. As I work on systems that have been around for a while, I'm starting to see the wisdom of the SimpleDB approach. Sooner or later, that SET column is going to be something you want to query. And then it's time to go into refactoring mode. Seems like skipping SETs from the start may be the way to go.

    This is actually the approach I've taken with views. I love MySQL views. They are to SQL what functions are to PHP - a great mechanism for building abstractions. And yet, they completely fail to use indexes, which means that eventually your data will get so large that your views will be essentially worthless. Better to skip them from the start. How sad.
  • ORDER BY u.created - if you're going to sort on a column, you better have an index backing it up so the sort can happen efficiently. I often times find myself forgetting to check this column for an index.

Should MySQL be so permissive about letting you use constructs that can't be efficiently queried (or indexed)? What do you think? As you can see, I'm on the fence and leaning towards no.

No comments:

Post a Comment