Thursday, November 29, 2007

MySQL Optimization: A Bit Of Kvetching

Call me crazy, but suppose I have the following

  explain SELECT <stuff> FROM <stuff> where name = 'foo';
  -- Explain says:
  +----+-------------+-------+-------+-------+
  | id | select_type | table |  rows | Extra |
  +----+-------------+-------+-------+-------+
  |  1 | SIMPLE      | at    |     1 |       | 
  |  1 | SIMPLE      | us    |     1 |       | 
  |  1 | SIMPLE      | ps    |     1 |       | 
  +----+-------------+-------+-------+-------+
  3 rows in set (0.04 sec)


And:

  explain SELECT <stuff> FROM <stuff> where id  = 'foo';
  -- explain says the same thing as above, but for ID

And those queries above UNION'ed scan, say, 20 rows 1 row:

  +----+--------------+-----------+------+-------------+
  | id | select_type  | table     | rows | Extra       |
  +----+--------------+-----------+------+-------------+
  |  1 | PRIMARY      | at        |    1 |             |
  |  1 | PRIMARY      | us        |    1 |             |
  |  1 | PRIMARY      | ps        |    1 |             |
  |  2 | UNION        | at        |    1 | Using where |
  |  2 | UNION        | us        |    1 |             |
  |  2 | UNION        | ps        |    1 |             |
  | NULL | UNION RESULT |  union1,| NULL |             |
  +----+--------------+-----------+------+-------------+
  7 rows in set (0.05 sec)
why should:

  explain SELECT <stuff> FROM <stuff> 
     where id  = 'foo' OR name = 'foo'
  -- Explain says:
  +----+-------------+--------+-------+-------------+
  | id | select_type | table  | rows  | Extra       |
  +----+-------------+--------+-------+-------------+
  |  1 | SIMPLE      | at     | 55951 | Using where | 
  |  1 | SIMPLE      | us     |     1 |             | 
  |  1 | SIMPLE      | ps     |     1 |             | 
  +----+-------------+--------+-------+-------------+
  3 rows in set (1.77 sec)

scan 55,000 rows?! Your telling me that the MySQL query optimizer can't find a better strategy than a table scan?! Googling around I can't find any work around other than manually optimizing the queries.

A few lessons learned from this:

Always test your queries, and never write them off as *needing* to be slow. Yeah, you've got 55,000 rows of data - big deal, you should need to only scan 20 rows to find what you want.

All database engines are not the same. Now, I'm going out on a limb here because I haven't tested this case in PostgreSQL, but I'm fairly certain it would optimize this query properly. The reality is that MySQL and Postgres (and Sybase, and Oracle, etc.) have different levels of sophistication. Heck, the versions of MySQL I often encounter doesn't even have transactions turned on (or, more likely, I don't know how to turn them on). If you're writing a database centric app (and who isn't these days?) shouldn't you be using the most powerful one available, not just the most popular one?

Maybe I'm not being fair to MySQL here - I think many of the installs I deal with are probably an older version. But still, I'm amazed at how many people I meet who think that any old version of MySQL is just as powerful as everything else out there. Not so.

Take nothing for granted and index like hell.

--Ben

Update: Per Dave's suggestion, I've put in actual (though scrubbed, and truncated) results of the explain query. See, I'm not making this up.

2 comments:

  1. "And those queries above UNION'ed scan, say, 20 rows, why should: [...] scan 55,000 rows?!"

    Ben, did you actually TRY running explain on those two queries UNIONed together, or were you just speaking set-logic-ese?

    If you didn't literally run a UNION of those queries, would you mind doing that?

    I could do it myself, but if you've still got your 55000-row table laying around in the same mysql instance, it would be an easier and more useful comparison.

    That's pretty messed up, though, that the OR condition caused a full table scan. Was a primary key defined?

    ReplyDelete
  2. I added in the results of explain above.

    The union does great. It's just that tricky OR that kills it. Pesky boolean operator.

    There is a primary ID on the table, and the name is also defined with an index.

    ReplyDelete