I recently had the need to SELECT some data from MySQL tables and include a “rownum” column on each row. For example, I needed some results that looked like this:
Notice that I cannot use the “id” column because it is auto-generated, and the data is not inserted into the table in the order I am SELECTing it. This type of thing is very easy to achieve in Oracle, which has a built-in “rownum” column that can be added to the SELECT statement. MySQL, unfortunately, does not have a similar construct. So what are we to do?
My first attempt to solve this problem came from a couple of sources like this one: http://jimlife.wordpress.com/2008/09/09/displaying-row-number-rownum-in-mysql/.
This solution is quite brilliant. It uses a user defined variable that is incremented each time a row is processed. My query would look something like this:
SELECT id, @rownum:=@rownum+1 AS rownum, name FROM users u, (SELECT @rownum:=0) r ORDER BY name
This works quite well, but it has a couple of drawbacks. The first is manageable: user defined variables are connection-scoped, so you’ll need to be careful not to overwrite any other variables or reuse this one by accident in another query. Unfortunately, the second drawback is a doozy, and it is documented on the MySQL website. Here is the excerpt of interest:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement. In SELECT @a, @a:=@a+1, …, you might think that MySQL will evaluate @a first and then do an assignment second. However, changing the statement (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an execution plan with a different order of evaluation.
What does that mean? Well, it means you can’t assign a variable and SELECT it in the same statement or things might get out of order. This is because the variable is assigned in the order that rows are processed, which can change depending on how MySQL decides to optimize and process the query. The more complicated your query, the more likely things are to get fuzzy. In my not-so-simple case, I was doing multiple JOINs and an ORDER BY. Everything worked fine on our testing server. I ran the query multiple times, on different days, using the same data as our production server, and even on our production server itself. It worked great in all cases, so we deployed it to production a few days later. Then our website exploded. Nothing was in order. I tracked it down to this query, and more specifically, to this @rownum solution.
Between the time the query was tested and the time we deployed it, the size of the table in question grew two- or three-fold. This changed MySQL’s execution plan and the order that it processed the rows. My data ended up looking something like this:
In other words, my data was still in the order I wanted, but the row numbers were completely off!
…but don’t fret! Would I be writing this blog post if I didn’t have a solution? Of course not!
The solution to this problem is simple. We just need to shield the assignment of our @rownum variable from the MySQL execution plan. To accomplish that, we move the main query into a subquery like so:
SELECT u.id, @rownum:=@rownum+1 AS rownum, u.name FROM ( SELECT id, name FROM users ORDER BY name ) u, (SELECT @rownum:=0) r
Now we’re back to the data we originally expected:
I hope this saves you all the frustration it caused me.