In the first part of the article series, we understood what indexes are and how indexes work internally. In this part, we will look into multi-columnar indexes, ordering in indexes, and pitfalls in using indexes.
We know that MySQL will only use one index per table per query. Therefore, if we have to filter on multiple fields, having indexes on them separately won't help. For cases like this, a multi-column index comes into the picture, for a multi-column index order of columns in the index is very important. For a multi-column index.
index(column1, column2) != index(column2, column1)
The rule of thumb would be to have the 1st column in the index with more cardinality, than the 2nd column and so on. Consider a case, where we are storing author_id, user_id, and amount in a table. For our example. we have inserted almost ~3 million rows in the table. In our example, the cardinality of user_id is higher than author_id. Let's see the proper way to put an index here.

At this point, we don't have an index on user_id or author_id, lets try to find out the total amount of books sold for an author.
mysql> select sum(amount) from purchase where author_id = 1500;
+-------------+
| sum(amount) |
+-------------+
| 1851958 |
+-------------+
1 row in set (0.86 sec)
mysql> explain select sum(amount) from purchase where author_id=1500 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: purchase
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3018896
filtered: 10.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
We can see, that the query is taking 860 milliseconds, also, the explain
output we can see, that type
is ALL
which basically denotes, full-table scan. Now, this is expected, as we have not indexed anything. Let's go ahead and put an index on author_id.

Now, that we have an index on author_id, lets see how does the query fare now.
mysql> select sum(amount) from purchase where author_id = 1500;
+-------------+
| sum(amount) |
+-------------+
| 2076643 |
+-------------+
1 row in set (0.02 sec)
mysql> explain select sum(amount) from purchase where author_id=1500 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: purchase
partitions: NULL
type: ref
possible_keys: author_id_index
key: author_id_index
key_len: 4
ref: const
rows: 864
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.01 sec)
With the index in place, we see that the total query time came down to 20 milliseconds from 860 milliseconds. Inexplain
we see that, author_id_index
is used, which has caused this optimization. Now, while 20 milliseconds is good, in order to compute the sum of the amount, the database, still has to go back to the table since amount
is not part of the index, let's see if we add amount
in the index, what happens next.

We have added a multi-columnar index author_id_and_amount_index
, let's see, if adding it made the above query any better.
mysql> select sum(amount) from purchase where author_id = 1599;
+-------------+
| sum(amount) |
+-------------+
| 2269039 |
+-------------+
1 row in set (0.00 sec)
mysql> explain select sum(amount) from purchase where author_id=1599\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: purchase
partitions: NULL
type: ref
possible_keys: author_id_index,author_id_and_amount_index
key: author_id_and_amount_index
key_len: 4
ref: const
rows: 892
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)
As it's showing 0.00 seconds, the total time taken would be less than 10 milliseconds. So we have come from ~850 milliseconds to less than 10 milliseconds by applying good indexes. There are a couple of points that need to be highlighted here:
- At this point
author_id
index is redundant and can be deleted safely, asauthor_id_and_amount_index
have the same impact, because the order ofauthor_id
inauthor_id_and_amount_index
. - Now, what would happen if we reverse the order of
author_id
andamount
in theauthor_id_and_amount_index
? Let's actually try that, lets delete the already existing index and create another one with reversed order.

Let's fire the same query and see, what happens.
mysql> select sum(amount) from purchase where author_id = 1517;
+-------------+
| sum(amount) |
+-------------+
| 2275132 |
+-------------+
1 row in set (0.73 sec)
mysql> explain select sum(amount) from purchase where author_id = 1517\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: purchase
partitions: NULL
type: index
possible_keys: NULL
key: amount_and_author_id_index
key_len: 9
ref: NULL
rows: 3092857
filtered: 10.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
Again the query performance got worse, it took 730 milliseconds, also in explain we can see that even when amount_and_author_id_index
is used the query performance is not great. This is because of how the index is stored in the database. Let's take a look at how the data is structured in both the index.


If we see, how both indexes store the data, it is quite clear why the query performed better with author_id_and_amount_index
, with this index, the author_id
is selected in O(logN) time as author_id is sorted, and then all the amounts are summed in that range. While with amount_and_author_id_index
, has to scan all rows in order to find the author_id and then sum those rows, therefore, such long time.
Another important point would be that in a multi-columnar index, columns can not be skipped while querying ie. for an index Index(column_a, column_b, column_c)
, firing a query on columns_b and column_c alone would not yield any performance benefit. The following query will be using the above index.
- Query using column_a, column_b, and column_c.
- Query using column_a and column_b.
- Quer using column_a and column_c.
Summary:
- Order matters in a multi-columnar index. As a rule of thumb, keep the first one on which query is fired. If the query is fired on all the columns equally, keep the one high highest cardinality first and so on and so forth.
- In a multi-columnar index, columns can not be skipped while querying.
