In the first part of the article series, we understood what indexes are and how indexes work internally. In this part, we will look into multi-columnar indexes, ordering in indexes, and pitfalls in using indexes.

We know that MySQL will only use one index per table per query. Therefore, if we have to filter on multiple fields, having indexes on them separately won't help. For cases like this, a multi-column index comes into the picture,  for a multi-column index order of columns in the index is very important. For a multi-column index.

index(column1, column2) !=  index(column2, column1)

The rule of thumb would be to have the 1st column in the index with more cardinality, than the 2nd column and so on. Consider a case, where we are storing author_id, user_id, and amount in a table.  For our example. we have inserted almost ~3 million rows in the table. In our example, the cardinality of user_id is higher than author_id. Let's see the proper way to put an index here.

Purchase table

At this point, we don't have an index on user_id or author_id, lets try to find out the total amount of books sold for an author.

mysql> select sum(amount) from purchase where author_id = 1500;
+-------------+
| sum(amount) |
+-------------+
|     1851958 |
+-------------+
1 row in set (0.86 sec)


mysql> explain select sum(amount) from purchase where author_id=1500 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: purchase
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3018896
     filtered: 10.00
        Extra: Using where

1 row in set, 1 warning (0.00 sec)

We can see, that the query is taking 860 milliseconds, also, the explain output we can see, that type is ALL which basically denotes, full-table scan. Now, this is expected, as we have not indexed anything. Let's go ahead and put an index on author_id.

Adding an index on author_id

Now, that we have an index on author_id, lets see how does the query fare now.

mysql> select sum(amount) from purchase where author_id = 1500;
+-------------+
| sum(amount) |
+-------------+
|     2076643 |
+-------------+
1 row in set (0.02 sec)

mysql> explain select sum(amount) from purchase where author_id=1500 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: purchase
   partitions: NULL
         type: ref
possible_keys: author_id_index
          key: author_id_index
      key_len: 4
          ref: const
         rows: 864
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.01 sec)

With the index in place, we see that the total query time came down to 20 milliseconds from 860 milliseconds. Inexplain we see that, author_id_index is used, which has caused this optimization. Now, while 20 milliseconds is good, in order to compute the sum of the amount, the database, still has to go back to the table since  amount is not part of the index, let's see if we add amount in the index, what happens next.

Adding index on author_id and amount

We have added a multi-columnar index author_id_and_amount_index , let's see, if adding it made the above query any better.

mysql> select sum(amount) from purchase where author_id = 1599;
+-------------+
| sum(amount) |
+-------------+
|     2269039 |
+-------------+
1 row in set (0.00 sec)


mysql> explain select sum(amount) from purchase where author_id=1599\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: purchase
   partitions: NULL
         type: ref
possible_keys: author_id_index,author_id_and_amount_index
          key: author_id_and_amount_index
      key_len: 4
          ref: const
         rows: 892
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

As it's showing 0.00 seconds, the total time taken would be less than 10 milliseconds. So we have come from ~850 milliseconds to less than 10 milliseconds by applying good indexes. There are a couple of points that need to be highlighted here:

  1. At this point author_id index is redundant and can be deleted safely, as author_id_and_amount_index have the same impact, because the order of author_id in  author_id_and_amount_index.
  2. Now, what would happen if we reverse the order of author_id and amount in the  author_id_and_amount_index? Let's actually try that, lets delete the already existing index and create another one with reversed order.
A new index with reversed order.

Let's fire the same query and see, what happens.

mysql> select sum(amount) from purchase where author_id = 1517;
+-------------+
| sum(amount) |
+-------------+
|     2275132 |
+-------------+
1 row in set (0.73 sec)

mysql> explain select sum(amount) from purchase where author_id = 1517\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: purchase
   partitions: NULL
         type: index
possible_keys: NULL
          key: amount_and_author_id_index
      key_len: 9
          ref: NULL
         rows: 3092857
     filtered: 10.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Again the query performance got worse, it took 730 milliseconds, also in explain we can see that even when amount_and_author_id_index is used the query performance is not great. This is because of how the index is stored in the database. Let's take a look at how the data is structured in both the index.

amount_and_author_id_index
author_id_and_amount_index

If we see, how both indexes store the data, it is quite clear why the query performed better with author_id_and_amount_index, with this index, the author_id is selected in O(logN) time as author_id is sorted, and then all the amounts are summed in that range. While with amount_and_author_id_index , has to scan all rows in order to find the author_id and then sum those rows, therefore, such long time.

Another important point would be that in a multi-columnar index, columns can not be skipped while querying ie. for an index Index(column_a, column_b, column_c), firing a query on columns_b and column_c alone would not yield any performance benefit. The following query will be using the above index.

  • Query using column_a, column_b, and column_c.
  • Query using column_a and column_b.
  • Quer using column_a and column_c.

Summary:

  1. Order matters in a multi-columnar index. As a rule of thumb, keep the first one on which query is fired. If the query is fired on all the columns equally, keep the one high highest cardinality first and so on and so forth.
  2. In a multi-columnar index, columns can not be skipped while querying.

MySQL :: MySQL 8.0 Reference Manual :: 8.3.6 Multiple-Column Indexes
High Performance MySQL
Chapter 4. Indexes Indexes allow MySQL to quickly find and retrieve a set of records from the millions or even billions that a table may contain. If you’ve been using … - Selection from High Performance MySQL [Book]
MySQL :: MySQL 8.0 Reference Manual :: 8.3.1 How MySQL Uses Indexes