In this article series, we will dig deep into what indexes are, how it works, what are the advantages and disadvantages of using an index, multi-column indexes, and what are the pitfalls or gotchas of using an index.

We have often come across the example of the index in a book as a database index. Basically, that is a good starting point and gets you through interview questions, but let's go a bit deeper and see what actually happens behind the scenes. Indexes basically are data structures that make reads faster. Most of the indexes are implementation of B+ Trees. Although it can of Hash data-structure as well. Both have their pros and cons. In this article, we will see how B+ Tree indexes work.

But, why do we need an index in the first place? Let's consider an Employee table, with just 2 columns, 1 for the primary key and the other for the name. Now if you search for an employee, it would scan the whole table and that's an issue. For a table containing a million entries, the database would have to scan all million rows and that's not scalable. Not only the query will take time, IO and CPU will also be wasted in this scan operation.

Without Index, DB has to scan all rows in order to find `Ryan
explain showing type as ALL

If we look into the types column for the explain command, we see ALL, which means full-table scan. Therefore without an index, the complexity of search or read query is O(n). Note that the rows column is not the total number of rows returned from the query, but the total number of rows the database would have to scan in order to run the query.  Let's create an index on the name column and see how the query fares again.

Adding Index on the name column.
Index name being used as mentioned in key field.

With the index created, we can see that now the type has changed to ref from ALL, also, if you check the Extra column, we see Using index which means the entire data was served from the index and the table was not referenced. Now, if you remember, we have an index only on the name column, and since we are only reading id in the query, then how come only the index can return the data? Well, this is because the index also has the corresponding primary key along with the value. Let's see how the index actually looks.

Index Visualisation

All the leaf nodes are on the same level, also leaf nodes are connected and act as a double-linked list so that traversal can happen in both directions if needed. Also, each node apart from keeping the value and primary key also keeps a pointer to the actual row in the table to access any other field. With b-trees since leaf nodes are sorted, lookup only takes O(log n) times. And with this logarithmic complexity, even for a million rows database would have to do ~20 lookups to find the data.

We have established, that indexing a column speeds up the reads, then should we index all the columns? Absolutely Not !!! Just like everything, indexes are also a trade-off between reads and writes. More column you index, more index needs to be updated on writes, slowing the overall writes. Therefore, only index columns on which are queried frequently. If you think, you need to index every column, maybe MySQL is not the correct choice, to begin with.

What else we should keep in mind, while indexing, another point would be how many distinct rows are present with some value, If a boolean column type is indexed, and if the distribution of values is also around 50-50%, ie. almost 50% true and other 50% false, putting an index won't help much, as in any case, half of the table will be indexed. The is also known as the cardinality of the column, the higher the cardinality, the better the index will perform.

Summary:

  1. Indexes are data-structure, which makes reads faster.
  2. MySQL uses only one index per table per query except for UNION.
  3. Do not index on all the columns as it will add overhead at writes. We should index only those columns which are queried/read frequently. Index a column if order by is performed on that column.
  4. Columns with high cardinality perform better when indexed.
  5. Use explain to get the query planner, understand all the fields in the explain output. Important things to notice would be type and rows.

In the next part of the article, we will understand how ordering matters in multi-column indexes. We will also see some common pitfalls of using an index.


MySQL :: MySQL 8.0 Reference Manual :: 8.3.1 How MySQL Uses Indexes
High Performance MySQL
Chapter 4. Indexes Indexes allow MySQL to quickly find and retrieve a set of records from the millions or even billions that a table may contain. If you’ve been using … - Selection from High Performance MySQL [Book]
MySQL :: MySQL 8.0 Reference Manual :: 8.8.2 EXPLAIN Output Format
How DataBase Indexes Works Internally
Database Indexes are data structures that make reads faster. We will see how the multi-columnar index works & why ordering matters with examples.