In this article series, we will dig deep into what indexes are, how it works, what are the advantages and disadvantages of using an index, multi-column indexes, and what are the pitfalls or gotchas of using an index.
We have often come across the example of the index in a book as a database index. Basically, that is a good starting point and gets you through interview questions, but let's go a bit deeper and see what actually happens behind the scenes. Indexes basically are data structures that make reads faster. Most of the indexes are implementation of B+ Trees. Although it can of Hash data-structure as well. Both have their pros and cons. In this article, we will see how B+ Tree indexes work.
But, why do we need an index in the first place? Let's consider an
Employee table, with just 2 columns, 1 for the primary key and the other for the name. Now if you search for an employee, it would scan the whole table and that's an issue. For a table containing a million entries, the database would have to scan all million rows and that's not scalable. Not only the query will take time, IO and CPU will also be wasted in this scan operation.
If we look into the
types column for the
explain command, we see
ALL, which means full-table scan. Therefore without an index, the complexity of search or read query is O(n). Note that the
rows column is not the total number of rows returned from the query, but the total number of rows the database would have to scan in order to run the query. Let's create an index on the
name column and see how the query fares again.
With the index created, we can see that now the
type has changed to
ALL, also, if you check the
Extra column, we see
Using index which means the entire data was served from the index and the table was not referenced. Now, if you remember, we have an index only on the
name column, and since we are only reading
id in the query, then how come only the index can return the data? Well, this is because the index also has the corresponding primary key along with the value. Let's see how the index actually looks.
All the leaf nodes are on the same level, also leaf nodes are connected and act as a double-linked list so that traversal can happen in both directions if needed. Also, each node apart from keeping the value and primary key also keeps a pointer to the actual row in the table to access any other field. With b-trees since leaf nodes are sorted, lookup only takes O(log n) times. And with this logarithmic complexity, even for a million rows database would have to do ~20 lookups to find the data.
We have established, that indexing a column speeds up the reads, then should we index all the columns? Absolutely Not !!! Just like everything, indexes are also a trade-off between reads and writes. More column you index, more index needs to be updated on writes, slowing the overall writes. Therefore, only index columns on which are queried frequently. If you think, you need to index every column, maybe MySQL is not the correct choice, to begin with.
What else we should keep in mind, while indexing, another point would be how many distinct rows are present with some value, If a boolean column type is indexed, and if the distribution of values is also around 50-50%, ie. almost 50% true and other 50% false, putting an index won't help much, as in any case, half of the table will be indexed. The is also known as the cardinality of the column, the higher the cardinality, the better the index will perform.
- Indexes are data-structure, which makes reads faster.
- MySQL uses only one index per table per query except for UNION.
- Do not index on all the columns as it will add overhead at writes. We should index only those columns which are queried/read frequently. Index a column if
order byis performed on that column.
- Columns with high cardinality perform better when indexed.
explainto get the query planner, understand all the fields in the
explainoutput. Important things to notice would be
In the next part of the article, we will understand how ordering matters in multi-column indexes. We will also see some common pitfalls of using an index.