In this article, we will understand what replication is, the traditional and the new way of replication, the different components involved in replication, and other important aspects of replication.
MySQL replication is a process of replication of data from one master/source MySQL database server to one or more MySQL slave/replica servers.
First, let's look into the traditional style of replication. This replication process involves 4 major steps, 2 on the source end and 2 on the replica's side.
- Whenever the master's database is modified, the change is written to a file. This file is called the binary log or popularly binlog. Binlog is written by the same client thread, which executed the query. The binary log serves as a written record of all events that modify database structure (DDL) or content (data) (DML) from the moment the server was started.
- The master has a thread, called the dump thread, that continuously reads the master's binlog and sends it to the replicas. Dump threads are created for each replica listening to the changes. For example, if in 1 second 100 writes happen on master and there are 4 replicas connected, then 400 dump threads will be created. Each replica that connects to the source requests a copy of the binary log. Note that replicas pull the data from the source, rather than the source pushing the data to the replica.
- On the replica side, it has a thread called IO thread that receives the binlog changes that are sent by the master's dump thread and writes it to a file: called the relay log.
- The replica has another thread, called the SQL thread, that continuously reads the relay log and applies the changes to the replica server.
Asynchronous, Semi-synchronous and Fully Synchronous Replication:
By default the process of replication is asynchronous, i.e once the binlog is written on the master, the master does not know whether or when a replica has retrieved and processed the transactions, and there is no guarantee that any event ever reaches any replica. This obviously is faster, but consistency is not guaranteed between source and replicas, and in case of failover, if any replica is promoted, then it may not have all the writes.
In semi-synchronous replication, the source/master waits until at least one replica has received the event and then commits the transaction. Please note that the source does not wait for the events to be fully executed and committed on the replica side. Semi synchronous replication, therefore, guarantees that if the source crashes, all the transactions that it has committed have been transmitted to at least one replica. To read more on semi-synchronous replication, please read. https://dev.mysql.com/doc/refman/8.0/en/replication-semisync.html
In synchronous replication, the source will be blocked until all the replicas have not only received the event but also committed the transaction. Fully synchronous replication means failover from the source to any replica is possible at any time. The drawback of fully synchronous replication is that there might be a lot of delays completing a transaction on the master/source end.
Row-Based Replication(RBR) Vs. Statement-Based Replication(SBR):
Row-based replication format as the name suggests replicates each and every row changes from source to replica and is deterministic in nature. To take an example, if an update query on the source has modified 10000 rows, then all 10000 rows will be written on binlog and consecutively applied on the replica. Row-based replication is the default method due to its high data integrity and deterministic nature.
In the Statement-Based Replication format, only the query is replicated and not the rows which got modified on the source. With the same example, if an update query modified 10k rows, then only 1 query will be written to binlog and sent over to replica.
Both formats have pros and cons, please read more about it in the below article. However, the mixed replication format which uses both SBR and RBR should provide the best combination of both data integrity and performance.
Replication enables data from one MySQL database source server to be copied to one or more MySQL database replica servers. Here are a few pointers to remember:
- Binlog is the file that is written on the source server by the client thread. Dump thread on the source server sends the data over to the replica. On the replica side, the Relay log is written by the IO thread, then another SQL thread applies these logs on the replica server.
- Replicas connect to the source server and start pulling the data. The source does not push the data to replicas.
- Replication is asynchronous by default. Although both and semi-synchronous and fully synchronous replication is permitted. This is a trade-off between performance and data integrity. Asynchronous replication is the fastest and fully synchronous replication guarantees the highest data integrity. Developers should choose the strategy based on the requirement.
MIXEDreplication format should provide the best combination of data integrity and performance.
In the next blog post, we will set up replication locally, will explore the different components involved, and will understand what causes replication lag.