Imagine that one day your Kafka infrastructure suddenly grinds to a halt. Your customers experience data loss, and your enterprise scrambles to restore business processes before the disruption causes unacceptable consequences. But by then the damage is done, and business continuity is broken too.
Coming back from this nightmare to reality: no business wants a Kafka apocalypse.
Datacenter downtime and data loss can severely impact business revenue or even halt operations entirely. To keep such situations from turning into a disaster, enterprises should already have business continuity plans and disaster recovery strategies in place.
A disaster recovery plan should include deploying Apache Kafka across geographically scattered data centers. If disaster strikes, or some other mishap causes a complete data center failure, a multi-cluster Kafka deployment with replication in place ensures data availability and lets you run production workloads from the backup clusters until normal operations are restored.
And that's where Kafka MirrorMaker 2 comes into the game. It replicates topics from one cluster to another in real time, using the Kafka Connect framework internally. Within a single cluster, Kafka already replicates individual topics across brokers, with automatic load balancing and leader election, so data stays intact and available if an internal node fails; MirrorMaker 2 extends that protection across clusters.
Let’s dig deeper into Kafka MirrorMaker 2 features, various types of configurations, and some best practices for optimal results.
Apache Kafka is a highly resilient, fault-tolerant, open-source distributed message streaming system that uses message-based topics for communication between producers and consumers.
MirrorMaker 2 replicates Kafka topics, topic configurations, ACLs, and consumer group offsets from one or more source Kafka clusters to one or more backup Kafka clusters.
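As a minimal sketch of how this is configured (the cluster aliases and broker addresses below are placeholders for illustration), an MM2 properties file enabling one replication flow along with the topic-config, ACL, and offset-sync features mentioned above might look like:

```properties
# Cluster aliases (hypothetical names for illustration)
clusters = primary, backup

# Connection details for each cluster (placeholder addresses)
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092
backup.bootstrap.servers = backup-broker1:9092,backup-broker2:9092

# Enable the primary -> backup replication flow for all topics
primary->backup.enabled = true
primary->backup.topics = .*

# Keep topic configurations and ACLs in sync on the target
sync.topic.configs.enabled = true
sync.topic.acls.enabled = true

# Emit checkpoints so consumer group offsets can be translated on failover
emit.checkpoints.enabled = true
```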
Here is the MirrorMaker 2 architecture:
The directional, configurable data flows from source to target Kafka clusters are called replication flows.
Kafka MirrorMaker replication flows use the configuration format:
source_cluster->target_cluster
Kafka Admins can also create complicated replication topologies based on these flows.
Below are the five types of MirrorMaker 2 configurations, discussed in detail:
In an active/active setup, both clusters, primary and secondary, serve production traffic simultaneously. For example, consider two clusters, one in North India and another in South India. Each cluster primarily serves its own region for low latency, while the two replicate to each other as backups.
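An active/active topology can be sketched by enabling replication flows in both directions (the north/south aliases are illustrative, not from the original setup):

```properties
clusters = north, south
north.bootstrap.servers = north-broker1:9092
south.bootstrap.servers = south-broker1:9092

# Replicate in both directions; MM2 prefixes remote topics with the
# source alias (e.g. north.orders on south) to avoid replication loops
north->south.enabled = true
south->north.enabled = true
```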
In an active/passive setup, there are two clusters: a primary and a backup. The primary serves production consumers, while the backup acts as an idle replica, sometimes used for running analytics to gain business insights.
In aggregation mode, data from multiple source Kafka clusters is replicated to a single target cluster. This setup is preferred when the source clusters are already highly available and the backup data is not sensitive enough to justify a dedicated target cluster for each source.
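An aggregation topology, with two hypothetical source clusters feeding one aggregate cluster, could be expressed as:

```properties
clusters = src1, src2, aggregate
src1.bootstrap.servers = src1-broker:9092
src2.bootstrap.servers = src2-broker:9092
aggregate.bootstrap.servers = agg-broker:9092

# Both sources replicate into the single aggregate cluster
src1->aggregate.enabled = true
src2->aggregate.enabled = true
```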
Fan-out mode is applied when data from a source Kafka cluster must remain highly available at all times and even the simultaneous failure of multiple data centers cannot be tolerated. The source Kafka cluster is therefore replicated to more than one target cluster.
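Fan-out is the mirror image of aggregation: one replication flow enabled from the source toward each of several targets (the aliases below are illustrative):

```properties
clusters = source, dr1, dr2
source.bootstrap.servers = src-broker:9092
dr1.bootstrap.servers = dr1-broker:9092
dr2.bootstrap.servers = dr2-broker:9092

# One source replicated to two independent target clusters
source->dr1.enabled = true
source->dr2.enabled = true
```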
In a forwarding setup, data is replicated sequentially through a chain of clusters by the MirrorMaker framework. This comes in handy when the data is particularly critical and requires very high fault tolerance.
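A forwarding chain replicates hop by hop; with hypothetical cluster aliases A, B, and C it could be sketched as:

```properties
clusters = A, B, C
A.bootstrap.servers = a-broker:9092
B.bootstrap.servers = b-broker:9092
C.bootstrap.servers = c-broker:9092

# Data flows A -> B, then B -> C (note that remote topics on B carry
# the "A." prefix, so downstream consumers see the full source path)
A->B.enabled = true
B->C.enabled = true
```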
To minimize Kafka producer lag, keep the MirrorMaker processes as close as possible to the target Kafka clusters, because Kafka producers are impacted more by high network latency than consumers are.
primary (remote) --------- MirrorMaker --> secondary (local)
The best practice is to consume from the remote cluster and produce to the local one. That means running the MirrorMaker services close to the target clusters and specifying those target clusters with the --clusters option in the MirrorMaker Connect configuration.
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters secondary
The clusters listed under --clusters tell the Connect framework which clusters are nearby, so data is sent only to them rather than to remote ones.
This blog gave a brief overview of the MirrorMaker 2 replication framework and its deployment configurations in a multi-cluster environment.
Apache Kafka is a sturdy part of any IT stack and helps organizations across the globe manage their data efficiently while avoiding downtime and failures.
If you want to learn more about replication in Kafka with MirrorMaker 2, you should register for our upcoming virtual session on Kafka on June 30, 2021. You can expect a demo of its use cases, architecture, and implementation as well. Register Now!