Imagine that one day your Kafka infrastructure suddenly grinds to a halt. Your customers experience data loss, and your enterprise scrambles to restore business processes before the disruption causes unacceptable consequences. But by then the damage is done, and business continuity is broken too.
Coming back from this nightmare to reality: no business wants a Kafka apocalypse.
Datacenter downtime and data loss can severely impact business revenue or even halt operations entirely. To keep such situations from turning into a disaster, enterprises should already have business continuity plans and disaster recovery strategies in place.
A disaster recovery plan should include deploying Apache Kafka across geographically scattered data centers. If disaster strikes, or some other mishap causes a complete data center failure, a multi-cluster Kafka deployment with replication in place ensures data availability and lets you run production workloads from the backup clusters until normal operations are restored.
And that's where Kafka MirrorMaker 2 comes into the game. It replicates topics from one cluster to another in real time, using the Kafka Connect framework internally. Within a single cluster, Kafka already replicates individual topics across brokers, with automatic load balancing and leader election, so data stays intact and available if an internal node fails; MirrorMaker 2 extends that protection across clusters.
Let’s dig deeper into Kafka MirrorMaker 2 features, various types of configurations, and some best practices for optimal results.
Apache Kafka is a highly resilient, fault-tolerant, open-source distributed message streaming system that uses message-based topics for communication between producers and consumers.
MirrorMaker 2 replicates Kafka topics, topic configurations, ACLs, and consumer group offsets from one or more source Kafka clusters to one or more backup Kafka clusters.
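As a minimal sketch of how this is configured (the cluster aliases and broker addresses below are placeholders for illustration), an MM2 properties file enabling one replication flow along with the topic-config, ACL, and offset-sync features mentioned above might look like:

```properties
# Cluster aliases (hypothetical names for illustration)
clusters = primary, backup

# Connection details for each cluster (placeholder addresses)
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092
backup.bootstrap.servers = backup-broker1:9092,backup-broker2:9092

# Enable the primary -> backup replication flow for all topics
primary->backup.enabled = true
primary->backup.topics = .*

# Keep topic configurations and ACLs in sync on the target
sync.topic.configs.enabled = true
sync.topic.acls.enabled = true

# Emit checkpoints so consumer group offsets can be translated on failover
emit.checkpoints.enabled = true
```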
Here is the MirrorMaker 2 architecture:
The directional, configurable data flows from source to target Kafka clusters are called replication flows.
Kafka MirrorMaker replication flows use the configuration format:
source_cluster->target_cluster
Kafka Admins can also create complicated replication topologies based on these flows.
Below are the five types of MirrorMaker 2 configurations, discussed in detail:
In an active/active setup, both clusters, primary and secondary, serve production traffic simultaneously. For example, consider two clusters, one in North India and another in South India. Each cluster primarily serves its own region for low latency, while the two replicate to each other as backups.
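An active/active topology can be sketched by enabling replication flows in both directions (the north/south aliases are illustrative, not from the original setup):

```properties
clusters = north, south
north.bootstrap.servers = north-broker1:9092
south.bootstrap.servers = south-broker1:9092

# Replicate in both directions; MM2 prefixes remote topics with the
# source alias (e.g. north.orders on south) to avoid replication loops
north->south.enabled = true
south->north.enabled = true
```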
In an active/passive setup, there are two clusters: a primary and a backup. The primary serves production consumers, while the backup acts as an idle replica, sometimes used for running analytics to gain business insights.
In aggregation mode, data from multiple source Kafka clusters is replicated to a single target cluster. This setup is preferred when the source clusters are already highly available and the backup data is not sensitive enough to justify a dedicated target cluster for each source.
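An aggregation topology, with two hypothetical source clusters feeding one aggregate cluster, could be expressed as:

```properties
clusters = src1, src2, aggregate
src1.bootstrap.servers = src1-broker:9092
src2.bootstrap.servers = src2-broker:9092
aggregate.bootstrap.servers = agg-broker:9092

# Both sources replicate into the single aggregate cluster
src1->aggregate.enabled = true
src2->aggregate.enabled = true
```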
Fan-out mode is applied when data from a source Kafka cluster must remain highly available at all times and even the simultaneous failure of multiple data centers cannot be tolerated. The source Kafka cluster is therefore replicated to more than one target cluster.
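Fan-out is the mirror image of aggregation: one replication flow enabled from the source toward each of several targets (the aliases below are illustrative):

```properties
clusters = source, dr1, dr2
source.bootstrap.servers = src-broker:9092
dr1.bootstrap.servers = dr1-broker:9092
dr2.bootstrap.servers = dr2-broker:9092

# One source replicated to two independent target clusters
source->dr1.enabled = true
source->dr2.enabled = true
```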
In a forwarding setup, data is replicated sequentially through a chain of clusters by the MirrorMaker framework. This comes in handy when the data is particularly critical and requires very high fault tolerance.
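A forwarding chain replicates hop by hop; with hypothetical cluster aliases A, B, and C it could be sketched as:

```properties
clusters = A, B, C
A.bootstrap.servers = a-broker:9092
B.bootstrap.servers = b-broker:9092
C.bootstrap.servers = c-broker:9092

# Data flows A -> B, then B -> C (note that remote topics on B carry
# the "A." prefix, so downstream consumers see the full source path)
A->B.enabled = true
B->C.enabled = true
```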
To minimize Kafka producer lag, keep the MirrorMaker processes as close as possible to the target Kafka clusters, because Kafka producers are impacted more by high network latency than consumers are.
primary (remote) --------- MirrorMaker --> secondary (local)
The best practice is to consume from the remote cluster and produce to the local one. That means running the MirrorMaker services close to the target clusters and specifying those target clusters with the --clusters option in the MirrorMaker Connect configuration.
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters secondary
The clusters listed under --clusters tell the Connect framework which clusters are nearby, so data is sent only to them rather than to remote ones.
This blog gave a brief overview of the MirrorMaker 2 replication framework and its deployment configurations in a multi-cluster environment.
Apache Kafka is a sturdy part of any IT stack and helps organizations across the globe manage their data efficiently while avoiding downtime and failures.
If you want to learn more about replication in Kafka with MirrorMaker 2, you should register for our upcoming virtual session on Kafka on June 30, 2021. You can expect a demo of its use cases, architecture, and implementation as well. Register Now!