Asynchronous Communication between Microservices with Apache Kafka
Introduction:
Asynchronous communication between microservices is a crucial aspect of building scalable, resilient, and loosely coupled distributed systems. Apache Kafka is a popular open-source distributed event streaming platform that provides a robust foundation for implementing asynchronous communication in microservices architectures. Asynchronous communication refers to a messaging pattern where microservices exchange information without waiting for an immediate response. This decouples the sender and receiver, allowing them to operate independently and improving system reliability.
Understanding Microservices Communication:
Microservices communicate with each other through either synchronous or asynchronous mechanisms. Synchronous communication involves direct API calls, which can lead to tight coupling and cascading failures in a distributed environment. Asynchronous communication, on the other hand, decouples microservices by using an intermediary to pass messages between them.
- Message Queuing:
- The main task of a messaging system is to transfer data from one application to another, so that applications can focus on the data itself without worrying about how it is shared.
- Distributed messaging is based on a reliable message-queuing process. Messages are queued asynchronously between the messaging system and the client applications.
- There are two types of messaging patterns available.
- Point-to-point messaging system: Messages are placed in a queue. More than one consumer can read from the queue, but a particular message can be consumed by only one of them. Once a consumer reads a message, it disappears from the queue.
- Publish-subscribe messaging system: Messages persist in a topic. Unlike the point-to-point system, consumers can subscribe to one or more topics and consume every message in those topics.
Why Apache Kafka?
Apache Kafka is a powerful distributed event streaming platform designed for handling real-time data feeds at scale. Its popularity stems from its ability to provide high-throughput, fault-tolerant, and scalable event streaming capabilities. Kafka acts as a central hub for streaming data, allowing producers to publish messages to topics and consumers to subscribe and process these messages asynchronously. Its durability and fault-tolerance features, such as data replication and persistence, make it reliable in the face of failures.
- Key Concepts of Apache Kafka:
- ZooKeeper: ZooKeeper is required for running Kafka; it is responsible for managing topic configuration, topic access control lists, cluster membership, and overall coordination of the Kafka cluster.
- Topics: A topic is the named channel to which publishers push messages and from which subscribers pull them.
- Producers: Microservices that generate and send messages to Kafka are known as producers. Producers are responsible for selecting the topic and sending messages to it.
- Consumers: Microservices that receive and process messages from Kafka are known as consumers. Consumers subscribe to specific topics and can process messages in real time.
- Brokers: Kafka uses a distributed architecture with brokers that manage the storage and retrieval of messages. Brokers ensure fault-tolerance and high availability.
Benefits of Apache Kafka:
- Independence: Services can operate independently of each other. In a synchronous communication model, services need to wait for a response before proceeding with their tasks. This can lead to bottlenecks and delays, especially when dealing with high volumes of requests. With asynchronous communication, services can send messages to each other without waiting for a response, allowing them to continue processing other tasks in the meantime.
- Fault Tolerance and Resilience: In a synchronous model, if a service fails or becomes unresponsive, it can bring down the entire system. With asynchronous communication, services can continue to operate even if some of them are experiencing issues. Messages can be stored in a message broker like Apache Kafka, and processed once the service is back online. This ensures that the system remains functional even in the face of failures.
- Scalability: In a synchronous model, scaling services can be challenging as they need to handle the load in real-time. With asynchronous communication, services can scale independently of each other. Apache Kafka acts as a buffer, allowing services to process messages at their own pace. This makes it easier to add or remove instances of services based on demand, without affecting the overall system performance.
Implementing Apache Kafka:
Here's a basic overview of how you might implement asynchronous communication using Apache Kafka:
- Define Topics: In Kafka, data is stored in topics. Each topic consists of a series of records (also known as messages). Producers write data to topics and consumers read from them. For instance, you could have different topics for different types of events or data streams in your application.
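For instance, a topic can be created ahead of time with the kafka-topics.sh tool that ships with Kafka (a minimal sketch assuming a single local broker on localhost:9092; the topic name and partition count are illustrative):
bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1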
- Producer Creation: A producer is responsible for sending messages to Kafka topics. Here's an example of how you might create a producer in Java:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
// Address of the Kafka broker(s) to connect to
props.put("bootstrap.servers", "localhost:9092");
// Serializers that convert keys and values to bytes on the wire
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
- We first define some properties such as the server address and serializers for the keys and values. Then, we instantiate a KafkaProducer object with these properties.
- Message Production: Once you have a producer, you can use it to send messages to Kafka topics. Here's an example:
import org.apache.kafka.clients.producer.ProducerRecord;

// Publish a single message with an explicit key and value to the "my-topic" topic
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
producer.send(record); // send() is asynchronous and returns a Future<RecordMetadata>
- We create a ProducerRecord object specifying the topic name, key, and value of the message. We then call the send method of the producer to publish the message to the topic.
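Because send is asynchronous, failures only surface later. One common pattern is to pass a callback that Kafka invokes once the broker acknowledges (or rejects) the record; here is a sketch reusing the producer and record created above:
// The callback runs when the send completes, successfully or with an error
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // e.g. log, retry, or divert to a dead-letter topic
    } else {
        System.out.printf("delivered to %s-%d at offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});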
- Consumer Creation: A consumer reads data from Kafka topics. Similar to producers, consumers require certain properties to connect to Kafka servers. Here's an example of how you might create a consumer in Java:
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Consumers that share a group.id divide the topic's partitions among themselves
props.put("group.id", "test");
// Periodically commit the consumer's position back to Kafka
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
// Deserializers that convert bytes from the wire back into keys and values
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
Consumer<String, String> consumer = new KafkaConsumer<>(props);
- We define properties such as the server address, group id for the consumer, auto commit settings, and deserializers for the keys and values. We then instantiate a KafkaConsumer object with these properties.
- Message Consumption: After creating a consumer, you can use it to read messages from Kafka topics. Here's an example:
import java.time.Duration;
import java.util.Arrays;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

consumer.subscribe(Arrays.asList("my-topic"));
while (true) {
    // poll() fetches any records published since the last poll, waiting up to 100 ms
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n",
                record.offset(), record.key(), record.value());
    }
}
- In this code, we subscribe the consumer to the "my-topic" topic. We then enter a loop where we continuously poll for new records from Kafka. For each record, we print out the offset, key, and value.
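An endless loop like this never releases its resources. In a real service you would typically allow the consumer to shut down cleanly; one common pattern (a sketch, not the only option) is for another thread to call consumer.wakeup(), which makes poll throw a WakeupException that the loop can catch:
import org.apache.kafka.common.errors.WakeupException;

try {
    consumer.subscribe(Arrays.asList("my-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s%n",
                    record.offset(), record.key(), record.value());
        }
    }
} catch (WakeupException e) {
    // Expected during shutdown: another thread has called consumer.wakeup()
} finally {
    consumer.close(); // commits offsets (when auto-commit is on) and leaves the group
}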
To run Apache Kafka and Zookeeper on your local system, you need to follow these steps:
- Download Apache Kafka: You can download the latest version of Apache Kafka from the official download page. The download is compressed in .tgz format. Extract it using the following command:
tar -xzvf kafka_2.13-3.5.0.tgz
- After the command completes, navigate to the newly created directory:
cd kafka_2.13-3.5.0
- Start Zookeeper: Before starting Kafka, you need to start Zookeeper because Kafka uses Zookeeper for service discovery and coordination. Open a new terminal and navigate to the Kafka directory. Then, start Zookeeper using the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
- Keep this terminal window open; ZooKeeper must remain running for as long as Kafka is running.
- Start Kafka Server: In a new terminal window, navigate to the Kafka directory again. Start the Kafka server using the following command:
bin/kafka-server-start.sh config/server.properties
- Note: These commands assume you're using a Unix-like operating system (such as Linux or macOS). If you're using Windows, replace the bin/ prefix with bin\windows\ and use .bat instead of .sh for the script extensions.
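Once both processes are running, you can sanity-check the installation with the console producer and consumer that ship with Kafka (a quick sketch; the topic name is illustrative, and depending on your broker settings the topic may need to be created first):
bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092
Lines typed into the producer terminal should appear in the consumer terminal.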
Additional Apache Kafka Features:
Partitions in Kafka:
- A partition is a division of a Kafka topic. Partitions play a crucial role in Kafka's functionality and scalability. Here's how:
- Parallelism: Partitions enable parallelism. Since each partition can be placed on a separate machine, a topic can handle an amount of data that exceeds a single server's capacity. Partitions also allow producers and consumers to read and write data to a topic concurrently, thus increasing throughput.
- Ordering: Kafka guarantees that messages within a single partition are kept in the exact order they were produced (see the sketch after this list). However, if ordering matters across partitions, additional design considerations are needed.
- Replication: Partitions of a topic can be replicated across multiple brokers based on the topic's replication factor. This increases data reliability and availability.
- Failover: In case of a broker failure, leadership of the partitions owned by that broker is automatically taken over by another broker that holds replicas of those partitions.
- Consumer Groups: Each partition can be consumed by one consumer within a consumer group at a time. If more than one consumer is needed to read data from a topic simultaneously, the topic needs to have more than one partition.
- Offset: Every message in a partition is assigned a unique (per partition) and sequential ID called an offset. Consumers use this offset to keep track of their position in the partition.
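The per-partition ordering guarantee is usually exploited through message keys: with Kafka's default partitioner, records that share a key are hashed to the same partition and are therefore consumed in the order they were produced. A minimal sketch, reusing the producer from earlier (topic name and key are illustrative):
// All five records share the key "order-42", so the default partitioner routes
// them to the same partition and their relative order is preserved for consumers.
for (int i = 0; i < 5; i++) {
    producer.send(new ProducerRecord<>("my-topic", "order-42", "event-" + i));
}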
Topic Replication Factor in Kafka:
- When designing a Kafka system, it's important to plan for topic replication. In the event of a broker failure, the impact can be mitigated by relying on the topic's replicas stored on other brokers, provided the cluster is not also suffering a network partition.
- Some important points:
- Replication is performed at the partition level only.
- A given partition can have only one broker designated as its leader at any given time, while the other brokers maintain follower replicas.
- The replication factor cannot exceed the number of brokers in the cluster, since each replica must be placed on a different broker.
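To see replication in action, you can create a topic with a higher replication factor and inspect where its replicas landed (a sketch assuming a three-broker cluster reachable at localhost:9092; the topic name is illustrative):
bin/kafka-topics.sh --create --topic replicated-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 3
bin/kafka-topics.sh --describe --topic replicated-topic --bootstrap-server localhost:9092
The describe output lists, for each partition, its current leader broker, the full set of replicas, and the in-sync replicas (ISR).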
Rebalancing in Kafka:
- Rebalancing refers to the process of redistributing the partitions of topics across all consumers in a consumer group. Rebalancing ensures that all consumers in the group have a roughly equal number of partitions to consume from, thus evenly distributing the load.
- Rebalancing can be triggered by several events:
- Addition or removal of a consumer: If a new consumer joins a consumer group, or an existing consumer leaves, a rebalance is triggered to redistribute the partitions among the available consumers.
- Addition or removal of a topic's partitions: If a topic that a consumer group is consuming from has partitions added or removed, a rebalance is triggered to ensure that the consumers in the group are consuming from the correct partitions.
- Consumer failure: If a consumer crashes or stops sending heartbeats within the session timeout, the group coordinator considers it dead and triggers a rebalance to reassign its partitions to the remaining consumers.
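Applications can observe these events by registering a ConsumerRebalanceListener when subscribing; here is a sketch reusing the consumer from earlier (the println messages are illustrative):
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

consumer.subscribe(Arrays.asList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before a rebalance takes partitions away:
        // a good place to commit offsets or flush in-memory state.
        System.out.println("Revoked: " + partitions);
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after the rebalance assigns this consumer its (new) partitions.
        System.out.println("Assigned: " + partitions);
    }
});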
Offset Management in Kafka:
- In Apache Kafka, consumer offset management, that is, tracking which messages have been consumed, is handled by Kafka itself.
- When a consumer in a consumer group reads a message from a partition, it commits the offset of that message back to Kafka. This allows Kafka to keep track of what has been consumed and which messages should be delivered if a new consumer starts consuming or an existing consumer restarts.
- Earlier versions of Kafka used Apache ZooKeeper for offset tracking, but since version 0.9 Kafka uses an internal topic named "__consumer_offsets" to manage these offsets. This change has helped improve the scalability and durability of consumer offsets.
- Kafka maintains two types of offsets:
- Current offset: The current offset is a reference to the most recent record that Kafka has already handed to a consumer. Because of the current offset, the consumer does not receive the same record twice within a session.
- Committed offset: The committed offset is a reference to the latest record that a consumer has successfully processed. The committed offset is what matters in case of an application failure or when replaying from a certain point in the event stream.
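When the committed offset must reflect only fully processed records, a common approach is to disable auto-commit and commit manually after processing; a sketch, reusing the earlier consumer setup with enable.auto.commit set to "false" (process is a hypothetical application-specific handler):
props.put("enable.auto.commit", "false");
// ... create the consumer and subscribe as before ...
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical handler; must complete before the commit below
    }
    if (!records.isEmpty()) {
        // Synchronously commit the offsets returned by the last poll(); after a
        // restart, consumption resumes from this committed position.
        consumer.commitSync();
    }
}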
Conclusion
In conclusion, the adoption of asynchronous inter-service communication proves instrumental in enhancing the resilience and performance of microservices architectures. Apache Kafka stands out as a top-choice messaging system for this purpose, offering a robust framework for real-time event streaming and reliable data transmission.
Apache Kafka, with its robust features, plays a pivotal role in facilitating asynchronous communication between microservices. By adopting this approach, you empower your microservices architecture with scalability, resilience, and the ability to handle complex interactions efficiently. Asynchronous communication using Kafka aligns well with the principles of modern microservices architectures, providing a foundation for building agile and responsive systems.