What is Kafka?
A brief Introduction
Kafka is a stream processing software used for processing streams of data to provide real-time data analytics. Kafka can serve as a
- Message broker: a common platform to send and receive messages, though it lot more powerful to just act as a message broker.
- Message storage system: apart from being a message broker, Kafka stores streams of data(i.e. messages) in topics that can serve as logs and can be replayed at any point in time for any kind of analysis or if it just needs to be consumed by a new consumer.
- Message ordering: Kafka processes messages in the order they are received thus maintaining the sequence in which the data is processed.
- Delivery guarantee: Kafka can maintain multiples replicas of a topic to avoid data loss, which in turn makes sure that data persisted in Kafka can eventually be consumed.
- Topic: A Kafka topic is a category/feed name to which records are stored and published.
- Partition: Partitions are the main concurrency mechanism in Kafka. A topic is divided into 1 or more partitions, enabling producer and consumer loads to be scaled.
- Producer: It is a client or a program, which produces the message and pushes it to a Topic.
- Consumer: It is a client or a program, which consumes the published messages from the Producer.
- Consumer Group: A consumer group is a group of related consumers that perform a specific task. The consumers in a group then divide the topic partitions amongst themselves by establishing that each partition is only consumed by a single consumer from the group.
- Offset: An offset is an integer number that is used by Kafka to maintain the current position of a consumer. The Offset can be changed as per the need to replay messages.
- Broker: A Kafka broker is a Kafka Server that hosts topics. A Kafka broker receives messages from producers and stores them on disk keyed by unique offset. A Kafka broker allows consumers to fetch messages by the topic, partition, and offset.
- Kafka Cluster: A Kafka cluster consists of one or more servers (Kafka brokers) running Kafka. Kafka brokers can create a Kafka cluster by sharing information with each other directly or indirectly using Zookeeper
- ZooKeeper: Kafka uses ZooKeeper to manage the cluster. ZooKeeper keeps track of the status of the Kafka cluster nodes, Kafka topics, and partitions. ZooKeeper elects leader for Broker Topic Partition.
Kafka is installed and running on the default port 9092.
To see the Installation steps for Kafka from Confluent, click here.
Working with .NET Clients
Creating a Topic
A Kafka topic is a container to messages which can be exchanged between Producer(s) and Consumer(s).
To start with we’ll require the server IP where the Kafka service is running and the port is it on. By default, Kafka uses port 9092, but it can be changed.
Confluent.Kafka (by Confluent Inc.) is one of the NuGet Packages that can be used to connect with Kafka among others.
AdminClientConfig can be used to create the config for an admin client which can talk to Kafka.
CreateTopicsAsync() in adminClient accepts a list of TopicSpecification to create topic(s). Each item in the list contains TopicName, the ReplicationFactor & NoOfPartitions for the topic.
Publishing message to Kafka
Messages to Kafka can be published in a very similar way to creating a topic, by creating a ProducerConfig and building the producer object.
ProducerConfig accepts many fields in Config, but only the server where Kafka is hosted is an actual requirement. Values for other fields are taken by default.
Discussing a certain example is Acks, which gets the acknowledgment from Kafka when a message is successfully published to a topic. Now we may have many Kafka nodes in a cluster and based on the replication factor provided while creating a topic, acknowledgment can be from just a single Kafka node (the leader), maybe all the nodes or none of them at all. This depends on the type of data actually being published and this has a trade-off with response time. When Acks is set to all, we have better reliability that data has actually persisted but an increase in the actual response time from when Acks was set to none or just one. This is useful when we have some critical data, say some financial transaction or audit-related data.
Building a producer from the config requires a need to understand a message.
Kafka Message contains Key, Value, Timestamp, and a Header.
- Key: Key is a small value typically int or string associated with a Kafka message. Messages with the same Key are sent to the same Topic partition, this guarantees inorder processing on message by consumers, even if one or more consumer(s) go down.
- Value: Value is the actual message that needs to be sent from the Producer to the Consumer and can be of any data type. Typically it is a string, JSON, Avro, Protobuf, or just binary data.
- Timestamp: Timestamp is the field that shows when the message was persisted in a Kafka topic.
- Headers: Kafka Headers are a small segment of data and are very useful when the actual data being transmitted is very large. Headers can contain fields that avoid the need to check the entire message unless actually needed. This is helpful when the message was serialized by the producer to compress the data, and it only needs to deserialized by the consumer if there is an actual requirement. Otherwise, it can be avoided based on Headers, thus making the entire architecture very efficient.
The actual code to published data to a Kafka topic is below. ProduceAsync() from the producer object built earlier pushes the message to the Kafka topic. This then returns the persistence status of the message.
Consuming message to Kafka
To consume the message from a Kafka topic, a consumer needs to build, which accepts ConsumerConfig as an argument.
ConsumerConfig requires ConsumerGroupId and Kafka Server address, other fields have a default value and can be changed based on the requirement.
Consumer built needs to then subscribe to a topic, uses the consume() to consume message(s) from a topic. The consumer needs to be a service running continuously and consuming messages as and when they arrive in the topic subscribed to.
For the Github link to basic Kafka Producer and Consumer application, click here.