Apache Kafka may sound like a complex new term, but let’s make it easy for you to learn and implement. Kafka works much like a messaging queue. It is a data pipeline that takes data in at one end and delivers it at the other, just as a pipe does. Kafka is an open-source platform developed under the Apache Software Foundation and written in Java and Scala.
The key points about Kafka are as follows:
It allows you to publish and consume data streams.
– Publish Data Streams (Publisher)
Before discussing publishers, let’s discuss what a topic is. A topic is a named category, or feed, in which data streams are stored. One topic can have multiple consumers subscribing to its data. Topics are further divided into partitions, which allow the work on a topic to be carried out in parallel.
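To make the idea of partitions concrete, here is a minimal in-memory sketch of a topic. It is a toy model written in plain Python and is not the Kafka API. The class name `Topic` and its methods are hypothetical. Each partition is modelled as an independent append-only list, and a record's position in that list acts as its offset. Because the partitions do not depend on one another, different readers could work through different partitions at the same time.

```python
# Toy, in-memory model of a topic (hypothetical; NOT the Kafka API).
# Each partition is an independent append-only list; a record's index
# within its partition serves as its offset.
class Topic:
    def __init__(self, name: str, num_partitions: int):
        self.name = name
        # One separate log per partition
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append a record to one partition and return its offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return every record in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("tweets", num_partitions=3)
topic.append(0, "hello")
topic.append(0, "world")
topic.append(1, "kafka")
print(topic.read(0, 0))  # ['hello', 'world']
```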
Publishers allow you to publish data streams to one or more topics of your choice. The publisher is responsible for choosing the topic, and the partition within that topic, to which each piece of data is written. Partitions make the work easier to divide and also provide load balancing.
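A common way for a publisher to pick a partition is to hash the record's key, so that every record with the same key lands in the same partition. The sketch below shows this idea. It is a toy model and is not Kafka's own code. Kafka's default partitioner hashes keys with murmur2. This sketch uses CRC32 only because Python's standard library provides it and it gives the same result on every run. For records that have no key, the sketch simply rotates through the partitions round-robin. That is a simplification, because real Kafka clients handle keyless records differently depending on their version. The class `ToyPartitioner` is hypothetical.

```python
# Toy key-based partitioner (hypothetical; not Kafka's real implementation).
# Kafka's default partitioner uses murmur2; CRC32 is used here because it is
# in the standard library and is deterministic across runs.
import itertools
import zlib


class ToyPartitioner:
    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        # Simplified handling of keyless records: plain round-robin
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            return next(self._next)
        # Same key -> same partition, so records for one key stay together
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


p = ToyPartitioner(num_partitions=3)
print(p.partition("user-42") == p.partition("user-42"))  # True
print([p.partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Routing by key keeps all records for one key in a single partition, while the load across many different keys is spread over all the partitions.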
– Subscribe Data Streams (Consumer)
Consumers subscribe to data from topics. Each consumer belongs to a consumer group, and each record in a topic is consumed by only one consumer instance within each subscribing consumer group. If a group has more consumers than the topic has partitions, some of the consumers will be idle.
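The idle-consumer rule follows from how partitions are shared out within a group: each partition goes to exactly one consumer in the group. The sketch below is a simplified stand-in for Kafka's partition assignors and is not their actual code. The function `assign` is hypothetical. It deals the partitions out round-robin across the group's consumers.

```python
# Toy partition assignment within ONE consumer group (hypothetical; a
# simplified stand-in for Kafka's assignors). Each partition is given to
# exactly one consumer, dealt out round-robin.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# 3 partitions but 4 consumers: one consumer gets nothing to read
result = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(result)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
idle = [c for c, parts in result.items() if not parts]
print(idle)  # ['c4']
```

When the group is smaller than the number of partitions, some consumers receive several partitions instead. For example, `assign([0, 1, 2, 3], ["a", "b"])` gives each consumer two partitions.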
– Process Data Streams (Streams)
This allows data to be processed as it flows through Kafka. Processing can involve filtering, transforming, or aggregating data. For example, tweets can be filtered on the basis of a particular keyword.
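The tweet example can be sketched with plain Python generators, which show the filter, transform, and aggregate steps one after another. This is only an illustration of the idea. Kafka's own Streams API is a Java library, and it is not used here. The tweets, the keyword, and the function names are all made up for illustration.

```python
# Toy stream processing with generators (hypothetical; not Kafka Streams).
# filter -> transform -> aggregate over a small, made-up stream of tweets.
from collections import Counter


def filter_by_keyword(tweets, keyword):
    """Filter: keep only tweets containing the keyword (case-insensitive)."""
    kw = keyword.lower()
    for tweet in tweets:
        if kw in tweet.lower():
            yield tweet


def to_words(tweets):
    """Transform: turn each tweet into its lower-case words."""
    for tweet in tweets:
        yield from tweet.lower().split()


tweets = [
    "Learning Kafka today",
    "Coffee first",
    "kafka topics and partitions",
]
matched = list(filter_by_keyword(tweets, "Kafka"))
print(matched)  # ['Learning Kafka today', 'kafka topics and partitions']

# Aggregate: count word occurrences across the matching tweets
counts = Counter(to_words(matched))
print(counts["kafka"])  # 2
```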
Let’s move on to the installation steps for Kafka: