Apache Kafka notes
Until now I have used Kafka only in combination with Logstash and with a simple Python consumer. Here is what I learnt from diving a bit deeper into the documentation:
Docker
Run Kafka on a single node with the following Docker commands:
sudo docker run -d --net=host --name zookeeper \
-e ZOOKEEPER_CLIENT_PORT=2181 \
confluentinc/cp-zookeeper
sudo docker run -d --net=host --name kafka \
-e KAFKA_ZOOKEEPER_CONNECT=localhost:2181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
confluentinc/cp-kafka
To enable remote connections to the Kafka node, replace localhost in the KAFKA_ADVERTISED_LISTENERS environment variable with the node's IP address.
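To check that the broker is actually reachable on the advertised listener, a minimal Java producer can send a single record to it. This is only a sketch: it assumes the kafka-clients library is on the classpath, and the topic name test-topic is a made-up placeholder.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerSanityCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Must match the advertised listener configured in the Docker command above
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name is a placeholder; by default the broker auto-creates it on first use
            producer.send(new ProducerRecord<>("test-topic", "some-key", "some-value"));
        }
    }
}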
Key/value pair
Besides a value, each record in Kafka can also carry an optional key. With the Kafka Streams API it is possible to filter on the key, as sketched below.
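Here is a rough sketch of what filtering on the key can look like in the Streams DSL. The topic names and the key prefix are placeholders, and the example uses the newer StreamsBuilder API (older Kafka Streams releases use KStreamBuilder instead).

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class FilterOnKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-on-key-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // Keep only records whose key starts with "sensor-"; key prefix and topics are placeholders
        input.filter((key, value) -> key != null && key.startsWith("sensor-"))
             .to("filtered-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}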
Kafka Streams
In the Kafka Streams API you can find KStream and KTable, which are dual concepts as explained in the following articles:
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
http://docs.confluent.io/3.1.2/streams/index.html
It is interesting to note that there is no direct way to transform a KStream into a KTable. Instead, the records of a KStream are first grouped into a KGroupedStream, which has three methods that turn it into a KTable: reduce, aggregate and count.
In the same way, the records of a KTable can be grouped into a KGroupedTable.
In the other direction, KTable has a toStream method that transforms it directly into a KStream, as the sketch below shows.
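A small sketch of both directions, again with placeholder topic names and the newer StreamsBuilder API; note that the exact signatures of count and to have changed between Kafka Streams versions.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class StreamTableRoundTrip {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-table-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic");

        // KStream -> KGroupedStream -> KTable: count the records seen per key
        KTable<String, Long> countsPerKey = stream.groupByKey().count();

        // KTable -> KStream: toStream() emits every update of the table as a record
        countsPerKey.toStream().to("counts-topic", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}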