Kafka Internals

Controller

The controller is one of the Kafka brokers that, in addition to the usual broker functionality, is responsible for electing partition leaders.

When the controller notices that a broker left the cluster (by watching the relevant Zookeeper path), it knows that all the partitions that had a leader on that broker will need a new leader.
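The election itself can be sketched roughly like this (an illustrative simulation, not the actual controller code): the new leader is the first replica in the partition's replica assignment that is still in sync and on a live broker.

```python
def elect_leader(replicas, isr, live_brokers):
    """Pick a new partition leader: the first replica (in assignment
    order) that is both in the ISR and on a live broker.
    Returns None if no eligible replica exists."""
    for broker_id in replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None

# Broker 1 (the old leader) just left the cluster, so it is no longer live:
# the controller elects broker 2, the next in-sync replica in order.
print(elect_leader(replicas=[1, 2, 3], isr={2, 3}, live_brokers={2, 3, 4}))
```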

Replication

Data in Kafka is organized by topics; each topic is divided into partitions, and each partition can have multiple replicas.

Two types of replica:

Leader Replica

​ Each partition has a single replica designated as the leader. All produce and consume requests go through the leader.

Follower Replica

​ Any replica that is not the leader is called a follower. A follower's only job is to replicate messages from the leader and stay up to date.

ISR: only in-sync replicas are eligible to be elected partition leader (see `replica.lag.time.max.ms`).
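The ISR membership test is essentially time-based: a follower stays in sync as long as it has fully caught up to the leader recently enough. A minimal sketch of that check (the 30 s threshold is the modern default for `replica.lag.time.max.ms`, used here for illustration):

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # illustrative default for replica.lag.time.max.ms

def in_sync(last_caught_up_ms, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """A follower remains in the ISR as long as it has fully caught up
    to the leader's log end within the last max_lag_ms milliseconds."""
    return (now_ms - last_caught_up_ms) <= max_lag_ms
```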

request processing

If a broker receives a produce or fetch request for a specific partition and the leader for that partition is on a different broker, it responds with a "Not a Leader for Partition" error.

How do the clients know where to send the requests?

metadata request

  • every broker has the metadata cached
  • metadata.max.age.ms controls how often the client refreshes its metadata, since the cluster can change; if the client receives a "Not a Leader" error, it refreshes the metadata immediately
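The client-side pattern can be sketched like this (names are illustrative, not the real client API): send to the cached leader, and on a "Not a Leader" error refresh the metadata cache and retry.

```python
class NotLeaderError(Exception):
    """Raised by a broker that is not the leader for the partition."""

def send_with_metadata_refresh(partition, cache, fetch_metadata, send_to):
    """Send to the cached leader; on NotLeaderError, refresh the
    partition->leader cache from a metadata request and retry once."""
    for _attempt in range(2):
        leader = cache.get(partition)
        if leader is None:
            cache.update(fetch_metadata())  # no cached leader: fetch metadata
            leader = cache[partition]
        try:
            return send_to(leader, partition)
        except NotLeaderError:
            cache.update(fetch_metadata())  # stale cache: refresh and retry
    raise RuntimeError("leader still unknown after metadata refresh")
```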

produce msg

acks - controls how Kafka determines whether a write was "written successfully"

Note: the messages are written to the filesystem cache and there is no guarantee about when they will be written to disk. Kafka does not wait for the data to get persisted to disk; it relies on replication for message durability.
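The three acks settings can be summarized as how many broker acknowledgements the producer waits for before treating the write as successful (an illustrative sketch, not the producer implementation):

```python
def required_acks(acks, isr_size):
    """Replicas that must acknowledge a write before the producer
    considers it successful, per acks setting:
      acks=0     -> fire and forget, wait for nobody
      acks=1     -> leader only
      acks='all' -> every in-sync replica"""
    if acks == 0:
        return 0
    if acks == 1:
        return 1
    if acks == "all":
        return isr_size
    raise ValueError(f"unknown acks setting: {acks!r}")
```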

Consume msg

(acks=all): the broker waits until all the in-sync replicas get the message and only then allows consumers to read it. If replication between brokers is slow for some reason, it will take longer for new messages to arrive to consumers.
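The mechanism behind this is the high watermark: consumers can only read up to the offset that every in-sync replica has replicated. A toy calculation (the offsets are made up for illustration):

```python
def high_watermark(isr_log_end_offsets):
    """Consumers may read only up to the minimum log end offset
    across the in-sync replicas (the high watermark)."""
    return min(isr_log_end_offsets)

# The leader is at offset 105, but the slowest in-sync follower is at 100,
# so consumers see messages only up to offset 100.
print(high_watermark([105, 103, 100]))  # → 100
```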

`replica.lag.time.max.ms`: the amount of time a replica can be delayed in replicating new messages while still being considered in-sync.

Retention

Kafka does not keep data forever, nor does it wait for all consumers to read a message before deleting it.

segment

Because finding the messages that need purging in a large file and then deleting a portion of the file is both time-consuming and error-prone, we instead split each partition into segments. By default, each segment contains either 1 GB of data or a week of data, whichever is smaller. As a Kafka broker is writing to a partition, if the segment limit is reached, we close the file and start a new one.
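The roll decision is just two limits checked against the active segment (a sketch using the defaults described above; the real broker configs behind them are `log.segment.bytes` and `log.roll.ms`/`log.roll.hours`):

```python
def should_roll_segment(segment_bytes, segment_age_ms,
                        max_bytes=1 << 30,                 # ~1 GB size limit
                        max_age_ms=7 * 24 * 3600 * 1000):  # one-week age limit
    """Close the active segment and open a new one once it reaches
    either the size limit or the age limit, whichever comes first."""
    return segment_bytes >= max_bytes or segment_age_ms >= max_age_ms
```

Retention then works at segment granularity: whole closed segments are deleted when they expire, which is why purging never has to rewrite a portion of a file.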