
Add support for object deletion to KafkaJournalWriter

“Deleting” an object in Kafka is a two-step process. First, a new event is produced for the key to be deleted, with null as its value; such events are known as tombstones. Then, on topics configured for compaction, older events for that key are actually removed once certain thresholds are reached.
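As a sketch of the first step, a tombstone is nothing more than an ordinary record whose value is null. The snippet below uses a plain Python list as a stand-in for a topic partition; all names are illustrative:

```python
# Illustrative sketch: a tombstone is simply a record whose value is None.
# The "log" list stands in for a Kafka topic partition.
log = []

def produce(key, value):
    log.append((key, value))

produce(b"object-1", b"payload")
produce(b"object-2", b"payload")
produce(b"object-1", None)  # tombstone: marks object-1 for deletion

# Until compaction runs, the tombstone is just another record in the log.
assert log[-1] == (b"object-1", None)
```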

Tombstones themselves usually linger in a topic for a while, giving consumers a chance to learn that a given key has been deleted; this period is configured by delete.retention.ms. For Software Heritage, we should nevertheless not rely on journal consumers actually seeing these tombstones to handle object deletions: if they lag too much, the tombstone will eventually be removed from the journal (together with the actual data). This will be handled by #4658 instead.
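The two compaction outcomes described above can be simulated in a few lines of Python. This is only an illustration of the semantics, not Kafka's actual log cleaner:

```python
def compact(log, retain_tombstones=True):
    """Keep only the latest record per key; optionally drop tombstones,
    as Kafka does once delete.retention.ms has elapsed."""
    latest = {}
    for key, value in log:  # later records override earlier ones
        latest[key] = value
    if not retain_tombstones:
        latest = {k: v for k, v in latest.items() if v is not None}
    return list(latest.items())

log = [(b"a", b"v1"), (b"b", b"v1"), (b"a", b"v2"), (b"b", None)]

# First compaction pass: old values are gone, but the tombstone for b"b"
# is still visible, so consumers can learn about the deletion.
assert dict(compact(log)) == {b"a": b"v2", b"b": None}

# Once delete.retention.ms has elapsed, the tombstone itself is removed:
# a consumer that lagged too much never sees that b"b" was deleted.
assert dict(compact(log, retain_tombstones=False)) == {b"a": b"v2"}
```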

Normally, compaction will be triggered when the ratio of dirty data to total data reaches the threshold set by the min.cleanable.dirty.ratio configuration. min.compaction.lag.ms can be set to prevent overly aggressive cleaning. This provides a minimum period of time for applications to see an event prior to its deletion. max.compaction.lag.ms sets the time limit before triggering a compaction, regardless of the amount of dirty data.
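Put together, a compacted topic carrying deletable objects might use settings along these lines (the values are examples for illustration, not Software Heritage's actual configuration):

```
cleanup.policy=compact
# trigger compaction once 50% of the log is dirty
min.cleanable.dirty.ratio=0.5
# records stay visible for at least 1 day before compaction
min.compaction.lag.ms=86400000
# force compaction within 7 days regardless of the dirty ratio
max.compaction.lag.ms=604800000
# tombstones linger for 1 day after compaction
delete.retention.ms=86400000
```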

For more information, see: https://developer.confluent.io/courses/architecture/compaction/

The delete method is only implemented for KafkaJournalWriter because the semantics are so closely aligned with Kafka’s.
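An illustrative sketch of what such a delete method can look like follows. The stub producer, class names, topic prefix, and the shape of the delete signature are all assumptions made for the example, not the actual swh-journal API; the real writer also serializes keys and routes per-object-type topics:

```python
class InMemoryProducer:
    """Stand-in for a Kafka producer, so the sketch is self-contained."""
    def __init__(self):
        self.messages = []

    def produce(self, topic, key, value):
        self.messages.append((topic, key, value))


class JournalWriterSketch:
    """Hypothetical, simplified journal writer illustrating deletion."""
    def __init__(self, producer, prefix="swh.journal.objects"):
        self.producer = producer
        self.prefix = prefix

    def delete(self, object_type, keys):
        # Deleting means producing a tombstone (value=None) per key;
        # compaction later removes the older records for those keys.
        topic = f"{self.prefix}.{object_type}"
        for key in keys:
            self.producer.produce(topic=topic, key=key, value=None)


producer = InMemoryProducer()
writer = JournalWriterSketch(producer)
writer.delete("content", [b"sha1:abc"])
assert producer.messages == [("swh.journal.objects.content", b"sha1:abc", None)]
```

The mapping is direct: one tombstone per deleted key, which is why the semantics line up so naturally with Kafka's and why other journal backends would need a different mechanism.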

Based on the initial merge request !233 (closed) written by olasd.

Closes: #4657 (closed)

Edited by Jérémy Bobbio (Lunar)
