Test removal and restore in Kafka

Removing an object from Kafka requires writing a new message with the same key as the original and an empty (null) value. Such messages are called tombstones. They are later removed by log compaction, depending on the topic settings cleanup.policy, max.compaction.lag.ms and delete.retention.ms.

To test the presence or absence of objects in Kafka, we thus need to find which message is the most recent for a given key: a tombstone or a value. To do so, we parse all messages into a single dict that associates each SWHID with the timestamp of its latest message and whether the object should be considered present or absent. This is sadly a bit time- and memory-consuming, but at least we get accurate results.
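The inventory described above can be sketched as follows. This is a minimal illustration, not the code from this merge request: the `Message` tuple, the `inventory()` function and the shortened SWHID-like keys are all hypothetical, and messages are plain in-memory values rather than records read from a Kafka consumer.

```python
from typing import Iterable, NamedTuple, Optional


class Message(NamedTuple):
    """Hypothetical stand-in for a consumed Kafka message."""

    key: str                # the object's SWHID (shortened placeholders below)
    value: Optional[bytes]  # None or empty bytes marks a tombstone
    timestamp: int          # message timestamp, in milliseconds


def inventory(messages: Iterable[Message]) -> dict[str, tuple[int, bool]]:
    """Map each key to (timestamp of its latest message, present?)."""
    state: dict[str, tuple[int, bool]] = {}
    for msg in messages:
        previous = state.get(msg.key)
        # Only the most recent message for a key decides its status.
        if previous is None or msg.timestamp >= previous[0]:
            # A tombstone (no value) means the object is absent.
            state[msg.key] = (msg.timestamp, bool(msg.value))
    return state


messages = [
    Message("swh:1:cnt:aaaa", b"payload", 1000),
    Message("swh:1:cnt:bbbb", b"payload", 1001),
    Message("swh:1:cnt:aaaa", None, 2000),        # removal
    Message("swh:1:cnt:bbbb", None, 2001),        # removal
    Message("swh:1:cnt:bbbb", b"payload", 3000),  # restore
]
state = inventory(messages)
```

Here the first key ends up absent (its latest message is a tombstone) while the second ends up present again. Since every message has to be read before any answer is reliable, the cost grows with the size of the topic, which matches the time and memory concern mentioned above.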

While not strictly necessary, we now use a topic configuration in Kafka that aggressively tries to remove “dead” messages. It should slightly reduce the time needed to inventory objects as described above.
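As a rough idea of what such a configuration might look like, here is a sketch using the three topic settings named earlier. The values are purely illustrative assumptions and are not the ones used in this merge request; `render_config()` is a hypothetical helper that only formats the settings for display.

```python
# Illustrative values only (assumptions, not the actual test settings).
topic_config = {
    "cleanup.policy": "compact",       # enable log compaction on the topic
    "max.compaction.lag.ms": "60000",  # upper bound before a message becomes eligible for compaction
    "delete.retention.ms": "60000",    # how long tombstones are kept after compaction
}


def render_config(config: dict[str, str]) -> str:
    """Render settings as a comma-separated list of key=value pairs."""
    return ",".join(f"{key}={value}" for key, value in config.items())


rendered = render_config(topic_config)
```

Lower values make compaction kick in sooner, so fewer superseded messages and tombstones remain to be read during the inventory.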

We use the match syntax introduced in Python 3.10 in handle_message(), so we bump black compatibility settings to Python 3.10.

Based on !292 (merged)

Depends on swh-alter!7 (merged) (and a new release thereafter)

Edited by Jérémy Bobbio (Lunar)