Jérémy Bobbio (Lunar) authored
Removing an object from Kafka requires writing a new message with the same key as the original and an empty value. These tombstones are later “compacted” according to the topic settings `cleanup.policy`, `max.compaction.lag.ms` and `delete.retention.ms`.

To test the presence or absence of objects in Kafka, we thus need to find which is the most recent for each key: a tombstone or a value. To do so, we parse all messages into a single dict, associating each SWHID with the timestamp of its latest message and whether it should be considered present or absent. This sadly costs some time and memory, but at least we get accurate results.

While not strictly necessary, we now use a topic configuration in Kafka that aggressively tries to remove “dead” messages. It should slightly improve the time needed to inventory objects as described above.

We use the match syntax introduced in Python 3.10 in `handle_message()`, so we bump black compatibility settings to Python 3.11.

Depends on swh/devel/swh-alter!7 (and a new release thereafter)
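For illustration, the inventory step could look roughly like the sketch below. It is not the code from this change: the `Message` class and the `inventory()` function are hypothetical stand-ins, and it assumes a tombstone shows up as a `None` or empty value.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a consumed Kafka message."""

    key: str                # SWHID used as the message key
    value: Optional[bytes]  # None or empty bytes is treated as a tombstone
    timestamp: int          # message timestamp, in milliseconds


def inventory(messages: Iterable[Message]) -> Dict[str, Tuple[int, bool]]:
    """Map each key to (latest timestamp, whether the object is present)."""
    state: Dict[str, Tuple[int, bool]] = {}
    for msg in messages:
        previous = state.get(msg.key)
        # Keep only the most recent message seen for each key; on equal
        # timestamps, the message read last wins.
        if previous is None or msg.timestamp >= previous[0]:
            is_present = bool(msg.value)  # False for a tombstone
            state[msg.key] = (msg.timestamp, is_present)
    return state
```

Every message of the topic has to be read before the result is reliable, which is where the time and memory cost comes from: the dict holds one entry per distinct key.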