Support large messages in swh.journal / kafka
Some objects in the Software Heritage archive are large: as it turns out, some are even larger than the default max message size in kafka.
The max message size in kafka is fairly small by default (1 MB), because raising it increases the memory requirements of the brokers when lots of large messages are processed.
We've come up with two choices:
- increase the max message size in kafka
- split large messages into multiple parts (https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297).
Considering the small number of large objects, increasing the max message size is a quick fix that shouldn't have many downsides, as long as we monitor the impact on the kafka brokers.
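The size limit is enforced in more than one place (brokers, topics, producers, consumers), so the bump usually has to be applied consistently. Below is a minimal sketch of the settings involved, grouped by where they apply. The helper name `large_message_settings` and the 100 MB target are illustrative only, and the client-side names are those of the Java client; other clients may name them differently (librdkafka-based producers, for instance, use `message.max.bytes`).

```python
# Illustrative target only; the real value should come from measuring actual objects.
TARGET_MAX_BYTES = 100 * 1024 * 1024


def large_message_settings(max_bytes: int = TARGET_MAX_BYTES) -> dict:
    """Return the size-related kafka settings, grouped by where they are set.

    Hypothetical helper: it only builds dictionaries and does not talk to kafka.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return {
        # Broker-wide settings (server.properties).
        "broker": {
            "message.max.bytes": max_bytes,
            # Followers must be able to replicate the largest accepted message.
            "replica.fetch.max.bytes": max_bytes,
        },
        # Per-topic override of the broker-wide limit.
        "topic": {"max.message.bytes": max_bytes},
        # Java producer: upper bound on the size of a single request.
        "producer": {"max.request.size": max_bytes},
        # Java consumer: fetch sizes, per partition and per request.
        "consumer": {
            "max.partition.fetch.bytes": max_bytes,
            "fetch.max.bytes": max_bytes,
        },
    }


settings = large_message_settings()
for scope, values in settings.items():
    for name, value in values.items():
        print(f"{scope}: {name}={value}")
```

The per-topic override makes it possible to restrict the bump to the topics that actually carry large objects, which should limit the memory impact on the brokers.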
Splitting large messages has several implications, notably with respect to partitioning and compaction, which means that any solution should be carefully considered.
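To make those implications concrete, here is a minimal sketch of what splitting and reassembly could look like. It is not the scheme from the linked slides and not existing swh.journal code: the `Part` layout, `split_message` and `Reassembler` are hypothetical names, and the parts are plain Python objects rather than real kafka records.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional


class Part(NamedTuple):
    key: bytes        # same key on every part, so all parts hash to one partition
    message_id: str   # identifies the original message the part belongs to
    index: int        # position of this part, starting at 0
    total: int        # number of parts in the original message
    payload: bytes


def split_message(key: bytes, value: bytes, max_part_size: int) -> List[Part]:
    """Cut `value` into parts of at most `max_part_size` bytes."""
    if max_part_size <= 0:
        raise ValueError("max_part_size must be positive")
    message_id = hashlib.sha1(value).hexdigest()
    # An empty value still yields one (empty) part.
    chunks = [
        value[i:i + max_part_size] for i in range(0, len(value), max_part_size)
    ] or [b""]
    return [
        Part(key, message_id, index, len(chunks), chunk)
        for index, chunk in enumerate(chunks)
    ]


class Reassembler:
    """Buffer parts until every part of a message has been seen."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def feed(self, part: Part) -> Optional[bytes]:
        """Return the full message once complete, None otherwise."""
        parts = self._pending.setdefault(part.message_id, {})
        parts[part.index] = part.payload
        if len(parts) < part.total:
            return None
        del self._pending[part.message_id]
        return b"".join(parts[i] for i in range(part.total))
```

The sketch shows where the tension lies: giving every part the same key keeps them in one partition, so a single consumer sees all of them, but log compaction retains only the latest record per key and would therefore be free to drop the earlier parts. Giving each part a distinct key avoids that, but then requires a custom partitioning strategy to keep the parts together.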
I'll check the size of some currently "stupidly large" objects, and will bump the max message size accordingly in kafka. A quick test shows that bumping the max message size to 100 MB lets through a synthetic directory with a million entries.
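As a rough sanity check on that figure, here is a back-of-the-envelope lower bound for the payload of such a directory. The entry layout (a name plus a 20-byte SHA-1 target) and the 16-byte average name length are assumptions made for illustration; the real serialized size depends on the serialization format and will be larger.

```python
SHA1_SIZE = 20                           # bytes in a binary SHA-1 identifier
DEFAULT_LIMIT = 1_000_000                # roughly the 1 MB default
PROPOSED_LIMIT = 100 * 1024 * 1024       # the 100 MB value tried in the quick test


def directory_payload_lower_bound(
    entry_count: int, name_size: int, per_entry_overhead: int = 0
) -> int:
    """Lower bound, in bytes, for a directory with `entry_count` entries.

    Counts only the entry name and its target hash, plus an optional
    per-entry overhead for permissions, type and serialization framing.
    """
    return entry_count * (name_size + SHA1_SIZE + per_entry_overhead)


# Assumed: one million entries with 16-byte names.
size = directory_payload_lower_bound(1_000_000, 16)
print(size, size > DEFAULT_LIMIT, size < PROPOSED_LIMIT)
```

Under these assumptions the directory is well over the default limit and comfortably under 100 MB, which is consistent with the quick test.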
Migrated from T2350 (view on Phabricator)