Rewrite of the export pipeline using Exporters (swh-dataset !9)

This rewrite enables multiple things:

Each export can now have multiple exporters, so we can read the journal a single time, then export the objects we read in different formats without having to re-read them every time.
We use a shared on-disk set for the nodes, to avoid storing them redundantly in each exporter.
The SQLite files backing the on-disk set are sharded by the partition ID of the incoming messages. This mitigates the performance issues we had when using a single large set per process. It's also now easier to rewrite the on-disk set logic to use a different set backend, or to change the sharding.
The new abstractions make it a lot nicer to write exporters. You just need to override the methods corresponding to each object type, and you can do your setup and teardown in the __enter__ and __exit__ methods of your exporter, which is used as a context manager. Exporters also don't have to worry about duplicates, since deduplication is already handled by the journal processor itself.
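
To make the points above concrete, here is a minimal sketch of how the pieces could fit together. It is not the code from this merge request: the names `ShardedNodeSet`, `Exporter`, `CollectingExporter` and `run_export`, the `process_<object_type>` naming, and the message shape `(partition_id, object_type, obj)` are all assumptions made for illustration. It only uses the standard library.

```python
import contextlib
import sqlite3
from pathlib import Path


class ShardedNodeSet:
    """Hypothetical on-disk set: one SQLite file per journal partition ID."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.shards = {}

    def _shard(self, partition_id):
        # Lazily open the SQLite file for this partition.
        if partition_id not in self.shards:
            path = self.directory / f"nodes-{partition_id}.sqlite"
            db = sqlite3.connect(str(path))
            db.execute("CREATE TABLE IF NOT EXISTS nodes (id BLOB PRIMARY KEY)")
            self.shards[partition_id] = db
        return self.shards[partition_id]

    def add(self, partition_id, node_id):
        """Insert node_id; return True only if it was not already present."""
        cur = self._shard(partition_id).execute(
            "INSERT OR IGNORE INTO nodes (id) VALUES (?)", (node_id,)
        )
        return cur.rowcount == 1

    def close(self):
        for db in self.shards.values():
            db.commit()
            db.close()
        self.shards.clear()


class Exporter:
    """Hypothetical base class: override process_<object_type> as needed."""

    def __enter__(self):
        return self  # setup goes here in subclasses

    def __exit__(self, exc_type, exc_value, traceback):
        return None  # teardown goes here in subclasses

    def process_object(self, object_type, obj):
        # Dispatch to the per-type method if the subclass defines one.
        handler = getattr(self, f"process_{object_type}", None)
        if handler is not None:
            handler(obj)


class CollectingExporter(Exporter):
    """Toy exporter that only cares about revisions."""

    def __init__(self):
        self.revisions = []
        self.closed = False

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True

    def process_revision(self, obj):
        self.revisions.append(obj["id"])


def run_export(messages, exporters, node_set):
    """Read messages once, deduplicate, and hand each object to every exporter."""
    with contextlib.ExitStack() as stack:
        for exporter in exporters:
            stack.enter_context(exporter)
        for partition_id, object_type, obj in messages:
            key = object_type.encode() + b":" + obj["id"]
            if not node_set.add(partition_id, key):
                continue  # duplicate: exporters never see it
            for exporter in exporters:
                exporter.process_object(object_type, obj)
```

In this sketch the deduplication lives entirely in `run_export`, so an exporter stays a small class with one method per object type it cares about, and swapping the set backend or the sharding scheme would only touch `ShardedNodeSet`.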

Migrated from D4718 (view on Phabricator)
