Export new graph data from GitHub lag ingestion back into SWH main archive

-outdated- (can't manage to do a strike-through 😅) The deduplicator exports graph data in ORC format based on messages written by loaders into Kafka. It needs to create two datasets:

- one that will remain on Adastra in the form of ORC files (#47 (closed))
- one that will be back-exported to the SWH main archive

This issue is concerned with the latter. We don't know yet how to do this: the exchange format still needs to be specified ("raw data loader"). On the SWH side, the API is the storage; the idea is to use it to ingest objects in the reverse order of the graph (not clear yet). We will create an issue to define the exchange format (done with #56 (closed)). -outdated-
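A minimal sketch of what "reverse order of the graph" could mean, assuming it refers to the usual bottom-up order: edges in the SWH graph point from snapshots towards contents, so ingesting leaves first means every object's references already exist in storage when the object itself is added. The sketch uses Kahn's topological sort over an in-memory edge list; the `ingestion_order` helper and the `cnt:`/`dir:`/`rev:`/`snp:` node names are hypothetical illustrations, not part of the SWH storage API.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple


def ingestion_order(edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: order nodes so each comes after everything it references.

    Each edge (source, target) means "source references target",
    e.g. a directory referencing a content.
    """
    pending = defaultdict(int)     # node -> number of references not yet emitted
    referrers = defaultdict(list)  # target -> nodes that reference it
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        pending[src] += 1
        referrers[dst].append(src)

    # Leaves (nodes referencing nothing, e.g. contents) can be ingested first.
    ready = deque(sorted(n for n in nodes if pending[n] == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Once all of a parent's references are emitted, the parent is ready.
        for parent in referrers[node]:
            pending[parent] -= 1
            if pending[parent] == 0:
                ready.append(parent)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    edges = [
        ("snp:1", "rev:1"),
        ("rev:1", "dir:1"),
        ("dir:1", "cnt:a"),
        ("dir:1", "cnt:b"),
    ]
    print(ingestion_order(edges))
    # ['cnt:a', 'cnt:b', 'dir:1', 'rev:1', 'snp:1']
```

Ingesting in this order means a snapshot is only written once the revision, directory and contents it transitively references are already present.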

The deduplicator exports graph data in ORC format based on messages written by loaders into Kafka. It was originally meant to exclude the actual object data (source code), but as discussed in #63, source code will be included in the graph dataset. This will thus likely involve fetching the source code from the Adastra Winery storage and augmenting the purely structural graph data with it.
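A minimal sketch of the augmentation step, under stated assumptions: each structural content record carries the content's `sha1` and `length`, and the object store can return a blob's bytes given its `sha1`. `ContentRecord`, `BlobStore`, `DictBlobStore` and `augment` are all hypothetical names; `BlobStore.get` is an assumed interface, not the actual Winery or `swh.objstorage` API.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, Optional, Protocol


@dataclass(frozen=True)
class ContentRecord:
    """Hypothetical structural record for a content node, as read from ORC."""
    sha1: str
    length: int
    data: Optional[bytes] = None  # filled in by augment()


class BlobStore(Protocol):
    """Assumed minimal object-store interface (not the real Winery API)."""
    def get(self, sha1: str) -> bytes: ...


class DictBlobStore:
    """In-memory stand-in for the object store, for trying out the sketch."""
    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def get(self, sha1: str) -> bytes:
        return self._blobs[sha1]


def augment(records: Iterable[ContentRecord], store: BlobStore) -> Iterator[ContentRecord]:
    """Yield each structural record with its source-code bytes attached."""
    for rec in records:
        blob = store.get(rec.sha1)
        # Sanity check: the fetched blob should match the recorded length.
        if len(blob) != rec.length:
            raise ValueError(f"length mismatch for {rec.sha1}")
        yield replace(rec, data=blob)


if __name__ == "__main__":
    store = DictBlobStore({"aa": b"hello", "bb": b"hi"})
    records = [ContentRecord("aa", 5), ContentRecord("bb", 2)]
    for rec in augment(records, store):
        print(rec.sha1, rec.data)
```

Making `augment` a generator keeps memory bounded when streaming large ORC batches, since each blob is fetched and emitted one record at a time.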

Edited Jul 28, 2025 by Simeon Carstens