luigi: Dynamically list directories instead of using object_types

Before this commit, UploadExportToS3 and DownloadExportFromS3 assumed that the set of object types matched the set of directories in an export, which is wrong:

  • for the edges format, there is no origin_visit or origin_visit_status directory
  • for both the edges and orc formats, the object types do not include the relational tables.

A possible fix would have been to use the swh.dataset.relational.TABLES constant and keep ignoring non-existing directories in the edges format. I decided to simply list the directories instead, as this prevents future issues if we ever add directories that do not match any table in Athena.
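
The idea of listing directories can be illustrated with a minimal sketch. It is not the code from this commit: the helper name `list_export_directories` is hypothetical, and it assumes the export is available as a local directory tree, whereas the real tasks also have to deal with S3.

```python
from pathlib import Path
from typing import List


def list_export_directories(export_root: Path) -> List[str]:
    """Return the names of the subdirectories that exist under export_root.

    Hypothetical helper: it discovers what to transfer from the export
    itself instead of deriving it from a fixed list of object types.
    """
    # Plain files (stamps, metadata) are skipped; only directories count.
    return sorted(p.name for p in export_root.iterdir() if p.is_dir())
```

With this approach, a directory that is absent from a given format is never looked up, and a directory that matches no object type is still picked up.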


Migrated from D8965 (view on Phabricator)
