
Add a command to generate a subdataset from a list of SWHIDs using S3

This is a production-ready version of scripts I've had for years to generate subdatasets of the SWH graph dataset from a list of SWHIDs to include. It uploads the list to S3, then JOINs it with the main dataset using Amazon Athena.
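
For reviewers unfamiliar with the approach, here is a minimal sketch of the kind of query the JOIN step amounts to. It is not the code in this diff: the function name, the `swhids_list` table, the `swh` database, and the `swhid` and `id` columns are all hypothetical names chosen for illustration, and the sketch only builds the SQL string without contacting AWS.

```python
def build_subdataset_query(
    table: str,
    swhid_prefix: str,
    main_db: str = "swh",              # hypothetical database holding the main dataset
    list_table: str = "swhids_list",   # hypothetical table backed by the uploaded list on S3
) -> str:
    """Build an Athena SQL query keeping only the rows of `table` whose
    SWHID appears in the uploaded list (illustrative column names)."""
    return (
        f"SELECT t.* FROM {main_db}.{table} AS t\n"
        f"JOIN {list_table} AS s\n"
        f"  ON s.swhid = concat('{swhid_prefix}', t.id)"
    )


# One query per object type; only revisions are shown here.
query = build_subdataset_query("revision", "swh:1:rev:")
print(query)
```

In a real run, a string like this would be submitted to Athena once per table of the dataset, with the results written to an S3 output location.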

Test Plan

Tested manually. This is very hard to unit test because it is highly AWS-specific.


Migrated from D7010 (view on Phabricator)
