Automate subdataset creation

vlorentz requested to merge subdataset into master

Currently running on Maxxi with this command:

TMPDIR=/srv/softwareheritage/tmp/ AWS_PROFILE=prod swh graph luigi \
    --base-directory /srv/softwareheritage/ssd/data/vlorentz/datasets/ \
    --dataset-name 2024-08-23_popular-10-lua \
    --parent-dataset-name 2024-08-23 \
    --grpc-api localhost:50091 \
    --s3-prefix s3://softwareheritage/graph/ \
    --s3-athena-output-location s3://softwareheritage/tmp/athena/ \
    --athena-prefix swh \
    CreateSubdatasetOnAthena \
    -- \
    --scheduler-url http://localhost:50092/ \
    --SelectTopGithubOrigins-num-origins 10 \
    --SelectTopGithubOrigins-query "language:lua"
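
For anyone wanting to try other languages or origin counts, here is a rough sketch of a wrapper that only prints the command, so it can be reviewed before running. It uses only the flags, paths, ports, and S3 locations from the command above. The function name `build_subdataset_cmd`, its three parameters, and the `<parent>_popular-<num>-<language>` naming scheme are assumptions inferred from this single example, not something the merge request defines.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) the subdataset command for a given
# parent dataset, language, and number of origins.
# build_subdataset_cmd and its parameters are hypothetical helpers; every
# flag and value below is copied from the command in this description.
set -euo pipefail

build_subdataset_cmd() {
    local parent="$1"     # parent dataset name, e.g. 2024-08-23
    local language="$2"   # GitHub search language, e.g. lua
    local num="$3"        # number of top origins, e.g. 10

    local cmd=(
        swh graph luigi
        --base-directory /srv/softwareheritage/ssd/data/vlorentz/datasets/
        # Assumed naming scheme, inferred from "2024-08-23_popular-10-lua".
        --dataset-name "${parent}_popular-${num}-${language}"
        --parent-dataset-name "${parent}"
        --grpc-api localhost:50091
        --s3-prefix s3://softwareheritage/graph/
        --s3-athena-output-location s3://softwareheritage/tmp/athena/
        --athena-prefix swh
        CreateSubdatasetOnAthena
        --
        --scheduler-url http://localhost:50092/
        --SelectTopGithubOrigins-num-origins "${num}"
        --SelectTopGithubOrigins-query "language:${language}"
    )

    # Environment variables from the original invocation, then the
    # shell-quoted argument list on a single line.
    printf 'TMPDIR=/srv/softwareheritage/tmp/ AWS_PROFILE=prod '
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

build_subdataset_cmd 2024-08-23 lua 10
```

Printing instead of executing keeps the sketch safe to run anywhere; the output can be pasted into a shell on the machine that has the gRPC server and scheduler listening on the ports shown.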

cc @ardumont

Edited by vlorentz
