SelectBlobs: Replace Athena with pyspark.sql
As the dataset grows, we now hit Athena's 30-minute timeout. Using pyspark takes longer (3h), but it avoids the slow download from S3 and has no timeout.
This adds a dependency on pyspark, a 300MB package full of JARs, but we might use it for other workloads in the future.
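For readers unfamiliar with the approach, here is a minimal sketch of running a SQL selection through pyspark.sql instead of Athena. It is not the code in this merge request: the function names, the ORC input format, the view name, and the CSV output are all assumptions made for illustration. The uncompressed output mirrors the "Give up on compression for temporary files" commit in this MR.

```python
from typing import Sequence


def build_query(view_name: str, columns: Sequence[str], predicate: str) -> str:
    """Build a simple SELECT statement (hypothetical helper, for illustration only)."""
    return f"SELECT {', '.join(columns)} FROM {view_name} WHERE {predicate}"


def run_selection(dataset_path: str, output_path: str, query: str,
                  view_name: str = "dataset") -> None:
    """Run `query` locally with Spark and write the result as uncompressed CSV.

    Assumes the input is an ORC dataset readable from `dataset_path`;
    the real input format and paths may differ.
    """
    # Imported lazily so the heavy pyspark dependency is only needed here.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("select-blobs-sketch").getOrCreate()
    try:
        # Expose the dataset under a name the SQL query can reference.
        spark.read.orc(dataset_path).createOrReplaceTempView(view_name)
        result = spark.sql(query)
        # Write without compression, directly to the destination,
        # so there is no separate download step afterwards.
        (result.write.mode("overwrite")
               .option("compression", "none")
               .csv(output_path, header=True))
    finally:
        spark.stop()
```

A call might look like `run_selection("/path/to/orc", "/path/to/out", build_query("dataset", ["id", "name"], "length < 1000"))`, where the paths, columns, and predicate are placeholders.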
Activity
changed milestone to %Publish derived datasets [Roadmap - Share]
added 2 commits
Jenkins job DGRPH/gitlab-builds #839 failed. See Console Output and Coverage Report for more details.
Jenkins job DGRPH/gitlab-builds #838 failed. See Console Output and Coverage Report for more details.
added 3 commits
- b711b5f2...c46f101b - 2 commits from branch master
- 42ae2256 - SelectBlobs: Replace Athena with pyspark.sql
Jenkins job DGRPH/gitlab-builds #846 succeeded. See Console Output and Coverage Report for more details.
Jenkins job DGRPH/gitlab-builds #856 succeeded. See Console Output and Coverage Report for more details.
added 7 commits
- f2db7e67 - Add build-labels-eliasfano command
- f983dba5 - Add support for labels in gRPC server
- 9808a0b3 - find_path: Use the right label direction when traversing backward
- 95ddd39b - Deduplicate code between Forward/Backward branches
- 9ca0412d - Add tests for labels in FindPathTo and FindPathBetween
- b5b460d7 - Make property structures public
- 3a41865b - Remove duplicate headers in the middle of the file
Jenkins job DGRPH/gitlab-builds #866 succeeded. See Console Output and Coverage Report for more details.
added 17 commits
- 3a41865b...10e16fc7 - 12 commits from branch master
- a5fbf463 - SelectBlobs: Replace Athena with pyspark.sql
- 95951742 - Fix reformatting
- f517dacc - Give up on compression for temporary files. It causes Spark to produce empty files sometimes
- 68f7473a - Add test for SelectBlobs
- 67220251 - Remove duplicate headers in the middle of the file
Jenkins job DGRPH/gitlab-builds #882 failed. See Console Output and Coverage Report for more details.
added 21 commits
- 67220251...1e716510 - 16 commits from branch master
- ec28a090 - SelectBlobs: Replace Athena with pyspark.sql
- 1e2d9554 - Fix reformatting
- 6f97dc1f - Give up on compression for temporary files. It causes Spark to produce empty files sometimes
- 839e655d - Add test for SelectBlobs
- 26adb9a7 - Remove duplicate headers in the middle of the file
Jenkins job DGRPH/gitlab-builds #927 failed. See Console Output and Coverage Report for more details.
Jenkins job DGRPH/gitlab-builds #928 succeeded. See Console Output and Coverage Report for more details.