Draft: Rewrite the Provenance service as a gRPC server in Rust, backed by Parquet (featuring Datafusion joins)
This is !182 (merged) plus one commit that is supposed to add initial support for batch queries.
Unfortunately, Datafusion does not seem to push filters down when joining, which makes
low-cardinality joins very inefficient. Here, joining c_in_r with a one-row table
takes 1.5TB and minutes (hours?) on the prod table, while the same lookup with a 'WHERE cnt=' clause
took under 0.2s.
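To illustrate why the missing pushdown matters, here is a minimal std-only sketch. It is a toy model, not Datafusion's API: `RowGroup`, `make_table`, `join_without_pushdown` and `join_with_pushdown` are hypothetical names, and it assumes the table is sorted on the join key so that per-row-group min/max statistics (as Parquet stores them) are selective. It only counts how many rows each strategy has to read for a one-row build side.

```rust
use std::collections::HashSet;

/// Toy stand-in for a Parquet row group: its rows plus min/max statistics.
struct RowGroup {
    min: u64,
    max: u64,
    rows: Vec<u64>,
}

/// Build a table of `groups` row groups holding consecutive keys (sorted data).
fn make_table(groups: u64, rows_per_group: u64) -> Vec<RowGroup> {
    (0..groups)
        .map(|g| {
            let start = g * rows_per_group;
            RowGroup {
                min: start,
                max: start + rows_per_group - 1,
                rows: (start..start + rows_per_group).collect(),
            }
        })
        .collect()
}

/// Hash join without pushdown: every row of every row group is read and
/// probed against the build side. Returns (matches, rows scanned).
fn join_without_pushdown(table: &[RowGroup], keys: &[u64]) -> (Vec<u64>, usize) {
    let build: HashSet<u64> = keys.iter().copied().collect();
    let mut scanned = 0;
    let mut out = Vec::new();
    for rg in table {
        for &v in &rg.rows {
            scanned += 1;
            if build.contains(&v) {
                out.push(v);
            }
        }
    }
    (out, scanned)
}

/// Same join, but the build-side keys are used as a filter on the scan:
/// row groups whose [min, max] range contains no key are skipped entirely.
fn join_with_pushdown(table: &[RowGroup], keys: &[u64]) -> (Vec<u64>, usize) {
    let build: HashSet<u64> = keys.iter().copied().collect();
    let mut scanned = 0;
    let mut out = Vec::new();
    for rg in table {
        if !keys.iter().any(|&k| rg.min <= k && k <= rg.max) {
            continue; // pruned using statistics only, no rows read
        }
        for &v in &rg.rows {
            scanned += 1;
            if build.contains(&v) {
                out.push(v);
            }
        }
    }
    (out, scanned)
}

fn main() {
    let table = make_table(1000, 1000);
    let keys = [123_456u64]; // the "one-row table"
    let (a, scanned_a) = join_without_pushdown(&table, &keys);
    let (b, scanned_b) = join_with_pushdown(&table, &keys);
    assert_eq!(a, b); // both strategies return the same rows
    println!("without pushdown: {} rows scanned", scanned_a);
    println!("with pushdown:    {} rows scanned", scanned_b);
}
```

In this model both strategies return the same result, but the first reads the whole table while the second reads a single row group, which is the kind of gap observed between the join and the 'WHERE cnt=' query.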
Activity
Jenkins job DPROV/gitlab-builds #79 failed in 1 min 31 sec.
See Console Output, Blue Ocean and Coverage Report for more details.
Jenkins job DPROV/gitlab-builds #80 failed in 1 min 46 sec.
mentioned in issue swh/infra/sysadm-environment#5406
added 1 commit
- 77581225 - Failed attempt to rewrite HashJoin to NestedLoopJoin
Jenkins job DPROV/gitlab-builds #81 failed in 2 min 48 sec.
Jenkins job DPROV/gitlab-builds #82 failed in 2 min 46 sec.
Jenkins job DPROV/gitlab-builds #83 failed in 1 min 48 sec.
mentioned in merge request !184 (merged)
mentioned in merge request !185 (closed)
added 1 commit
- 57e896ae - Make the HashJoinExec -> NestedLoopJoinExec optimizer work
Jenkins job DPROV/gitlab-builds #88 failed in 3 min 1 sec.
Jenkins job DPROV/gitlab-builds #89 failed in 1 min 16 sec.
Replaced by !184 (merged)