Skip to content

Rewrite CountPaths to produce sharded Parquet files directly

vlorentz requested to merge countnodes-parquet into master

Instead of streaming a single CSV, which needed to be sharded and turned into ORC while uploading to S3 in Python, which is inefficient and brittle.

This also adds Bloom Filters to allow efficient node selection

Merge request reports