Split CSV files of large derived datasets
E.g., the PopularContents task produces a single .csv.zst file close to 500GB. CSV parsing then easily becomes a bottleneck, and it cannot be parallelized by splitting the input into batches of lines, because newline characters may be part of a record (inside a quoted field).
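
The problem can be illustrated with a minimal Python sketch using only the standard library. The sample data and the function names (`naive_line_count`, `parsed_rows`, `write_shards`) are made up for illustration and are not taken from the actual task code. It shows that counting lines disagrees with counting records, and that a producer which splits at record boundaries yields shards that can each be parsed on their own.

```python
import csv
import io

# Hypothetical sample: the second record has a quoted field containing a newline.
SAMPLE = 'id,name\n1,"multi\nline"\n2,plain\n'


def naive_line_count(text):
    """Count physical lines, which is what a line-batch splitter would see."""
    return len(text.splitlines())


def parsed_rows(text):
    """Parse the text with a real CSV parser, which honours quoted newlines."""
    return list(csv.reader(io.StringIO(text)))


def write_shards(rows, rows_per_shard):
    """Serialize rows into several CSV strings, cutting only between records.

    Assumption: the producer has the rows at hand when writing, so it can
    choose the cut points. Header handling is deliberately left out.
    """
    shards = []
    for start in range(0, len(rows), rows_per_shard):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerows(rows[start:start + rows_per_shard])
        shards.append(buf.getvalue())
    return shards


if __name__ == "__main__":
    rows = parsed_rows(SAMPLE)
    print("physical lines:", naive_line_count(SAMPLE))
    print("CSV records:", len(rows))
    for i, shard in enumerate(write_shards(rows, 2)):
        print("shard", i, "parses to", parsed_rows(shard))
```

Since every shard ends on a record boundary, each one could be handed to a separate worker without any coordination between parsers.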