Add PopularContentPaths

vlorentz requested to merge vlorentz/swh-graph:popularcontentpaths into master

This is similar to PopularContentNames, but it can recurse more than one step in the directory hierarchy.
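To illustrate what recursing more than one step means, here is a minimal Python sketch. It is not the code from this merge request. The `PARENTS` dict, the node names, and `popular_path` are hypothetical stand-ins for the compressed graph and its labelled edges. With `max_depth=1` it only counts file names; with a larger value it counts path suffixes.

```python
from collections import Counter

# Hypothetical toy data: node -> list of (parent directory, entry name).
# This dict only stands in for the real graph's labelled backward edges.
PARENTS = {
    "cnt1": [("dirA", "README.md"), ("dirB", "README.md"), ("dirC", "readme.txt")],
    "dirA": [("root1", "docs")],
    "dirB": [("root2", "docs")],
    "dirC": [("root3", "misc")],
}


def popular_path(node, max_depth, parents=PARENTS):
    """Return (path, count) for the most frequent path of at most
    max_depth components ending at node, or None if node has no parent."""
    counts = Counter()

    def walk(current, suffix, depth):
        edges = parents.get(current, [])
        # Stop when the depth limit is reached or there is no parent left.
        if depth == max_depth or not edges:
            if suffix:
                counts["/".join(suffix)] += 1
            return
        for parent, name in edges:
            # Prepend the entry name, since we walk from the content upwards.
            walk(parent, [name] + suffix, depth + 1)

    walk(node, [], 0)
    return counts.most_common(1)[0] if counts else None
```

In this sketch, each extra level multiplies the number of paths explored by the number of parents at that level, which gives an intuition for why deeper recursion is more expensive.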

I originally tried to extend PopularContentNames, but merely generalizing the code caused the existing case (maxDepth=1) to take over 3 days instead of ~20 hours, so it makes sense to keep the existing implementation of PopularContentNames.

The significant change compared to PopularContentNames (besides the extra maxDepth parameter) is that SWHIDs are taken from stdin instead of computing popular paths for every content in the graph, because the latter requires unreasonable resources (ETA: over 20 days, while maxing out 96 CPUs).
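As a rough sketch of the stdin-driven approach (in Python, not the actual implementation), the helper below reads one SWHID per line from a text stream. The function name `read_content_swhids`, the one-SWHID-per-line format, and the restriction to `cnt` SWHIDs are assumptions made for illustration.

```python
import re
import sys

# Assumption: only content SWHIDs are accepted, one per line.
SWHID_CNT = re.compile(r"swh:1:cnt:[0-9a-f]{40}")


def read_content_swhids(stream=None):
    """Yield content SWHIDs from a text stream (stdin by default)."""
    stream = sys.stdin if stream is None else stream
    for line in stream:
        swhid = line.strip()
        if not swhid:
            continue  # ignore blank lines
        if SWHID_CNT.fullmatch(swhid) is None:
            raise ValueError(f"not a content SWHID: {swhid!r}")
        yield swhid
```

Since the function is a generator, the caller can process each SWHID as it arrives without loading the whole input into memory.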

Additionally, combining the maxDepth parameter with the other ones (max_results_per_cnt, popularity_threshold) made the code a little unwieldy, and those parameters are probably not useful here, so they are excluded.
