Add random directory sampling policy

This policy makes use of the discovery algorithm introduced in swh-model, which should help speed up large scans (think the Linux kernel, or much larger).
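For reviewers unfamiliar with the idea, here is a rough sketch of how a random directory sampling policy can reduce the number of archive queries. It is not the swh-model or swh-scanner API: `sample_discovery`, `is_known`, and the toy `tree` layout are hypothetical names used only for illustration. It assumes that a known directory implies all of its descendants are known, and that an unknown directory implies all of its ancestors are unknown.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def descendants(tree: Dict[str, List[str]], node: str) -> Set[str]:
    """Return every directory below `node` in the tree."""
    seen: Set[str] = set()
    stack = list(tree.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tree.get(current, []))
    return seen


def ancestors(parents: Dict[str, str], node: str) -> Set[str]:
    """Return every directory above `node` in the tree."""
    found: Set[str] = set()
    while node in parents:
        node = parents[node]
        found.add(node)
    return found


def sample_discovery(
    tree: Dict[str, List[str]],
    is_known: Callable[[str], bool],
    sample_size: int = 2,
    seed: int = 0,
) -> Tuple[Set[str], Set[str], int]:
    """Classify directories as known/unknown by querying random samples.

    `is_known` stands in for a (hypothetical) archive lookup.
    """
    rng = random.Random(seed)
    parents = {child: parent for parent, kids in tree.items() for child in kids}
    undecided: Set[str] = set(tree) | set(parents)
    known: Set[str] = set()
    unknown: Set[str] = set()
    queries = 0

    while undecided:
        # Sort first so the sample is reproducible for a given seed.
        batch = rng.sample(sorted(undecided), min(sample_size, len(undecided)))
        for node in batch:
            if node not in undecided:
                continue  # already resolved by an earlier node in this batch
            queries += 1
            if is_known(node):
                # Known directory: everything below it is known too.
                resolved = {node} | descendants(tree, node)
                known |= resolved
            else:
                # Unknown directory: everything above it is unknown too.
                resolved = {node} | ancestors(parents, node)
                unknown |= resolved
            undecided -= resolved

    return known, unknown, queries
```

Each query can settle a whole subtree or a whole chain of ancestors, so the number of lookups is at most the number of directories and is often lower.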

Most of the time is spent walking the on-disk directory and hashing, which is where the new optimizations in swh-model==6.5.0 should come in handy. Python is close to its limit in that regard; a future effort should look into setting up SWH for native extensions.
