Add random directory sampling policy
This makes use of the new discovery algorithm introduced in ``swh-loader-core``, which should help speed up scans of large trees (think the Linux kernel, or far larger). Most of the time is spent walking the on-disk directory and hashing, which is where the new optimizations in ``swh-model==6.5.0`` should come in handy. Python is close to its limit in that regard; a future effort should look into setting up SWH for native extensions.
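The idea behind random directory sampling can be sketched in a few lines. This is a minimal, hypothetical illustration and not the ``swh-loader-core`` implementation: `Node`, `random_sampling_discovery` and the `is_known` oracle are made up for the example, queries are issued one at a time (a real implementation would likely batch them), and file contents are ignored. It relies only on the Merkle property of SWH identifiers: if a directory is already archived, its whole subtree is too, and if it is missing, so are all of its ancestors, since their identifiers depend on it.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical directory node: a unique name plus parent/children links."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def descendants(node: Node) -> Iterator[Node]:
    """Yield the node itself and every directory below it."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.children)


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the node itself and every directory above it, up to the root."""
    current: Optional[Node] = node
    while current is not None:
        yield current
        current = current.parent


def random_sampling_discovery(
    root: Node, is_known: Callable[[Node], bool], seed: int = 0
) -> Tuple[Dict[str, bool], int]:
    """Classify every directory as known (True) or unknown (False).

    Returns the classification and the number of archive queries issued.
    """
    rng = random.Random(seed)
    status: Dict[str, bool] = {}
    queries = 0
    all_nodes = list(descendants(root))
    while True:
        # Recomputed each round for simplicity; fine for a sketch.
        undecided = [n for n in all_nodes if n.name not in status]
        if not undecided:
            break
        sample = rng.choice(undecided)
        queries += 1
        if is_known(sample):
            # Known directory: its whole subtree is archived as well.
            for d in descendants(sample):
                status[d.name] = True
        else:
            # Unknown directory: every ancestor's identifier depends on it,
            # so all of them are unknown too.
            for a in ancestors(sample):
                status[a.name] = False
    return status, queries
```

Each answer settles either a whole subtree or a whole ancestor chain, which is why sampling can classify a large tree with fewer queries than asking about every directory individually.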
- requirements-swh.txt: 1 addition, 0 deletions
- swh/scanner/cli.py: 5 additions, 1 deletion
- swh/scanner/policy.py: 78 additions, 7 deletions
- swh/scanner/scanner.py: 3 additions, 0 deletions
- swh/scanner/tests/conftest.py: 11 additions, 2 deletions
- swh/scanner/tests/flask_api.py: 4 additions, 1 deletion
- swh/scanner/tests/test_policy.py: 65 additions, 0 deletions