whereare : n contents -> n qualified swhids, one or none per each content
scanner
from a directory on local
recursively found the existing objects in the archive (content, directory) via the api/1/known api endpoint
lot of small queries
use the remaining quota to rate-limit the queries
If the scanner is started with the --provenance option
Send a batch of 1000 swhids per requests to the provenance api
Launch several requests in parallel
5 initially -> timeouts
3 now works better but still timeout from time to time
Several action were identified, some out of scope of this issue centered on the production environment:
[out-of-scope] Add metrics on swh-graph
[out-of-scope] Add logs on the scanner to identify problematic batchs
[out-of-scope] Add debug logs on swh-provenance to trace the durations of the queries to swh-graph with the swhid / global duration (swh/devel/swh-provenance!179 (merged))
Increase the swh-provenance gunicorn worker timeouts to 30s (timeout is already 60s)
To avoid any side effect related to the increase of timeout, limit the maximum number of parallel queries on the /api/1/provenance endpoint to 10.
One point to diagnose is the duration of the swh-provenance requests, it seems the swh-graph queries take globally around 50 pico-seconds, with a couple taking several milli seconds.
It's not clear which ones yet.
There is also a couple of issues in sentry that need to be investigate
sept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 95.779µssept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.05µssept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 1.350666mssept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 44.162µssept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 54.074µssept. 11 06:55:38 granet swh[486308]: 2024-09-11T06:55:38+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 55.117µssept. 11 06:55:56 granet swh[486308]: 2024-09-11T06:55:56+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 70.772µs <--- what is this request ? one from 06:56:17 or totally something elsesept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 69.679µs <--- 06:56:17 requestsept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 41.025µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 43.442µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.317µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 48.039µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 55.162µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 41.629µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.291µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 49.707µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 54.119µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 39.906µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 48.816µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.417µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.253µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.351µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 49.869µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 30.653µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 37.094µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 47.318µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.281µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 32.494µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 36.825µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.982µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 30.365µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 36.906µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.541µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 36.968µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 48.711µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 36.707µssept. 11 06:56:14 granet swh[486308]: 2024-09-11T06:56:14+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.797µssept. 11 06:56:16 granet swh[486308]: 2024-09-11T06:56:16+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 54.554µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 69.792µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 43.65µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.607µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 42.152µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.823µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 63.752µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.561µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.331µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 40.416µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 48.913µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 62.376µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.493µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.628µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 41.694µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.311µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 32.313µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 65.395µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.054µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.097µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.606µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.876µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.692µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.035µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.463µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.576µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 44.907µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 54.351µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 42.356µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 42.727µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.661µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 67.917µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.209µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 65.299µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 53.979µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.91µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.503µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 55.703µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 42.812µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.836µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 39.053µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 51.824µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 38.582µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 55.159µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 39.499µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.343µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 32.427µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 45.426µssept. 11 06:56:17 granet swh[486308]: 2024-09-11T06:56:17+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 35.781µs <--- end of 06:56:17sept. 11 06:56:36 granet swh[486308]: 2024-09-11T06:56:36+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 62.487µs <--- Same pattern, one request and nothing before 30ssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 106.881µssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 44.666µssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 65.576µssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.926µssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 52.058µssept. 11 06:56:54 granet swh[486308]: 2024-09-11T06:56:54+00:00 - INFO - 200 OK - http://graph.internal.softwareheritage.org:50091/swh.graph.TraversalService/Traverse - 50.571µs
If the swh-graph logs can be trusted, what is seen is there is ~30s between each return of provenance request. These logs are no enough to diagnose where they are spend.
Is it a delay between 2 scanner request? Is it a delay in the webapp ? Is it a delay in swh-provenance ? or something else in the infra ?
The debug logs in provenance will help to diagnose that.
One more information: there is 2 provenance instances running with each one 4 gunicorn workers configured with 1 thread. It theoretically allows to handle 8 parallel requests but in the reality
there is no guaranty a new request will not be sent to a busy instance and will be put in the waiting queue.
Testing using a local instance of swh-provenance (running with swh provenance rpc-serve) using granet's grpc as backend, executing 1 whereare query with 353 swhids, I have the following response time histogram for traversal query times (these are in ms):
Now, with the idea of filtering these edge case swhids (swh/devel/swh-provenance!177 (merged)), here are the overall whereis response time for each of the (with max_edges=100000) in second:
Vincent Selliermarked the checklist item [out-of-scope] Add debug logs on swh-provenance to trace the durations of the queries to swh-graph with the swhid / global duration (swh/devel/swh-provenance!179 (merged)) as completed
marked the checklist item [out-of-scope] Add debug logs on swh-provenance to trace the durations of the queries to swh-graph with the swhid / global duration (swh/devel/swh-provenance!179 (merged)) as completed