
Compute and show ETA for vault tasks

When the requested objects are present in swh-graph, we should be able to approximate the total runtime of vault tasks, as I expect it to be linear in the number of objects of each type.

How to do it:

  1. use data in swh-scheduler's database to get the run time of cooking each root object
  2. use swh-graph to compute the number of objects of each type (cnt + dir + rev should be enough) reachable from that root object
  3. run a linear regression to obtain a model of the runtime as a function of the number of objects of each type
  4. store that model somewhere (vault backend? swh-web?)
  5. every time we get a cooking request, query counts in swh-graph (like in step 2) and use the model to estimate the run time
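
Steps 3 and 5 could look roughly like the sketch below. This is only an illustration under assumptions: it uses NumPy's least-squares solver, the function names (fit_runtime_model, estimate_runtime) are hypothetical, and the training rows are synthetic numbers made up for the example, not measurements from swh-scheduler or swh-graph.

```python
import numpy as np


def fit_runtime_model(counts, runtimes):
    """Fit runtime ~ a*cnt + b*dir + c*rev + overhead by least squares.

    counts: one (cnt, dir, rev) row per past cooking task.
    runtimes: the matching run times in seconds.
    """
    X = np.asarray(counts, dtype=float)
    y = np.asarray(runtimes, dtype=float)
    # Append a column of ones so the model can learn a constant overhead.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coeffs


def estimate_runtime(coeffs, cnt, dir_, rev):
    """Apply the fitted model to the object counts of a new request."""
    return float(np.dot(coeffs, [cnt, dir_, rev, 1.0]))


# Synthetic training data (NOT real measurements): counts of
# (cnt, dir, rev) reachable from five hypothetical root objects.
past_counts = [
    (1000, 200, 50),
    (5000, 800, 300),
    (200, 50, 10),
    (20000, 3000, 1500),
    (800, 400, 20),
]
# Run times generated from an arbitrary, made-up linear model so the
# example is reproducible: seconds per cnt, per dir, per rev, overhead.
made_up_coeffs = np.array([0.002, 0.005, 0.01, 3.0])
past_runtimes = [
    float(np.dot(made_up_coeffs, [c, d, r, 1.0])) for c, d, r in past_counts
]

model = fit_runtime_model(past_counts, past_runtimes)
eta_seconds = estimate_runtime(model, cnt=10000, dir_=1500, rev=600)
```

With real data the fit would not be exact, so it would probably be worth checking the residuals before trusting the ETA shown to users.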

As a first approximation, we could skip steps 3 and 4. This might be good enough, as the git-bare cooker fetches objects somewhat homogeneously (i.e. a batch of revs, then a batch of dirs, then a batch of contents, then revs again, ...).

This would be a great UX improvement, as some gitfast/git-bare tasks can be really long.


Migrated from T3550 (view on Phabricator)
