Fix revision cooking errors with the vault for large revision log
This fixes a vault issue similar to a one that happened when cooking a directory (#1177 (closed)).
When cooking a gitfast archive for a given revision, fetching the associated revision log must be performed client-side in a paginated way to avoid storage timeouts when the total number of revisions to fetch is pretty large.
Related T1934
Migrated from D1778 (view on Phabricator)
Merge request reports
Activity
Build is green See https://jenkins.softwareheritage.org/job/DVAU/job/tox/84/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DVAU/job/tox/86/ for more details.
- swh/vault/cookers/utils.py 0 → 100644
13 Args: 14 storage (swh.storage.storage.Storage): instance of swh storage 15 (either local or remote) 16 rev_id (bytes): a revision identifier 17 per_page (Optional[int]): the maximum number of revisions to return 18 in each page 19 20 Yields: 21 dict: Revision information as a dictionary 22 """ 23 rw_state = {} 24 nb_revs = 0 25 max_revs = per_page 26 while True: 27 # Get an iterator returning the commits log from rev_id. 28 # At most max_revs visited revisions from rev_id in the commits graph Nope, it will work as expected.
get_revisions_walker
instantiates a revision iterator, the important part here is thestate
parameter enabling to continue the iteration from where you left it the last time you used it.As I can't guess the number of revisions to walk, I am forced to continually increment the
max_revs
parameter. But the number of revisions returned at each iteration step will beper_page
or less if we hit the initial revision in the log.
- swh/vault/cookers/utils.py 0 → 100644
18 in each page 19 20 Yields: 21 dict: Revision information as a dictionary 22 """ 23 rw_state = {} 24 nb_revs = 0 25 max_revs = per_page 26 while True: 27 # Get an iterator returning the commits log from rev_id. 28 # At most max_revs visited revisions from rev_id in the commits graph 29 # will be returned. 30 revs_walker = get_revisions_walker('bfs', storage, rev_id, 31 max_revs=max_revs, 32 state=rw_state) 33 # Iterate on at most per_page revisions in the commits log. Some references in the commit message have been migrated:
- T1177 is now #1177 (closed)
Build is green See https://jenkins.softwareheritage.org/job/DVAU/job/tox/87/ for more details.