
Fix revision cooking errors with the vault for large revision logs


This fixes a vault issue similar to one that previously occurred when cooking a directory (#1177 (closed)).

When cooking a gitfast archive for a given revision, the associated revision log must be fetched client-side in a paginated way, to avoid storage timeouts when the total number of revisions to fetch is large.

Related: T1934


Migrated from D1778 (view on Phabricator)
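
In outline, the fix wraps swh.storage's revision walker in a generator that repeatedly widens its revision window while resuming from a saved walker state. Below is a minimal sketch of that approach; the function name revision_log_paginated and the exact stop condition are assumptions for illustration, while the walker call matches the diff excerpt quoted in the review threads further down.

```python
from swh.storage.algos.revisions_walker import get_revisions_walker

def revision_log_paginated(storage, rev_id, per_page=1000):
    # Sketch only: the name and default page size are assumptions.
    rw_state = {}
    nb_revs = 0
    max_revs = per_page
    while True:
        # Visit at most max_revs revisions from rev_id; the shared
        # state dict makes each call resume where the previous stopped.
        revs_walker = get_revisions_walker('bfs', storage, rev_id,
                                           max_revs=max_revs,
                                           state=rw_state)
        for rev in revs_walker:
            nb_revs += 1
            yield rev
        # Fewer revisions than requested: the log is exhausted.
        if nb_revs < max_revs:
            break
        # Otherwise widen the window by one more page and resume.
        max_revs += per_page
```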

Merge request reports


Activity

    Args:
        storage (swh.storage.storage.Storage): instance of swh storage
            (either local or remote)
        rev_id (bytes): a revision identifier
        per_page (Optional[int]): the maximum number of revisions to return
            in each page

    Yields:
        dict: Revision information as a dictionary
    """
    rw_state = {}
    nb_revs = 0
    max_revs = per_page
    while True:
        # Get an iterator returning the commits log from rev_id.
        # At most max_revs visited revisions from rev_id in the commits
        # graph will be returned.
        revs_walker = get_revisions_walker('bfs', storage, rev_id,
                                           max_revs=max_revs,
                                           state=rw_state)
        # Iterate on at most per_page revisions in the commits log.
  • Wouldn't that bubble up (since you increment it below) when the revisions are numerous?

    I'd call that function with max_revs=per_page.

    What did I miss?

  • Author Maintainer

    Nope, it will work as expected.

    get_revisions_walker instantiates a revision iterator; the important part here is the state parameter, which makes it possible to resume the iteration from where it stopped the last time it was used.

    As I can't guess the number of revisions to walk, I have to keep incrementing the max_revs parameter. But the number of revisions returned at each iteration step will be per_page, or fewer if we hit the initial revision in the log.
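
To make the resumable-iteration point concrete: a minimal sketch, assuming a storage instance and revision id as in the excerpt above. The call signature is the one used in the diff; the wrapper function and page size are illustrative only.

```python
from swh.storage.algos.revisions_walker import get_revisions_walker

def first_two_pages(storage, rev_id, per_page=100):
    # Illustrative wrapper (not part of the diff): shows that a shared
    # state dict lets a second walker resume instead of restarting.
    state = {}
    # First call: visit at most per_page revisions from rev_id.
    page1 = list(get_revisions_walker('bfs', storage, rev_id,
                                      max_revs=per_page, state=state))
    # Second call: same state, doubled max_revs. Since max_revs counts
    # revisions visited across calls, this yields at most per_page new
    # revisions (fewer if the initial revision was reached).
    page2 = list(get_revisions_walker('bfs', storage, rev_id,
                                      max_revs=2 * per_page, state=state))
    return page1, page2
```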

  • For the sake of my question ;)

    Other than that, sounds good.

  • Merge request was returned for changes

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

  • I'm not against this diff as quick solution for the problem, but I'm disappointed that this pagination stuff is not properly managed in the (client part) storage itself. What the point of having a client implementation of the storage if we do not abstract these kind of 'basic' behaviors?
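
For illustration only, one shape such a client-side abstraction could take; this wrapper class is entirely hypothetical, reusing the revision_log_paginated sketch from the description above rather than any existing swh.storage API:

```python
class PaginatedLogStorage:
    """Hypothetical wrapper: callers get a plain revision-log iterator
    while the pagination bookkeeping lives in the storage layer once."""

    def __init__(self, storage, per_page=1000):
        self.storage = storage
        self.per_page = per_page

    def revision_log_iter(self, rev_id):
        # Delegates to the paginated walk sketched earlier, so cooker
        # or view code never touches walker state or max_revs itself.
        return revision_log_paginated(self.storage, rev_id,
                                      per_page=self.per_page)
```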

  • Author Maintainer

    Update: Add comments in revision_log implementation

  • Merge request was merged
