- Dec 12, 2022
- Dec 08, 2022
-
-
Antoine Lambert authored
Now that SvnRepo.propget supports a URL as target, we can drop the costly checkout operation and directly retrieve the whole set of svn:externals properties. This should greatly improve the performance of incremental loading for big repositories.
-
Antoine Lambert authored
subvertpy 0.11 has a buggy implementation of the propget bindings when the target is a URL (https://github.com/jelmer/subvertpy/issues/35), so as a workaround we implement propget for URLs using the non-buggy proplist bindings.
-
Antoine Lambert authored
Retrying three times is enough as we use exponential backoff. Previously the loader could be stuck for more than twenty minutes in a row when it encountered a dead external; now it takes a couple of minutes.
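The retry policy described above can be illustrated with a minimal sketch. The function name `retry_with_backoff`, its parameters, and the delay values are assumptions made for illustration, not the loader's actual implementation; for brevity it retries on any exception, whereas a real loader would likely retry only on specific network errors.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on failure with exponentially growing delays.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.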
-
- Dec 07, 2022
-
-
Antoine Lambert authored
Copied directories might have externals, so we also need to copy their states and update the external paths in case the externals list is later modified.
-
Antoine Lambert authored
In order to detect all ASCII characters that must be percent-encoded in svn URLs, add a brute-force test and use urllib.parse.quote in the quote_svn_url function.
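As a rough illustration of the approach, the sketch below percent-encodes the path of a URL with `urllib.parse.quote` and enumerates the printable ASCII characters that `quote` rewrites. The helper name `quote_url_path` and the example host are hypothetical; this is not the code of `quote_svn_url`, and it assumes the input path is not already percent-encoded (an existing `%20` would be encoded again).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url: str) -> str:
    """Percent-encode the path component of a URL, leaving the rest untouched.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


# Brute-force check: printable ASCII characters that quote() rewrites
# with its default safe set ("/").
ENCODED_ASCII = [c for c in map(chr, range(32, 127)) if quote(c) != c]
```

For example, `quote_url_path("https://svn.example.org/repo/my dir/a file.txt")` encodes each space as `%20`.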
-
Antoine Lambert authored
Such a case can happen when an external definition is malformed. Previously, the parsed malformed external was added to the directory state with an empty external URL, which could lead to unexpected side effects such as removing all previously exported valid externals.
-
Antoine Lambert authored
Instead of maintaining file state based on svn properties across revision replays and trying to reconstruct the same file as an svn export operation would produce after applying text deltas, simply export the file from the currently processed revision when closing the associated file editor. This greatly simplifies the replay module implementation while keeping approximately the same performance as before. Also add a test that would fail without these changes. Related to T4673
-
- Dec 05, 2022
-
-
Antoine Lambert authored
When copying a directory from an ancestor revision, do not ignore externals: properties are also copied by subversion, so external paths must also be exported.
-
Antoine Lambert authored
In debug mode, when a hash tree computation divergence is detected after replaying a revision, compute and display the diff between contents to facilitate debugging of this type of issue.
-
- Nov 25, 2022
-
-
Antoine Lambert authored
Add more debug logs to the replay module to ease detection of issues. As those are quite verbose, only display them when the debug parameter of the loader is set to True.
-
- Nov 23, 2022
-
-
Antoine Lambert authored
-
- Nov 22, 2022
-
-
Antoine Lambert authored
When a tree computation divergence is detected after replaying a revision, add debug logs displaying the paths that differ or are missing between the reconstructed repository filesystem and the one exported at that specific revision. This should save some time when debugging such issues.
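The kind of comparison such logs rely on can be sketched as follows, assuming each filesystem tree has already been summarized as a mapping from path to content hash. The function name `diff_trees` and that representation are assumptions for illustration, not the loader's data model.

```python
def diff_trees(reconstructed: dict, exported: dict) -> dict:
    """Compare two {path: content hash} mappings.

    Hypothetical helper: reports paths missing from the reconstructed tree,
    paths only present in it, and paths whose hashes differ.
    """
    common = exported.keys() & reconstructed.keys()
    return {
        "missing": sorted(exported.keys() - reconstructed.keys()),
        "unexpected": sorted(reconstructed.keys() - exported.keys()),
        "differing": sorted(p for p in common if exported[p] != reconstructed[p]),
    }
```

Each of the three lists could then be emitted as a debug log line.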
-
- Oct 31, 2022
-
-
Antoine Lambert authored
A subversion revision can contain new directories and files copied from ancestor revisions, but those were not perfectly handled in the commit editor used to reconstruct the repository filesystem when replaying revisions. In particular, the previous implementation could not handle the case where a path copied from an ancestor revision is replaced in the same commit (for instance, replacing a directory by a file with the same name). These changes ensure that the source path and source revision from which a path is copied are passed to the commit editor methods as parameters so that they can handle the copies, and that replace operations are correctly replayed. They also prevent the OS error "Too many open files" when a very large file tree is copied from an ancestor revision.
-
Antoine Lambert authored
When dumping a subversion repository to a file before loading it, compress that file with gzip while producing it. This saves significant disk space when dumping a large repository. Also rework how a truncated dump is handled now that the dump file is compressed, by providing the expected max revision number to be loaded by svnadmin. If the number of loaded revisions matches, we can safely continue the partial loading of the repository.
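Compressing a dump while it is being produced can be sketched by streaming a command's standard output straight into a gzip file. The helper name `run_and_gzip` is hypothetical and the command is left as a parameter; the loader's actual dump command and functions are not reproduced here.

```python
import gzip
import shutil
import subprocess


def run_and_gzip(cmd, dump_path):
    """Run cmd and stream its standard output into a gzip file.

    Hypothetical helper: returns the exit code of the command.
    """
    with gzip.open(dump_path, "wb") as out:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
            # Copy chunk by chunk so the uncompressed dump never hits the disk.
            shutil.copyfileobj(proc.stdout, out)
    return proc.returncode
```

Because the data is compressed as it arrives, the disk only ever holds the gzip file.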
- Oct 28, 2022
-
-
Antoine Lambert authored
URLs provided as parameters to subvertpy.client.Client methods must be quoted when they contain space characters, otherwise an assertion is raised by libsvn.
-
- Oct 25, 2022
-
-
Antoine Lambert authored
There are some subtle cases in subversion repositories where external paths defined on different directories can overlap, so update the replay module implementation to handle those and avoid erroneously removing paths when replaying revisions.
-
- Oct 19, 2022
-
-
Antoine Lambert authored
-
Antoine Lambert authored
When the "svnadmin load" command exits with an error, report the svnadmin error in the ValueError exception raised by the function init_svn_repo_from_dump. This should help debugging this type of issue reported by sentry.
-
- Oct 18, 2022
-
-
David Douard authored
pre-commit from 4.1.0 to 4.3.0, codespell from 2.2.1 to 2.2.2, black from 22.3.0 to 22.10.0, and flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies and change flake8's repo config to GitHub (the GitLab mirror being outdated).
-
Antoine Lambert authored
Use helper fixture loading_task_creation_for_listed_origin_test from swh-loader-core and remove redundant tests.
-
- Oct 17, 2022
-
-
Antoine Lambert authored
Instead of maintaining a set of modified paths for each replayed revision, use the swh.model.from_disk.Directory.collect method which performs the same task by returning added or modified contents and directories since the last collect operation.
-
- Oct 01, 2022
- Sep 30, 2022
-
-
Antoine Lambert authored
Some subversion servers in the wild have disabled anonymous access and require authenticating with a read-only user account, typically with credentials anonymous/anonymous. So add support for basic authentication in the subversion loader; credentials must be provided in the repository URL, as with basic HTTP authentication.
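Extracting credentials embedded in a repository URL can be sketched with the standard library. The helper name `split_credentials` and the example host are hypothetical, and the sketch does not show how the loader passes the credentials on to the subversion client.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str):
    """Return (url_without_credentials, username, password).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    username = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    # Rebuild the network location without the "user:password@" prefix.
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc)), username, password
```

With the typical read-only account, `split_credentials("https://anonymous:anonymous@svn.example.org/repo/trunk")` separates the plain URL from the `anonymous`/`anonymous` pair.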
-
- Sep 15, 2022
- Jun 17, 2022
-
-
Antoine Lambert authored
An external definition can be of the following form (where XXX and YYY are revision numbers): -r XXX <repo_url>@YYY. In that case, the official subversion client will export revision XXX of the external repository. So ensure the subversion loader has the same behavior when it processes a repository containing such an external definition.
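The rule stated above can be sketched as a small parser that prefers the `-r` revision over the revision after `@`. The function name `revision_to_export` is hypothetical, the sketch only handles the URL part of a definition (not the local path that follows it), and it assumes that, when no `-r` option is given, the revision after `@` is the one to export.

```python
import re


def revision_to_export(definition: str):
    """Return (url, revision) for an external such as '-r XXX <url>@YYY'.

    Hypothetical helper: revision is None when the definition has neither form.
    """
    tokens = definition.split()
    operative = None
    if tokens[:1] == ["-r"] and len(tokens) >= 3:
        operative = int(tokens[1])
        tokens = tokens[2:]
    url, peg = tokens[0], None
    match = re.fullmatch(r"(.+)@(\d+)", url)
    if match:
        url, peg = match.group(1), int(match.group(2))
    # The -r revision wins over the one given after '@' when both are present.
    return url, operative if operative is not None else peg
```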
-
- Jun 01, 2022
-
-
Antoine Lambert authored
Temporary network failures can also happen when using the subversion remote access API, so make the single commit info retrieval operation retryable.
- May 20, 2022
-
-
Antoine Lambert authored
This type of error comes from a temporary network failure, so the failed svn operation can be retried.
-
- May 09, 2022
-
-
Pratyush authored
-
- May 05, 2022
-
-
Antoine Lambert authored
The computed root directory name had a trailing slash, which made its lookup in the from_disk.Directory model fail, and thus every archived revision was targeting the empty directory.
-
- May 03, 2022
-
-
Antoine Lambert authored
svnrdump does not handle repository URL redirection while the svn client does. So make sure to use the redirected subversion origin URL to dump a repository. Related to T3874
-
- Apr 29, 2022
- Apr 27, 2022
-
-
Antoine Lambert authored
Using unnamed arguments is unsafe with celery tasks, and parsing visit_date is still required in case that parameter is used.
-
- Apr 26, 2022
-
-
Antoine Lambert authored
Recent changes in swh-scheduler added new parameters to the celery tasks produced from swh.scheduler.model.ListedOrigin instances. Those new parameters were not properly handled in the celery task implementations for svn loading, nor in some of the svn loaders. So handle any new parameters by not hardcoding the expected ones in task signatures and by allowing any extra keyword parameters in loader constructors. Also remove the explicit setting of visit_date when none has been provided as a task parameter, as it is already handled in the loader constructor. Related to T4187
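The pattern described above, a task that forwards whatever keyword parameters it receives and a constructor that tolerates extra ones, can be sketched as follows. `LoaderSketch`, `load_task`, and `some_new_parameter` are hypothetical names; the real celery tasks and loader classes are not reproduced here.

```python
class LoaderSketch:
    """Hypothetical stand-in for a loader class."""

    def __init__(self, url, visit_date=None, **kwargs):
        self.url = url
        self.visit_date = visit_date  # a default would be handled here when None
        self.extra = kwargs  # parameters added later by the scheduler land here

    def load(self):
        return {"status": "ok", "url": self.url}


def load_task(**kwargs):
    """Hypothetical task body: no hardcoded parameter list, keyword arguments only."""
    return LoaderSketch(**kwargs).load()
```

Calling `load_task(url="https://svn.example.org/repo", some_new_parameter=1)` then succeeds even though the task signature never mentions `some_new_parameter`.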
-
vlorentz authored
-
- Apr 21, 2022
-
-
Antoine Lambert authored
This makes the loader more resilient to temporary network failures.