- May 26, 2023
Antoine Lambert authored
Add Jenkins badge for master branch build status. Rephrase introduction sentence. Remove remnants from when the file was written in reStructuredText. Add syntax highlighting to code blocks.
Antoine Lambert authored
The export_temporary method of the SvnRepo class exports the content of a Subversion repository at a given revision into a temporary directory. As we also export the externals that might be associated with some paths in the repository, we first need to get all the svn:externals property values in order to determine whether there are recursive or relative externals and adjust some export parameters accordingly.
While that operation is fast when the Subversion repository is hosted locally, it is terribly slow when the repository is hosted on a remote server. Indeed, a recursive propget operation on a remote server sends a lot of network requests, which slows down the process considerably, especially with large repositories. To improve performance, the previous implementation performed a full checkout of the repository to the local filesystem and got the svn:externals property values from it. However, that process is time consuming for large repositories and can consume a lot of disk space.
In order to remove that bottleneck and improve overall performance when getting all property values, introduce a C++ extension module for Python that implements a fast way to crawl all paths of a repository and their associated properties. Unlike the "svn ls --depth infinity" or "svn propget -R" commands, it performs only one SVN request over the network, hence saving time especially with large repositories. The code is loosely inspired by the fast-svn-crawler project by Dmitry Pavlenko (https://sourceforge.net/projects/fastsvncrawler/).
The speedup is substantial: on a large remote repository, listing all paths using "svn ls --depth infinity" or getting all svn:externals property values using "svn propget -R" takes around one hour, while it takes only a couple of minutes using the approach implemented in the C++ extension module. That approach also saves disk space, as we no longer need to perform a full checkout of the repository.
This change should greatly improve performance when reloading an svn repository already visited by Software Heritage. Indeed, before archiving any new commits made since the last visit, the loader checks that the repository has not been altered by calling the export_temporary method with the remote repository URL.
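As a rough illustration of the relative-externals check mentioned above, here is a minimal Python sketch (not the loader's actual code) that scans a mapping from repository paths to raw svn:externals values, such as a single-request crawl could return, and reports the paths using relative URLs. The function name find_relative_externals and the input shape are assumptions.

```python
from typing import Dict, List

# Prefixes Subversion interprets as relative URLs in svn:externals:
# "../" (parent directory), "^/" (repository root), "//" (URL scheme)
# and "/" (server root).
RELATIVE_PREFIXES = ("../", "^/", "//", "/")


def find_relative_externals(externals_props: Dict[str, str]) -> List[str]:
    """Return the sorted paths whose svn:externals value uses a relative URL.

    Hypothetical helper: externals_props maps a repository path to the raw
    value of its svn:externals property.
    """
    relative = []
    for path, value in sorted(externals_props.items()):
        for line in value.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Simplification: check every whitespace-separated token rather
            # than locating the URL field precisely.
            if any(token.startswith(RELATIVE_PREFIXES) for token in line.split()):
                relative.append(path)
                break
    return relative
```

Knowing which paths use relative URLs up front is what allows the export parameters to be adjusted before exporting anything.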
- May 23, 2023
Antoine Lambert authored
Some external definitions can have leading or trailing spaces/tabs, so we need to strip them to avoid parsing errors. Fixes #4734
- May 03, 2023
Antoine Lambert authored
The official Subversion documentation only mentions that paths containing spaces must be surrounded by double quotes, but some external definitions found in the wild surround paths with single quotes. Those are properly handled by the official Subversion client, so we must handle them the same way when parsing externals.
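Both fixes (stripping surrounding spaces/tabs and accepting single quotes) can be approximated in Python with str.strip and shlex.split, which understands single and double quotes as well as backslash escapes. This is a minimal sketch under that assumption, not necessarily how the loader's parser works; split_external_definition is a hypothetical name.

```python
import shlex
from typing import List


def split_external_definition(line: str) -> List[str]:
    """Split one svn:externals line into tokens (hypothetical helper).

    Surrounding spaces/tabs are stripped explicitly (shlex.split would also
    ignore them), and shlex.split keeps both 'single-quoted' and
    "double-quoted" paths containing spaces as a single token.
    """
    return shlex.split(line.strip())
```

For instance, a definition such as `http://example.org/repo/lib 'third party/lib'` yields the URL and the local path `third party/lib` as two tokens.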
- Apr 17, 2023
Antoine Lambert authored
When a directory is copied from another one in a previous revision, externals must be copied only if they have been defined in a revision greater than or equal to the revision the directory is copied from. So store the revision number in which an external is defined and use it to filter externals when performing copyfrom operations.
- Apr 04, 2023
Antoine Lambert authored
As the swh.loader.svn.svn_repo.SvnRepo class quotes all URLs before calling the Subversion API through subvertpy, we must ensure that URLs extracted from external definitions are unquoted; otherwise they will be double-quoted and thus no longer valid.
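The double-quoting problem is easy to reproduce with urllib.parse: quoting an already percent-encoded URL encodes the "%" itself. The snippet below assumes quoting is done with urllib.parse.quote and safe="/:", which is an illustration rather than the loader's exact call.

```python
from urllib.parse import quote, unquote

# URL as it may appear, already percent-encoded, in an external definition.
external_url = "http://example.org/repo/third%20party"

# Quoting it again turns "%20" into "%2520": the URL is no longer valid.
double_quoted = quote(external_url, safe="/:")

# Unquoting first and then quoting once restores the expected URL.
requoted = quote(unquote(external_url), safe="/:")

print(double_quoted)  # http://example.org/repo/third%2520party
print(requoted)  # http://example.org/repo/third%20party
```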
Antoine Lambert authored
When Subversion deletes a directory, we must also remove its state holding externals info, as the directory can be re-added in a later revision without the svn:externals property set.
- Mar 02, 2023
Antoine Lambert authored
Previously, the SvnLoaderFromRemoteDump class used the repository root URL to dump a sub-project.
- Mar 01, 2023
Antoine Lambert authored
"svnadmin load" has a --no-flush-to-disk option enabling faster loading at the cost of being unsafe on power loss. This drawback is not an issue for the Subversion loader, so use that option to significantly improve performance when loading a repository from a dump file into a directory on the local filesystem.
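A minimal sketch of building and running that command from Python, assuming the target repository was already created with "svnadmin create" and the dump is fed through standard input; the helper names are hypothetical.

```python
import subprocess
from typing import List


def svnadmin_load_command(repo_path: str) -> List[str]:
    """Build the "svnadmin load" command for a local repository (hypothetical helper)."""
    return [
        "svnadmin",
        "load",
        "--quiet",
        # Skip flushing to disk: much faster, but the repository may be left
        # corrupted on power loss, which is acceptable for a temporary copy.
        "--no-flush-to-disk",
        repo_path,
    ]


def load_dump(repo_path: str, dump_path: str) -> None:
    """Feed an uncompressed dump file to svnadmin load through stdin."""
    with open(dump_path, "rb") as dump:
        subprocess.run(svnadmin_load_command(repo_path), stdin=dump, check=True)
```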
Antoine Lambert authored
Those methods ensure URLs are properly quoted to avoid assertion failures when calling functions from the Subversion C API.
Antoine Lambert authored
In order to ensure consistency between the SvnLoader and SvnLoaderFromRemoteDump classes, run most of the tests with both of them. As a consequence, fix an invalid load status reported by the SvnLoader class when no new objects to archive were found during a visit.
Antoine Lambert authored
That kind of error can be encountered when loading a repository hosted on SourceForge.
Antoine Lambert authored
Recursive propget operations are terribly slow over the network, so it is better to run them on a freshly checked-out working copy, which is faster.
Antoine Lambert authored
- Feb 17, 2023
Antoine Lambert authored
Related to swh/meta#4960
- Feb 16, 2023
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
- Feb 02, 2023
Antoine Lambert authored
Previously, the parse_external_definition function returned a single revision number regardless of whether it was an operative revision specified with -rX or a peg revision specified with @X. However, using and combining these two parameters in the Subversion export command can lead to different results (see https://svnbook.red-bean.com/en/1.6/svn.advanced.pegrevs.html). So extract both the revision and the peg revision in order to avoid behaving differently from the official Subversion client when the loader exports externals.
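To make the distinction concrete, here is a minimal parser sketch for the modern external format ("[-r REV] URL[@PEG] LOCALPATH") that keeps both values. It only handles integer revisions and is not the loader's parse_external_definition function; ExternalRevisions and parse_revisions are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ExternalRevisions(NamedTuple):
    url: str
    revision: Optional[int]  # operative revision, from -rX or -r X
    peg_revision: Optional[int]  # peg revision, from URL@X


def parse_revisions(definition: str) -> ExternalRevisions:
    """Parse "[-r REV] URL[@PEG] LOCALPATH" (integer revisions only)."""
    tokens = definition.split()
    revision = None
    if tokens[0] == "-r":
        revision = int(tokens[1])
        tokens = tokens[2:]
    elif tokens[0].startswith("-r"):
        revision = int(tokens[0][2:])
        tokens = tokens[1:]
    url, peg_revision = tokens[0], None
    # Only a trailing "@digits" is a peg revision; "user@host" is left alone.
    match = re.fullmatch(r"(.*)@(\d+)", url)
    if match:
        url, peg_revision = match.group(1), int(match.group(2))
    return ExternalRevisions(url, revision, peg_revision)
```

Keeping both values lets the export step pass them separately, mirroring the difference the svnbook describes between operative and peg revisions.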
Antoine Lambert authored
This fixes Python 3.7 support, which was broken because poetry, a dependency of isort, dropped support for that Python version in a recent release.
- Jan 20, 2023
Antoine Lambert authored
Subversion allows the revision of an external definition to be specified as a date instead of an integer identifier. When encountering such a case, get the HEAD revision number of the external at the specified date in order to export the correct version of the files targeted by the external. Related to #4727
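Date revisions are written between braces, as in "-r {2023-01-20}". A minimal sketch that tells both forms apart, assuming ISO 8601 dates only (Subversion accepts more date formats); parse_revision_spec is a hypothetical name.

```python
from datetime import datetime
from typing import Union


def parse_revision_spec(spec: str) -> Union[int, datetime]:
    """Parse an external's -r value: "1234" or "{2023-01-20}" (hypothetical helper)."""
    if spec.startswith("{") and spec.endswith("}"):
        # Only ISO 8601 dates are handled here; Subversion accepts more formats.
        return datetime.fromisoformat(spec[1:-1])
    return int(spec)
```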
Antoine Lambert authored
It makes it possible to get the HEAD revision number of a repository at a specific date. Related to #4727
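One way to compute that revision, assuming the commit date of every revision (its svn:date property) is already known and sorted, is a binary search. This is only an illustrative sketch with a hypothetical helper name; the actual implementation may ask the Subversion server instead.

```python
import bisect
from datetime import datetime
from typing import List, Tuple


def head_revision_at_date(
    revision_dates: List[Tuple[datetime, int]], date: datetime
) -> int:
    """Return the latest revision committed at or before date.

    Hypothetical helper: revision_dates holds (svn:date, revision) pairs
    sorted by date; 0 is returned when date predates every listed commit.
    """
    dates = [d for d, _ in revision_dates]
    index = bisect.bisect_right(dates, date)
    return revision_dates[index - 1][1] if index > 0 else 0
```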
- Jan 18, 2023
Antoine Lambert authored
It prevents "Remote access object already in use" errors.
Antoine Lambert authored
Those are simply wrappers around functions from the converters module and are not used elsewhere.
Antoine Lambert authored
Add some default parameters to the SvnRepo class constructor in order to simplify initialization of such objects for standalone use. Make the origin_url parameter of the info method optional. Add some tests for the SvnRepo class.
- Jan 17, 2023
Antoine Lambert authored
This is a more meaningful name considering that the module only contains a single class named SvnRepo.
- Dec 19, 2022
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in the toctree when building the full swh documentation, include module indices only when building the standalone package documentation. Also include them the proper Sphinx way. Related to T4496
- Dec 08, 2022
Antoine Lambert authored
Now that SvnRepo.propget supports a URL as target, we can remove the costly checkout operation and directly retrieve the whole set of svn:externals properties. This should greatly improve the performance of incremental loading for big repositories.
Antoine Lambert authored
subvertpy 0.11 has a buggy implementation of the propget bindings when the target is a URL (https://github.com/jelmer/subvertpy/issues/35), so as a workaround we implement propget for URLs using the non-buggy proplist bindings.
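The workaround boils down to filtering proplist results by property name. Below is a sketch assuming proplist yields (path, {property name: value}) pairs; that shape and the helper name are assumptions, not subvertpy's documented return type.

```python
from typing import Dict, Iterable, Tuple

# Assumed shape of proplist results: (path, {property name: value}) pairs.
PropList = Iterable[Tuple[str, Dict[str, bytes]]]


def propget_from_proplist(name: str, proplist: PropList) -> Dict[str, bytes]:
    """Emulate a recursive propget by filtering proplist results by name."""
    return {path: props[name] for path, props in proplist if name in props}
```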
Antoine Lambert authored
Retrying three times is enough as we use exponential backoff. Previously, the loader could be stuck for more than twenty minutes in a row when it encountered a dead external; now it would be a couple of minutes.
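A minimal stdlib sketch of three attempts with exponential backoff (1s, then 2s between attempts). The delays and the helper name are assumptions, and the loader may rely on a retry library instead; injecting the sleep function keeps the sketch easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    attempt = 1
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_attempts:
                raise  # give up and propagate the last error
            sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...
            attempt += 1
```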
- Dec 07, 2022
Antoine Lambert authored
Copied directories might have externals, so we also need to copy their states and update external paths in case the externals list is later modified.
Antoine Lambert authored
In order to detect all ASCII characters that must be percent-encoded in svn URLs, add a brute-force test and use urllib.parse.quote in the quote_svn_url function.
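The sketch below percent-encodes only the path part of an svn URL with urllib.parse.quote, then enumerates which printable ASCII characters end up encoded. The safe character set is an assumption for illustration, not necessarily the one used by the loader's quote_svn_url; a real brute-force test would also validate the resulting URLs against Subversion, which this sketch does not do.

```python
from urllib.parse import quote

# Characters kept as-is in the path: an assumption for this sketch, not
# necessarily the set used by the loader's quote_svn_url.
SAFE_PATH_CHARS = "/!$&'()*+,;=:@~"


def quote_svn_url(url: str) -> str:
    """Percent-encode the path of an svn URL, leaving scheme and host untouched.

    URLs without "://" are returned unchanged.
    """
    scheme, sep, rest = url.partition("://")
    host, slash, path = rest.partition("/")
    return scheme + sep + host + slash + quote(path, safe=SAFE_PATH_CHARS)


# Brute-force pass over printable ASCII (space to "~"): which get encoded?
base = "http://example.org/repo/"
encoded = [c for c in map(chr, range(32, 127)) if quote_svn_url(base + c) != base + c]
```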
Antoine Lambert authored
Such a case can happen when an external definition is malformed. Previously, the parsed malformed external was added to the directory state with an empty external URL, which could lead to unexpected side effects such as removing all previously exported valid externals.
Antoine Lambert authored
Instead of maintaining file state based on svn properties across revision replays and trying to reconstruct the same file as an svn export operation would after applying text deltas, simply export the file from the currently processed revision when closing the associated file editor. This greatly simplifies the replay module implementation while keeping approximately the same performance as before. Also add a test that would fail without these changes. Related to T4673
- Dec 05, 2022
Antoine Lambert authored
When copying a directory from an ancestor revision, do not ignore externals: Subversion also copies properties, so external paths must also be exported.
Antoine Lambert authored
In debug mode, when a hash tree computation divergence is detected after replaying a revision, compute and display the diff between contents to facilitate debugging of this type of issue.
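A minimal sketch of such a diff with Python's difflib, assuming both contents are text files on disk; binary contents would need another representation. The content_diff name is hypothetical; in the loader, the result would only be logged in debug mode.

```python
import difflib
from pathlib import Path


def content_diff(reconstructed: Path, exported: Path) -> str:
    """Unified diff between a replayed file and its svn-exported counterpart."""
    old = reconstructed.read_text(errors="replace").splitlines(keepends=True)
    new = exported.read_text(errors="replace").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old, new, fromfile=str(reconstructed), tofile=str(exported)
        )
    )
```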
- Nov 25, 2022
Antoine Lambert authored
Add more debug logs to the replay module to ease detection of issues. However, as they are quite verbose, only display them when the debug parameter of the loader is set to True.
- Nov 23, 2022
Antoine Lambert authored
- Nov 22, 2022
Antoine Lambert authored
When a tree computation divergence is detected after replaying a revision, add debug logs displaying the paths that differ or are missing between the reconstructed repository filesystem and the one exported at that specific revision. It should save some time when debugging such issues.
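A possible way to find those paths is to hash every file of both trees and compare the resulting mappings. The sketch below uses SHA-1 over file contents for illustration only (it is not how Software Heritage computes directory hashes) and ignores empty directories; tree_hashes and tree_divergences are hypothetical names.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def tree_hashes(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-1 of its content."""
    hashes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                hashes[os.path.relpath(full, root)] = hashlib.sha1(f.read()).hexdigest()
    return hashes


def tree_divergences(
    reconstructed: str, exported: str
) -> Tuple[List[str], List[str], List[str]]:
    """Return (differing, missing_in_reconstructed, missing_in_exported) paths."""
    left, right = tree_hashes(reconstructed), tree_hashes(exported)
    differing = sorted(p for p in left.keys() & right.keys() if left[p] != right[p])
    return differing, sorted(right.keys() - left.keys()), sorted(left.keys() - right.keys())
```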
- Oct 31, 2022
Antoine Lambert authored
A Subversion revision can contain new directories and files copied from ancestor revisions, but those were not perfectly handled in the commit editor used to reconstruct the repository filesystem when replaying revisions. In particular, the previous implementation could not handle the case where a path copied from an ancestor revision is replaced in the same commit (for instance, replacing a directory with a file of the same name). These changes ensure that info about the source path and source revision from which a path is copied is passed as parameters to the commit editor methods, in order to let them handle the copies, and that replace operations are correctly replayed. It also prevents the OS error "Too many open files" when a really large file tree is copied from an ancestor revision.
Antoine Lambert authored
When dumping a Subversion repository to a file before loading it, compress that file with gzip while producing it. This saves significant disk space when dumping a large repository. Also rework how a truncated dump is handled now that the dump file is compressed, by providing svnadmin with the expected maximum revision number to load. If the number of loaded revisions matches, we can safely continue the partial loading of the repository.
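Compressing on the fly means streaming the dump command's standard output straight into a gzip file. A minimal sketch, assuming a command such as ["svnrdump", "dump", url]; dump_to_gzip is a hypothetical name, and the exit code is returned rather than raised so that a truncated dump can still be loaded and its last revision checked against the expected maximum.

```python
import gzip
import shutil
import subprocess
from typing import List


def dump_to_gzip(command: List[str], output_path: str) -> int:
    """Run a dump command, gzip its stdout on the fly and return its exit code."""
    with gzip.open(output_path, "wb") as out:
        with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
            assert proc.stdout is not None
            # Copy in chunks: the uncompressed dump is never written to disk.
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waited for the process, so returncode is set.
    # A non-zero code may indicate a truncated dump.
    return proc.returncode
```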
- Oct 28, 2022
Antoine Lambert authored
URLs provided as parameters to subvertpy.client.Client methods must be quoted when they contain space characters; otherwise libsvn raises an assertion.