- Jun 09, 2023
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This also adds a commit to fix the inconsistent naming of the svn export related task and loader. Refs. swh/infra/sysadm-environment#4906
-
- Jun 05, 2023
-
-
Antoine R. Dumont authored
Otherwise, we'd lose the context in the snapshot. Refs. swh/meta#4979
-
- Jun 01, 2023
-
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
- May 31, 2023
-
-
Antoine Lambert authored
Some remote subversion repositories in the wild require anonymous credentials to obtain read access. Those credentials are either 'anonymous/' or 'anonymous/anonymous' in most cases. Previously, it was required to add the credentials to the origin URL using basic authentication syntax. While such a URL can be provided to the Save Code Now service, it cannot when svn URLs are coming from a lister. To work around this, try to connect using these anonymous credentials in the get_svn_repo function when a connection error happens. This should also simplify the submission of Save Code Now requests for a subversion origin that requires anonymous credentials.
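The fallback described above can be sketched as follows. This is only an illustration under stated assumptions: candidate_urls and connect_with_fallback are hypothetical helpers, not the loader's get_svn_repo function, and the connect callable stands in for whatever opens the remote session.

```python
from urllib.parse import urlsplit, urlunsplit

# Credentials named in the commit message: 'anonymous/' and 'anonymous/anonymous'.
ANONYMOUS_CREDENTIALS = [("anonymous", ""), ("anonymous", "anonymous")]


def candidate_urls(origin_url):
    """Return the origin URL, followed by variants carrying anonymous credentials.

    Hypothetical helper: URLs that already embed a username are left alone.
    """
    parts = urlsplit(origin_url)
    urls = [origin_url]
    if parts.username is None:
        for user, password in ANONYMOUS_CREDENTIALS:
            netloc = f"{user}:{password}@{parts.netloc}"
            urls.append(urlunsplit(parts._replace(netloc=netloc)))
    return urls


def connect_with_fallback(origin_url, connect):
    """Try each candidate URL in turn and return the first successful result.

    `connect` is any callable that raises ConnectionError on failure
    (an assumption made for this sketch).
    """
    last_error = None
    for url in candidate_urls(origin_url):
        try:
            return connect(url)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

The plain URL is always tried first, so repositories that do not need credentials behave as before.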
-
Antoine Lambert authored
Previously, when exporting a sub-path of a remote subversion repository over the network, the full repository was exported and the local path targeting the sub-path was returned. This is not really optimal in terms of network bandwidth if the repository filesystem is large, but it was implemented like this to ensure all tests related to sub-path exports were passing regardless of the subversion loader class used: either SvnLoader or SvnLoaderFromRemoteDump. After some analysis, it turned out that it was possible to avoid exporting the full repository and export only the requested sub-path when using the SvnLoader class. So modify the SvnRepo class to ensure that behavior and save some network bandwidth when dealing with a large repository. These changes in the SvnRepo class induce some changes in the replay module to ensure all tests still pass, and they also make it possible to remove a no longer needed optional parameter from the class constructor.
-
- May 30, 2023
-
-
Antoine R. Dumont authored
This overrides the fetch_data method of the default core class DirectoryLoader to export an svn tree at a specific commit or tag. The remaining behavior is the same as the BaseDirectoryLoader: it checks that the checksums of the tree are ok, then ingests the DAG objects into the archive (including the NAR as extid). Refs. swh/meta#4979
-
- May 26, 2023
-
-
Antoine Lambert authored
Add Jenkins badge for master branch build status. Rephrase introduction sentence. Remove leftovers from when the file was written in reStructuredText. Add syntax highlighting to code blocks.
-
Antoine Lambert authored
The export_temporary method of the SvnRepo class exports the content of a subversion repository at a given revision to a temporary directory. As we also export the externals that might be associated with some paths in the repository, we first need to get all the svn:externals property values in order to determine whether there are recursive or relative externals and adjust some export parameters accordingly. While that operation is fast when the subversion repository is hosted locally, it is terribly slow when the repository is hosted on a remote server. Indeed, a recursive propget operation on a remote server sends a lot of network requests, which slows the process down considerably, especially with large repositories. To improve performance, the previous implementation did a full checkout of the repository to the local filesystem and got the svn:externals property values from it. Nevertheless, that process is time consuming for large repositories and it can consume a lot of disk space. In order to remove that bottleneck and improve overall performance when getting all property values, introduce a C++ extension module for Python that implements a fast way to crawl all paths of a repository and their associated properties. Unlike the "svn ls --depth infinity" or "svn propget -R" commands, it performs only one SVN request over the network, hence saving time, especially with large repositories. The code is freely inspired by the fast-svn-crawler project by Dmitry Pavlenko (https://sourceforge.net/projects/fastsvncrawler/). The obtained speedup is quite impressive: on a large remote repository, listing all paths using "svn ls --depth infinity" or getting all svn:externals property values using "svn propget -R" takes around one hour, while it takes only a couple of minutes using the approach implemented in the C++ extension module. That approach also saves disk space as we no longer need to perform a full checkout of the repository.
This change should greatly improve performance when reloading a svn repository already visited by Software Heritage. Indeed, before possibly archiving new commits issued since the last visit, the loader checks that the repository has not been altered by calling the export_temporary method with the remote repository URL.
-
- May 23, 2023
-
-
Antoine Lambert authored
Some external definitions can have leading or trailing spaces/tabs so we need to strip them to avoid parsing errors. Fixes #4734
-
- May 03, 2023
-
-
Antoine Lambert authored
Official subversion documentation only mentions that paths containing spaces must be surrounded by double quotes but we can find some external definitions in the wild whose paths are surrounded by single quotes. Those are properly handled by the official subversion client so we must do the same when parsing externals.
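As a rough illustration of accepting both quoting styles, the sketch below tokenizes a definition with Python's shlex module, which treats single and double quotes alike and ignores surrounding whitespace. The function name is hypothetical and this is not necessarily how the loader's parser is implemented.

```python
import shlex


def split_external_definition(line):
    """Split one svn:externals line into tokens.

    Hypothetical helper: surrounding spaces/tabs are dropped and both
    'single quoted' and "double quoted" paths are kept as one token.
    """
    return shlex.split(line.strip())
```

For instance, a definition whose local path contains a space yields the same two tokens whether the path is wrapped in single or in double quotes.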
-
- Apr 17, 2023
-
-
Antoine Lambert authored
When a directory is copied from another one in a previous revision, externals must be copied only if they have been defined in a revision greater than or equal to the revision the directory is copied from. So store the revision number in which an external is defined and use it to filter externals when performing copyfrom operations.
-
- Apr 04, 2023
-
-
Antoine Lambert authored
As the swh.loader.svn.svn_repo.SvnRepo class quotes all URLs before calling the subversion API through subvertpy, we must ensure that URLs extracted from external definitions are unquoted, otherwise they will be double quoted and thus no longer valid.
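The double quoting problem can be reproduced with urllib.parse alone. In the sketch below, quote_url is only a stand-in for the quoting step performed before calling the subversion API (its safe character set is an assumption), and normalize_external_url is a hypothetical name for the unquoting step.

```python
from urllib.parse import quote, unquote

# Assumed set of characters left untouched by the quoting step.
SAFE = "/:"


def quote_url(url):
    """Stand-in for the quoting applied before calling the subversion API."""
    return quote(url, safe=SAFE)


def normalize_external_url(url):
    """Unquote a URL read from an external definition (hypothetical helper)."""
    return unquote(url)


already_quoted = "http://example.org/svn/my%20project"

# Quoting an already quoted URL encodes the '%' sign a second time.
double_quoted = quote_url(already_quoted)

# Unquoting first keeps the URL valid after the quoting step.
quoted_once = quote_url(normalize_external_url(already_quoted))
```

Here double_quoted ends with my%2520project while quoted_once keeps my%20project.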
-
Antoine Lambert authored
When a directory is deleted by subversion, we must also remove its state holding externals info as the directory can be re-added later in another revision but without the svn:externals property set.
-
- Mar 02, 2023
-
-
Antoine Lambert authored
Previously SvnLoaderFromRemoteDump class was using the repository root URL to dump a sub-project.
-
- Mar 01, 2023
-
-
Antoine Lambert authored
"svnadmin load" has a --no-flush-to-disk option enabling faster load while being unsafe on power off. This drawback is not an issue for the subversion loader so use that option to significantly improve the performance for loading a repository from a dump file into a directory on the local filesystem.
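A minimal sketch of such an invocation from Python is given below. The helper names are hypothetical, the target repository is assumed to already exist (for example created with "svnadmin create"), and the installed subversion version is assumed to support the option.

```python
import subprocess


def svnadmin_load_command(repo_path):
    """Build the svnadmin command line loading a dump into repo_path."""
    return ["svnadmin", "load", "--no-flush-to-disk", repo_path]


def load_dump(dump_path, repo_path):
    """Feed the dump file to "svnadmin load" on its standard input."""
    with open(dump_path, "rb") as dump:
        subprocess.run(svnadmin_load_command(repo_path), stdin=dump, check=True)
```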
-
Antoine Lambert authored
Those methods ensure URLs are properly quoted to avoid assertion failures when calling functions from the subversion C API.
-
Antoine Lambert authored
In order to ensure consistency between SvnLoader and SvnLoaderFromRemoteDump classes, run most of the tests with both of them. As a consequence, fix an invalid load status that was reported by the SvnLoader class when no new objects to archive have been found during a visit.
-
Antoine Lambert authored
That kind of error can be encountered when loading a repository hosted on SourceForge.
-
Antoine Lambert authored
A recursive propget operation is terribly slow over the network; it is better to do it from a freshly checked out working copy as it is faster.
-
Antoine Lambert authored
-
- Feb 17, 2023
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
- Feb 02, 2023
-
-
Antoine Lambert authored
Previously the parse_external_definition function was returning a single revision number regardless of whether it was a revision specified with -rX or a peg revision specified with @X. However, the use and combination of these two parameters in the export command from subversion can lead to different results (see https://svnbook.red-bean.com/en/1.6/svn.advanced.pegrevs.html). So make sure to extract both the revision and the peg revision in order to avoid behaving differently from the official subversion client when the loader exports externals.
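To make the distinction concrete, here is a simplified sketch that extracts both values from a definition. It is not the loader's parse_external_definition function: the name parse_external_target is hypothetical, only integer revisions are handled, and the URL is assumed to be the token containing "://".

```python
import re
from typing import NamedTuple, Optional


class ExternalTarget(NamedTuple):
    url: str
    revision: Optional[int]      # operative revision, given with -rX or -r X
    peg_revision: Optional[int]  # peg revision, given with URL@X


def parse_external_target(definition):
    """Extract URL, operative revision and peg revision from a definition."""
    revision = None
    url = None
    tokens = definition.split()
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if token == "-r" and index + 1 < len(tokens):
            # "-r X" form: the revision is the next token.
            revision = int(tokens[index + 1])
            index += 2
            continue
        if token.startswith("-r") and token[2:].isdigit():
            # "-rX" form: the revision is glued to the option.
            revision = int(token[2:])
        elif "://" in token:
            url = token
        index += 1
    if url is None:
        raise ValueError(f"no URL found in external definition: {definition!r}")
    peg_revision = None
    match = re.search(r"@(\d+)$", url)
    if match:
        peg_revision = int(match.group(1))
        url = url[: match.start()]
    return ExternalTarget(url, revision, peg_revision)
```

Keeping the two values separate lets the caller pass each of them to the export operation instead of collapsing them into one number.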
-
Antoine Lambert authored
This fixes Python 3.7 support: poetry, a dependency of isort, removed support for that Python version in a recent release.
-
- Jan 20, 2023
-
-
Antoine Lambert authored
Subversion allows the revision of an external definition to be specified as a date instead of an integer identifier. So when encountering such a case, get the HEAD revision number for the external at the specified date in order to export the correct version of the files targeted by the external. Related to #4727
-
Antoine Lambert authored
It makes it possible to get the HEAD revision number of a repository at a specific date. Related to #4727
-
- Jan 18, 2023
-
-
Antoine Lambert authored
It prevents "Remote access object already in use" errors.
-
Antoine Lambert authored
Those are simply wrappers around functions from the converters module and are not used elsewhere.
-
Antoine Lambert authored
Add some default parameters to the SvnRepo class constructor in order to simplify the initialization of such an object for standalone use. Make the origin_url parameter of the info method optional. Add some tests for the SvnRepo class.
-
- Jan 17, 2023
-
-
Antoine Lambert authored
This is a more meaningful name considering that the module only contains a single class named SvnRepo.
-
- Dec 19, 2022
-
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in toc when building full swh documentation, prefer to include module indices only when building standalone package documentation. Also include them the proper sphinx way. Related to T4496
-
- Dec 08, 2022
-
-
Antoine Lambert authored
Now that SvnRepo.propget supports URL as target, we can remove the use of costly checkout operation and directly retrieve the whole set of svn:externals properties. This should greatly improve incremental loading of a big repository in terms of performance.
-
Antoine Lambert authored
subvertpy 0.11 has a buggy implementation of the propget bindings when the target is a URL (https://github.com/jelmer/subvertpy/issues/35), so as a workaround we implement propget for URLs using the non-buggy proplist bindings.
-
Antoine Lambert authored
Retrying three times is enough as we use exponential backoff. Previously the loader could be stuck for more than twenty minutes in a row when it encountered a dead external; now it would be a couple of minutes.
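The retry policy can be illustrated with the generic sketch below. It is not the loader's actual retry code: the function name, the base delay and the injectable sleep parameter are assumptions made for the example.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...)
    and the last error is re-raised once all attempts are used up.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With three attempts, a dead external costs at most two waits before the error is reported, which bounds the time spent on it.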
-
- Dec 07, 2022
-
-
Antoine Lambert authored
Copied directories might have externals so we also need to copy states and update external paths in case externals list is later modified.
-
Antoine Lambert authored
In order to detect all ascii characters that must be percent encoded in svn URLs, add a brute force test and use urllib.parse.quote in quote_svn_url function.
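As an illustration of the brute force idea, the sketch below lists the printable ASCII characters that urllib.parse.quote percent-encodes for a given set of safe characters. The helper name and the default safe set are assumptions; the loader's quote_svn_url function may use a different one.

```python
from urllib.parse import quote


def characters_needing_encoding(safe="/"):
    """Return the printable ASCII characters that quote() percent-encodes."""
    printable = [chr(code) for code in range(0x20, 0x7F)]
    return [char for char in printable if quote(char, safe=safe) != char]
```

Letters, digits and the characters "_", ".", "-" and "~" are never encoded by quote, so only the remaining characters depend on the chosen safe set.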
-
Antoine Lambert authored
Such case can happen when an external definition is malformed. Previously, the parsed malformed external was added to the directory state with an empty external URL which could lead to unexpected side effects like removing all previously exported valid externals.
-
Antoine Lambert authored
Instead of maintaining file state based on svn properties across revision replays and trying to reconstruct the same file as with a svn export operation after applying text deltas, prefer to simply export the file from the currently processed revision when closing the associated file editor. This greatly simplifies the replay module implementation while keeping approximately the same performance as before. Also add a test that would fail without these changes. Related to T4673
-