- Dec 19, 2023
-
-
Antoine Lambert authored
Use the raw external definition being parsed as the test parameter identifier to get more meaningful pytest output.
-
Antoine Lambert authored
When an external is defined using the legacy format (svn < 1.5), the official subversion client automatically sets the peg_rev parameter of the export operation, so ensure the loader behaves the same way to avoid hash mismatches in the reconstructed file systems.
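A minimal sketch of the idea, not the loader's actual code: the `ParsedExternal` class and the `export_revisions` helper are hypothetical names, and the rule that a legacy definition reuses its `-r` revision as peg revision is an assumption about how the official client behaves.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedExternal:
    """Hypothetical container for a parsed svn:externals definition."""

    url: str
    path: str
    revision: Optional[int] = None
    peg_revision: Optional[int] = None
    legacy_format: bool = False


def export_revisions(external: ParsedExternal) -> Tuple[Optional[int], Optional[int]]:
    """Return the (revision, peg_revision) pair to pass to an export call.

    Assumption: for a legacy (svn < 1.5) definition without an explicit
    peg revision, the operative revision is reused as the peg revision.
    """
    peg_revision = external.peg_revision
    if external.legacy_format and peg_revision is None:
        peg_revision = external.revision
    return external.revision, peg_revision
```

With a legacy definition pinned at revision 148, both values come back as 148; with the modern format the peg revision stays unset unless the URL carried one.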
-
Antoine Lambert authored
It simplifies the code, improves readability and makes it easier to add new data related to a parsed external definition.
-
Antoine Lambert authored
When a path is copied using a copyfrom operation, externals set on the paths being copied must also be set on the copied paths. The previous implementation used the latest externals values set on the paths being copied, but those could differ from the ones set at revision copyfrom_rev, so ensure the correct externals are set on the copied paths.
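One way to picture the lookup at copyfrom_rev is a per-path history of property changes. This is only an illustrative sketch: the `ExternalsHistory` structure and the `externals_at` helper are hypothetical and do not describe how the loader stores its state.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical structure: for each path, the list of (revision, value)
# changes of its svn:externals property, sorted by revision.
# A value of None means the property was removed in that revision.
ExternalsHistory = Dict[str, List[Tuple[int, Optional[str]]]]


def externals_at(history: ExternalsHistory, path: str, revision: int) -> Optional[str]:
    """Return the svn:externals value set on path as of the given revision."""
    changes = history.get(path, [])
    revisions = [rev for rev, _ in changes]
    # Index of the first change made strictly after the requested revision.
    idx = bisect.bisect_right(revisions, revision)
    if idx == 0:
        return None  # the property was not set yet at that revision
    return changes[idx - 1][1]
```

When copying `trunk` from revision 5, the value returned is the one that was in effect at revision 5, even if `trunk` received a newer svn:externals value afterwards.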
-
- Dec 05, 2023
-
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Nov 28, 2023
-
-
Jérémy Bobbio (Lunar) authored
-
- Nov 22, 2023
-
-
Antoine Lambert authored
Renaming the celery task broke the subversion loading in production as the load-svn task type is already registered in scheduler database and targets the previous celery task name. Related to swh-environment#3925.
- Nov 20, 2023
-
- Nov 16, 2023
-
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 15, 2023
-
-
David Douard authored
-
- Oct 26, 2023
-
-
Antoine Lambert authored
It was broken in c834321d likely due to a side effect related to a local configuration.
-
- Oct 19, 2023
-
-
David Douard authored
This is what is actually done on production system (via custom configured scheduler task type definitions).
-
David Douard authored
-
- Jun 09, 2023
-
-
Antoine R. Dumont authored
This also adds a commit to fix the inconsistent naming of the svn export related task and loader. Refs. swh/infra/sysadm-environment#4906
- Jun 05, 2023
-
-
Antoine R. Dumont authored
Otherwise, we'd lose the context in the snapshot. Refs. swh/meta#4979
-
- Jun 01, 2023
-
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
- May 31, 2023
-
-
Antoine Lambert authored
Some remote subversion repositories in the wild require anonymous credentials to obtain read access. In most cases, those credentials are either 'anonymous/' or 'anonymous/anonymous'. Previously, the credentials had to be added to the origin URL using basic authentication syntax. While such a URL can be provided to the Save Code Now service, it cannot when svn URLs come from a lister. To work around this, try to connect using these anonymous credentials in the get_svn_repo function when a connection error happens. This should also simplify Save Code Now requests for subversion origins that require anonymous credentials.
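The retry logic described above can be sketched as follows. This is not the get_svn_repo implementation: the `connect` callable, the `errors` tuple and the function name are hypothetical stand-ins for whatever opens the connection in the loader.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")

# The two credential pairs mentioned in the commit message:
# 'anonymous/' (empty password) and 'anonymous/anonymous'.
ANONYMOUS_CREDENTIALS = (("anonymous", ""), ("anonymous", "anonymous"))


def connect_with_anonymous_fallback(
    connect: Callable[[Optional[str], Optional[str]], T],
    errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call connect(username, password), retrying with anonymous credentials.

    The first attempt uses no credentials. If it fails with one of the
    given error types, each anonymous credential pair is tried in turn.
    If every attempt fails, the error of the first attempt is re-raised.
    """
    try:
        return connect(None, None)
    except errors as first_error:
        for username, password in ANONYMOUS_CREDENTIALS:
            try:
                return connect(username, password)
            except errors:
                continue
        raise first_error
```

Re-raising the first error keeps the report focused on the original failure rather than on the last fallback attempt.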
-
Antoine Lambert authored
Previously, when exporting a sub-path of a remote subversion repository over the network, the full repository was exported and the local path targeting the sub-path was returned. This is not really optimal in terms of network bandwidth if the repository filesystem is large, but it was implemented like this to ensure all tests related to sub-path exports passed regardless of the subversion loader class used: either SvnLoader or SvnLoaderFromRemoteDump. After some analysis, it turned out that it was possible to export only the requested sub-path instead of the full repository when using the SvnLoader class. So modify the SvnRepo class to ensure that behavior and save some network bandwidth when dealing with a large repository. These changes in the SvnRepo class induce some changes in the replay module to ensure all tests still pass, and they also make it possible to remove a no longer needed optional parameter from the class constructor.
-
- May 30, 2023
-
-
Antoine R. Dumont authored
This overrides the fetch_data method of the default core DirectoryLoader class to export an svn tree at a specific commit or tag. The remaining behavior is the same as in BaseDirectoryLoader: check that the checksums of the tree are ok, then ingest the DAG objects into the archive (including the NAR as extid). Refs. swh/meta#4979
-
- May 26, 2023
-
-
Antoine Lambert authored
Add jenkins badge for master branch build status. Rephrase introduction sentence. Remove leftovers from when the file was written in reStructuredText. Add syntax highlighting to code blocks.
-
Antoine Lambert authored
The export_temporary method of the SvnRepo class exports the content of a subversion repository at a given revision to a temporary directory. As we also export the externals that might be associated with some paths in the repository, we first need to get all the svn:externals property values in order to determine whether there are recursive or relative externals and adjust some export parameters accordingly. While that operation is fast when the subversion repository is hosted locally, it is terribly slow when the repository is hosted on a remote server. Indeed, a recursive propget operation on a remote server sends a lot of network requests, which slows the process down considerably, especially with large repositories. To improve performance, the previous implementation did a full checkout of the repository to the local filesystem and got the svn:externals property values from it. Nevertheless, that process is time consuming for large repositories and can consume a lot of disk space. In order to remove that bottleneck and improve the overall performance of getting all property values, introduce a C++ extension module for Python that implements a fast way to crawl all paths of a repository and their associated properties. Unlike the "svn ls --depth infinity" or "svn propget -R" commands, it performs only one SVN request over the network, hence saving time, especially with large repositories. The code is freely inspired by the fast-svn-crawler project by Dmitry Pavlenko (https://sourceforge.net/projects/fastsvncrawler/). The obtained speedup is quite impressive: on a large remote repository, listing all paths using "svn ls --depth infinity" or getting all svn:externals property values using "svn propget -R" takes around one hour, while it takes only a couple of minutes using the approach implemented in the C++ extension module. That approach also saves disk space as we no longer need to perform a full checkout of the repository.
This change should greatly improve performance when reloading a svn repository already visited by Software Heritage. Indeed, before the possible archiving of new commits issued since the last visit, the loader checks that a repository has not been altered by calling the export_temporary method using the remote repository URL.
-
- May 23, 2023
-
-
Antoine Lambert authored
Some external definitions can have leading or trailing spaces/tabs so we need to strip them to avoid parsing errors. Fixes #4734
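As an illustration of the stripping step, here is a small sketch that splits an svn:externals property value into cleaned definitions. The `split_externals` helper is a hypothetical name, not the loader's parser.

```python
from typing import List


def split_externals(prop_value: str) -> List[str]:
    """Split an svn:externals property value into stripped definitions.

    Each non-empty line is one definition; leading and trailing spaces
    or tabs are removed so they cannot confuse later tokenization.
    """
    definitions = []
    for line in prop_value.splitlines():
        line = line.strip()  # removes surrounding spaces and tabs
        if line:
            definitions.append(line)
    return definitions
```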
-
- May 03, 2023
-
-
Antoine Lambert authored
Official subversion documentation only mentions that paths containing spaces must be surrounded by double quotes but we can find some external definitions in the wild whose paths are surrounded by single quotes. Those are properly handled by the official subversion client so we must do the same when parsing externals.
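A sketch of quote-aware tokenization using the standard library `shlex` module, which accepts both single and double quotes in its default POSIX mode. Whether the loader relies on `shlex` is an assumption; `tokenize_external` is a hypothetical helper used only for illustration.

```python
import shlex
from typing import List


def tokenize_external(definition: str) -> List[str]:
    """Split an external definition into tokens, honoring quoted paths.

    shlex.split treats text inside single or double quotes as one
    token, so a path containing spaces is preserved either way.
    """
    return shlex.split(definition.strip())
```

Both `http://svn.example.org/repo/libs "my libs"` and `http://svn.example.org/repo/libs 'my libs'` yield the same two tokens, the URL and the path `my libs`.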
-
- Apr 17, 2023
-
-
Antoine Lambert authored
When a directory is copied from another one in a previous revision, externals must be copied only if they have been defined in a revision greater than or equal to the revision the directory is copied from. So store the revision number in which an external is defined and use it to filter externals when performing copyfrom operations.
-
- Apr 04, 2023
-
-
Antoine Lambert authored
As the swh.loader.svn.svn_repo.SvnRepo class quotes all URLs before calling the subversion API through subvertpy, we must ensure to unquote URLs extracted from external definitions, otherwise they will be double quoted and thus no longer valid.
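The double-quoting problem can be reproduced with the standard library alone. In this sketch, `quote_url` is a hypothetical stand-in for the quoting done by SvnRepo, and the set of safe characters is an assumption chosen for the example.

```python
from urllib.parse import quote, unquote


def quote_url(url: str) -> str:
    """Percent-encode a URL, keeping scheme and path separators intact."""
    return quote(url, safe="/:@")


# URL as it might appear in an external definition, already percent-encoded.
raw_url = "http://svn.example.org/repo/my%20dir"

# Quoting it again encodes the '%' itself, producing an invalid URL.
double_quoted = quote_url(raw_url)

# Unquoting first makes the later quoting step produce the intended URL.
single_quoted = quote_url(unquote(raw_url))
```

Here `double_quoted` ends in `my%2520dir` while `single_quoted` ends in `my%20dir`, which is the form the repository actually serves.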
-
Antoine Lambert authored
When a directory is deleted by subversion, we must also remove its state holding externals info as the directory can be re-added later in another revision but without the svn:externals property set.
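A minimal sketch of that cleanup, assuming the externals state is kept in a plain dictionary keyed by directory path. Both the dictionary layout and the `drop_externals_state` helper are hypothetical.

```python
from typing import Dict, List


def drop_externals_state(state: Dict[str, List[str]], deleted_path: str) -> None:
    """Forget externals recorded for a deleted directory and its children.

    Without this, a directory re-added later without svn:externals
    would wrongly inherit the externals of its previous incarnation.
    """
    deleted = deleted_path.rstrip("/")
    prefix = deleted + "/"
    # Iterate over a copy of the keys since entries are removed in the loop.
    for path in list(state):
        if path == deleted or path.startswith(prefix):
            del state[path]
```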
-
- Mar 02, 2023
-
-
Antoine Lambert authored
Previously, the SvnLoaderFromRemoteDump class used the repository root URL to dump a sub-project.
-
- Mar 01, 2023
-
-
Antoine Lambert authored
"svnadmin load" has a --no-flush-to-disk option enabling faster load while being unsafe on power off. This drawback is not an issue for the subversion loader so use that option to significantly improve the performance for loading a repository from a dump file into a directory on the local filesystem.
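For illustration, the option could be passed as in the sketch below. The helper names and the way the dump is fed on standard input are assumptions for the example, not the loader's code; running it requires the `svnadmin` binary and an existing repository created with `svnadmin create`.

```python
import subprocess
from typing import List


def svnadmin_load_command(repo_path: str, no_flush_to_disk: bool = True) -> List[str]:
    """Build the 'svnadmin load' command line for a local repository."""
    cmd = ["svnadmin", "load"]
    if no_flush_to_disk:
        # Faster, but the repository may be corrupted on power loss.
        cmd.append("--no-flush-to-disk")
    cmd.append(repo_path)
    return cmd


def load_dump(dump_path: str, repo_path: str) -> None:
    """Load a dump file into repo_path, feeding it on standard input."""
    with open(dump_path, "rb") as dump:
        subprocess.run(svnadmin_load_command(repo_path), stdin=dump, check=True)
```

Skipping the flush is acceptable here because the target repository is a temporary working copy that can be rebuilt from the dump if something goes wrong.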
-
Antoine Lambert authored
Those methods ensure URLs are properly quoted to avoid assertion failures when calling functions from the subversion C API.
-
Antoine Lambert authored
In order to ensure consistency between SvnLoader and SvnLoaderFromRemoteDump classes, run most of the tests with both of them. As a consequence, fix an invalid load status that was reported by the SvnLoader class when no new objects to archive have been found during a visit.
-
Antoine Lambert authored
That kind of error can be encountered when loading a repository hosted on SourceForge.
-
Antoine Lambert authored
A recursive propget operation is terribly slow over the network; it is faster to do it from a freshly checked out working copy.
-
Antoine Lambert authored
-
- Feb 17, 2023
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-