- Oct 17, 2022
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.5.1' with Debian dir fba11b26a65fa5caefd6aa21e85f19e12c2e798a
-
Antoine Lambert authored
This fixes debian package builds.
-
Jenkins for Software Heritage authored
Update to upstream version '0.5.0' with Debian dir 126ab15041de3c0fe8c1af8d037d69f8bc060353
-
Antoine Lambert authored
-
Antoine Lambert authored
Previously, after each revision replay, all files and directories of the CVS repository being loaded were collected and sent to the storage. This is a real bottleneck for loading performance, as it delegates the filtering of new objects to archive to the storage filtering proxy. As we know exactly the set of paths that have been modified in a CVS revision, do that filtering on the loader side and only send modified objects to the storage instead of the whole set of contents and directories from the reconstructed filesystem. This should greatly improve loading performance for large repositories and also reduce loader memory consumption.
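The loader-side filtering described above can be sketched as follows. This is only an illustration under stated assumptions, not the loader's actual code: the function name `select_modified` and the dict-based model of the reconstructed filesystem are hypothetical.

```python
from typing import Dict, Iterable


def select_modified(
    filesystem: Dict[bytes, bytes], modified_paths: Iterable[bytes]
) -> Dict[bytes, bytes]:
    """Return only the entries touched by the current revision.

    `filesystem` maps each path to its content for the whole reconstructed
    tree (a hypothetical model). `modified_paths` lists the paths changed by
    the revision being replayed. Paths deleted by the revision are absent
    from `filesystem` and are therefore skipped.
    """
    return {
        path: filesystem[path] for path in modified_paths if path in filesystem
    }


tree = {b"README": b"hello", b"src/main.c": b"int main;", b"src/util.c": b"x"}
# Only src/main.c was modified; old/removed.c was deleted by the revision.
to_send = select_modified(tree, [b"src/main.c", b"old/removed.c"])
```

With this approach the amount of data sent per revision scales with the size of the change rather than with the size of the repository.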
-
Antoine Lambert authored
Instead of creating a from_disk.Directory instance after each replayed CVS revision by recursively scanning all directories of the repository, keep a single one as a class member, synchronized with the reconstructed filesystem after each revision replay. This should improve loader performance, especially when dealing with large repositories.
-
Antoine Lambert authored
The CVS rlog for a given module sent by the server is a concatenation of rlog entries. Each entry has a header containing the path to an RCS file plus other information. There are cases where an rlog entry header is empty, which makes the rlog parsing fail. Instead of stopping rlog parsing by raising an exception, skip that entry and process the next one. Closes T4629
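A minimal sketch of the skip-on-empty-header behaviour, assuming a simplified rlog layout in which entries end with a line made only of "=" characters and the header carries an "RCS file: " line. The helper names `iter_entries` and `rcs_paths` are hypothetical and the real parser handles more fields.

```python
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Split rlog output into entries, assuming each entry ends with a
    line consisting only of "=" characters (simplified layout)."""
    entry: List[str] = []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped and set(stripped) == {"="}:
            yield entry
            entry = []
        else:
            entry.append(stripped)
    if entry:
        yield entry


def rcs_paths(lines: Iterable[str]) -> List[str]:
    """Collect RCS file paths, skipping entries whose header is empty."""
    prefix = "RCS file: "
    paths = []
    for entry in iter_entries(lines):
        header = [item for item in entry if item.startswith(prefix)]
        if not header:
            # Empty header: skip this entry instead of raising an exception.
            continue
        paths.append(header[0][len(prefix):])
    return paths


sample = [
    "RCS file: /cvsroot/project/a.c,v\n",
    "head: 1.2\n",
    "=" * 77 + "\n",
    "\n",  # entry with an empty header
    "=" * 77 + "\n",
    "RCS file: /cvsroot/project/b.c,v\n",
    "=" * 77 + "\n",
]
paths = rcs_paths(sample)
```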
- Oct 14, 2022
-
-
Antoine Lambert authored
Instead of using the readlines method on file objects, which retrieves all lines of a file and stores them in memory, read files line by line using the lazy line generator of file objects. This significantly reduces loader memory consumption when processing a large rlog output stored in a file.
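The difference can be illustrated with a short sketch. The function `count_revisions` and the sample file content are made up for the example; only the iteration pattern matters.

```python
import os
import tempfile


def count_revisions(path: str) -> int:
    """Count lines starting with "revision " without loading the whole file."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # Iterating over the file object yields one line at a time,
        # whereas f.readlines() would build a list of every line in memory.
        for line in f:
            if line.startswith("revision "):
                count += 1
    return count


# Write a tiny stand-in for an rlog dump (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".rlog", delete=False) as tmp:
    tmp.write("revision 1.1\ndate: 2022/10/14\nrevision 1.2\n")
try:
    n = count_revisions(tmp.name)
finally:
    os.unlink(tmp.name)
```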
-
- Oct 13, 2022
-
-
Antoine Lambert authored
That case was handled when using the rsync protocol but not when using the pserver or ssh protocols. Closes T4631
-
Antoine Lambert authored
When attempting to fetch the rlog for a path that does not exist in the repository, the CVS server responds with the lines "E cvs rlog: could not read RCS file for <path>" and "ok". That error case was not handled in fetch_rlog, so ensure it returns None when encountering it. The issue was spotted when the loader attempted to fetch more rlog data from Attic directories. The paths of these Attic directories are computed from those of the files in the repository, but there are cases where those directories do not exist.
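The error handling can be sketched roughly as below. This is not the actual fetch_rlog implementation: the helper `rlog_or_none` is hypothetical, it works on an already received list of response lines, and the sample responses are made up.

```python
from typing import List, Optional

# Error line quoted in the commit message, without the trailing path.
ERROR_PREFIX = b"E cvs rlog: could not read RCS file for "


def rlog_or_none(response_lines: List[bytes]) -> Optional[List[bytes]]:
    """Return the response lines, or None if the server reported that the
    RCS file for the requested path could not be read."""
    for line in response_lines:
        if line.startswith(ERROR_PREFIX):
            return None
    return response_lines


missing = rlog_or_none(
    [b"E cvs rlog: could not read RCS file for project/Attic\n", b"ok\n"]
)
present = rlog_or_none([b"RCS file: /cvsroot/project/a.c,v\n", b"ok\n"])
```

Returning None lets the caller treat a missing Attic directory as "no additional rlog data" rather than as a fatal error.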
-
- Sep 19, 2022
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.4.1' with Debian dir 5179ae3f511f7227d4200d0f28b138ea25210440
- Sep 15, 2022
-
-
Antoine Lambert authored
Since Python 3.10, using the "#" format variants of PyArg_ParseTuple() requires the PY_SSIZE_T_CLEAN macro to be defined.
-
Jenkins for Software Heritage authored
Update to upstream version '0.4.0' with Debian dir 3a57a357af5c42f6cafbb304e6ecab6132ccfd05
-
- Jul 11, 2022
-
-
Antoine Lambert authored
There are CVS repositories where revision numbers greater than 1.x are used to version files. The previous loader implementation raised an error when encountering such a revision, so ensure it is processed like the others. Also fix tag name extraction from rlog output. Related to T4043
-
- Jul 08, 2022
-
-
Antoine Lambert authored
There are cases where rsync output is not ASCII-decodable, so decode it as UTF-8 instead.
-
- Jul 07, 2022
-
-
Antoine Lambert authored
It makes the loading process fail otherwise.
-
- Jul 06, 2022
-
-
Antoine Lambert authored
Some CVS repositories have paths that are not valid UTF-8 (typically ISO-8859-1 ones), but the loader implementation assumed all paths could be safely encoded to UTF-8 and raised UnicodeEncodeError when attempting to encode non-UTF-8 paths. This commit changes the way CVS paths are handled by the loader, using their raw bytes representation instead of their UTF-8 decoded string representation. The rcsparse.rcsfile constructor has also been modified to take a bytes path as argument instead of a unicode one, in order to successfully open non-UTF-8 paths. Such CVS repositories can now be loaded successfully, using either the rsync or the pserver protocol. Related to T3980
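A small sketch of why raw bytes paths are safer. It shows one way a UnicodeEncodeError can arise with a non-UTF-8 file name (the exact failure path in the loader may differ), and that path manipulation works directly on bytes. The file name and directory are made up for the example.

```python
import posixpath

# An ISO-8859-1 encoded file name: b"caf\xe9.txt" is not valid UTF-8.
raw_name = "café.txt".encode("iso-8859-1")

# Decoding with surrogateescape keeps the original bytes recoverable, but
# the resulting str cannot be encoded to strict UTF-8 afterwards.
as_str = raw_name.decode("utf-8", errors="surrogateescape")
try:
    as_str.encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Keeping the path as bytes avoids any decoding or encoding step.
full_path = posixpath.join(b"/cvsroot/project", raw_name + b",v")
```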
-
- Jun 17, 2022
-
-
Antoine Lambert authored
Connection to an existing pserver might sometimes fail (on SourceForge for instance); retrying the operation usually fixes the issue.
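A generic retry helper along these lines could look like the sketch below. The names `retry` and `flaky_connect`, the number of attempts, and the delay policy are all assumptions for illustration; the loader may rely on a different mechanism.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call `operation`, retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # Out of attempts: propagate the last error.
            time.sleep(delay * attempt)  # Wait a bit longer each time.
    raise ValueError("attempts must be >= 1")


calls = {"count": 0}


def flaky_connect() -> str:
    """Stand-in for a pserver connection that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("connection refused")
    return "connected"


result = retry(flaky_connect)
```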
-
Antoine Lambert authored
Some CVS servers (SourceForge and OSDN for instance) return an error if the path sent with the "Directory" pserver request is not absolute. Fix that issue so that such CVS repositories can be loaded.
-
Antoine Lambert authored
The CVS client raised an error when trying to connect to a pserver URL of the following form: pserver://anonymous@cvs.example.org/cvsroot/project/module. However, numerous CVS pserver URLs found in the wild (notably on SourceForge and OSDN) use that form, so add support for it in the CVS client. Also remove the use of the external dependency urllib3.util.parse_url in favour of urllib.parse.urlparse from the Python standard library.
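For reference, this is how urllib.parse.urlparse splits such a URL. The sketch only shows the standard library behaviour; how the CVS client then derives the CVS root and module from the path is not shown here.

```python
from urllib.parse import urlparse

url = "pserver://anonymous@cvs.example.org/cvsroot/project/module"
parsed = urlparse(url)

scheme = parsed.scheme    # protocol part of the URL
user = parsed.username    # user name before the "@"
host = parsed.hostname    # server host name
port = parsed.port        # None when the URL has no explicit port
path = parsed.path        # everything after the host
```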
-
- May 20, 2022
-
-
Antoine Lambert authored
-
- May 11, 2022
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.3.0' with Debian dir a838aee4a15705419ac1e8872dad306b4c52120d
- May 10, 2022
-
-
Antoine R. Dumont authored
A change in the base loader will allow increasing it occasionally, if needed, in debugging mode.
-
- May 09, 2022
-
-
Pratyush authored
-