- Dec 16, 2021
-
-
Antoine R. Dumont authored
The existing code was probably derived from the svn loader and was never changed. This drops the nonexistent parameters and keeps only the one needed. This also adds coverage to the module. Related to T3788
-
- Dec 13, 2021
-
-
Nicolas Dandrimont authored
-
- Dec 09, 2021
-
-
Stefan Sperling authored
Our expansion of the Log keyword was slightly wrong. We need to trim trailing whitespace from the "prefix" line content which precedes the Log keyword when we write out line content which followed the Log keyword. Update the Log expansion example given in a comment to document this (see there for details; this behaviour of CVS is hard to explain without illustration). Found while testing conversion of the OpenBSD CVS repository. Add a new test which uses an RCS file from this repository to reproduce this problem.
-
Stefan Sperling authored
CVS supports the definition of custom keywords. A common use case for custom keywords is to use the project name as a keyword. This avoids confusion when files are copied between projects using CVS, in case files contain a keyword that is in use by both projects. In other words, a file will retain its expanded custom keyword from project A, allowing the initial file version to be traced back to its origin, after the file was copied into project B's CVS repository. This feature is in active use by OpenBSD and NetBSD, for example. Existing conversions of their CVS repositories to Git expand the corresponding custom keywords as well, and so should we. Historically, X11 and FreeBSD were also using custom keywords. During conversion via rsync:// we copy the CVSROOT directory and the desired CVS module from the rsync server. The file CVSROOT/config contains directives which configure the use of custom keywords. Parse this file and expand keywords accordingly when checking out versions of files from our local copy of the CVS repository. For now, we only support custom keywords which correspond to the Id keyword since this is known to be in common use by projects. The latest releases of CVS (1.12.x) have optional support for arbitrary keyword aliases via custom keywords. Support for this could be added later, should there be a need to do so. In any case, the pserver access method already supports arbitrary custom keywords because such keywords will be expanded by the CVS server when we check out files from it. While here, optimize our use of rsync a bit. Fetch only CVSROOT and the desired CVS module over rsync, rather than fetching the entire CVS repository directory, which may contain unrelated CVS modules that require disk space but will not be used.
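A minimal sketch of how custom keywords might be read from a CVSROOT/config file. It assumes the keyword is declared with a directive of the form `tag=NAME` (optionally followed by `=Alias`), which is how OpenBSD-style configurations are understood to declare it; the function name `parse_custom_keywords` is hypothetical and this is not the loader's actual code.

```python
from typing import List


def parse_custom_keywords(config_text: str) -> List[str]:
    """Return custom keyword names declared in CVSROOT/config content.

    Assumption: keywords are declared as "tag=NAME" or "tag=NAME=Alias".
    Only the keyword name is kept; any alias is ignored, mirroring the
    "treat it like Id" limitation described in the commit message.
    """
    keywords = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "tag":
            name = value.split("=", 1)[0].strip()
            if name:
                keywords.append(name)
    return keywords


if __name__ == "__main__":
    sample = "# example config\nSystemAuth=no\ntag=OpenBSD\n"
    print(parse_custom_keywords(sample))
```

Each returned name would then be treated as an alias of the Id keyword during expansion.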
-
- Dec 08, 2021
-
-
Stefan Sperling authored
CVS modules were imported with a top-level directory which matched the module name. For a CVS origin such as rsync://cvs.savannah.gnu.org/sources/dino/dino the top-level directory contained a single directory called "dino" with all expected files and directories residing inside this directory. E.g. the dino project's top-level README file would be stored at the path "dino/README" instead of just "/README". Import project files directly into the top-level directory, as expected. Adjust test expectations accordingly.
-
- Dec 07, 2021
-
-
Stefan Sperling authored
Mention that cvs is a required dependency for running the tests. Document that some protocol schemes are not fully covered by the test suite (as suggested by vlorentz in D6678).
-
Stefan Sperling authored
The CVS loader used to create one snapshot per loaded revision. As pointed out by ardumont in D6745, this is wrong; other loaders create only one snapshot per visit. Fix this issue and adjust test expectations accordingly. While here, show SWH IDs of loaded revisions and snapshots in regular "info" log output, rather than only in "debug" log output. Previously, only CVS-related data was shown at the "info" log level. Showing both CVS and SWH data in log output is more informative.
-
Stefan Sperling authored
Align our expansion of Log keywords with the behaviour of a real CVS server. With this, such keywords expand the same way over the pserver and rsync access methods. This is the last change required to consistently ingest CVS's own CVS repository over both pserver and rsync. Otherwise we get commit hash mismatches due to differently expanded Log keywords.
-
- Dec 04, 2021
-
-
Stefan Sperling authored
Summary: Suggested by ardumont in D6566 Reviewers: #reviewers, vlorentz Reviewed By: #reviewers, vlorentz Differential Revision: https://forge.softwareheritage.org/D6585
-
- Nov 29, 2021
-
-
Stefan Sperling authored
The function RcsKeywords.expand_keyword() is used to expand keywords when fetching an origin over rsync. This function failed to process multiple keywords on a single line, even though the existing code already keeps looping in an attempt to expand multiple keywords. For example, consider this line from a file in the ccvs CVS repository: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $Id$" Here, a regular CVS server expands both keywords on this line. The Name keyword is special; it expands only if an explicit tag name was given on the CVS command line. This keyword always expands to an empty string for now, until perhaps one day the CVS loader learns about tags. Our regular expression which attempts to match keywords on a line splits the above example into two match groups: 1: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $ 2: Id$ The Name keyword was then expanded as expected, but the Id keyword was missed. To fix this, attempt another match starting from the terminating character of the previous match, such that we match the following two strings: 1: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $ 2: $Id$ Now our CVS loader expands both keywords like the CVS server does. Add new test data to confirm that this works as intended.
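The scanning strategy described above can be sketched as follows. This only illustrates resuming the search at the terminating "$" of the previous match; the pattern `KEYWORD_RE` and the helper `find_keywords` are simplified, hypothetical stand-ins rather than the code in RcsKeywords.expand_keyword(), and the expansion itself is left out.

```python
import re
from typing import List

# Simplified, hypothetical pattern: "$Keyword$" or "$Keyword: value $".
KEYWORD_RE = re.compile(r"\$([A-Za-z]+)(?::[^$\n]*)?\$")


def find_keywords(line: str) -> List[str]:
    """Return the names of all keywords on a line, in order.

    After each match the search resumes at the terminating "$" rather
    than after it, so that this "$" can also open the next keyword.
    """
    names = []
    pos = 0
    while True:
        match = KEYWORD_RE.search(line, pos)
        if match is None:
            break
        names.append(match.group(1))
        # A match is at least three characters long, so pos always advances.
        pos = match.end() - 1
    return names


if __name__ == "__main__":
    line = '#ident "@(#)cvs/contrib/pcl-cvs:$Name: $Id$"'
    print(find_keywords(line))
```

For comparison, a plain non-overlapping scan such as `KEYWORD_RE.finditer(line)` consumes the shared "$" with the first keyword and therefore never sees the second one, which corresponds to the failure described in the commit message.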
-
- Nov 26, 2021
-
-
Stefan Sperling authored
-
- Nov 23, 2021
-
-
Stefan Sperling authored
Some RCS keywords, such as "Header", contain absolute file paths derived from the on-disk filesystem path of the CVS repository. When we fetch files over the pserver protocol such keywords are expanded by the CVS server. But when using the rsync protocol we will first copy the CVS repository to local disk and the path to this local copy will correspond to some temporary directory. Try to avoid file content differences between pserver and rsync access methods by deriving a likely server-side path from path information found in the rsync:// origin URL. This will work as expected as long as the CVS server-side setup exposes the same path to the CVS repository over both access methods, which is the case for GNU Savannah for example. In general, we should recommend treating pserver and rsync as distinct origins and not rely on them to be interchangeable and always produce the same conversion result. But we can still try our best to avoid needless differences in content hashes.
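A rough sketch of deriving a likely server-side path from an rsync:// origin URL. It assumes the last URL path component is the CVS module and everything before it is the CVSROOT as seen by the server; the helper names `guess_server_paths` and `server_side_rcs_path` are hypothetical and not taken from the loader.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse


def guess_server_paths(origin_url: str) -> Tuple[str, str]:
    """Split an rsync:// origin URL into (cvsroot_path, module_name).

    Assumption: the server exposes the repository at the same path over
    rsync and pserver, so the URL path doubles as the on-disk path.
    """
    path = urlparse(origin_url).path.rstrip("/")
    return posixpath.dirname(path), posixpath.basename(path)


def server_side_rcs_path(origin_url: str, relative_file: str) -> str:
    """Return the path a keyword like Header would likely contain."""
    cvsroot, module = guess_server_paths(origin_url)
    return posixpath.join(cvsroot, module, relative_file + ",v")


if __name__ == "__main__":
    url = "rsync://cvs.savannah.gnu.org/sources/dino/dino"
    print(guess_server_paths(url))
    print(server_side_rcs_path(url, "README"))
```

Using the guessed path instead of the temporary local directory is what keeps expanded keywords, and therefore content hashes, aligned between the two access methods when the assumption holds.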
-
Antoine R. Dumont authored
This fixes build [1] [1] https://jenkins.softwareheritage.org/view/swh-draft/job/DLDSVN/job/tests/1304/console
-
- Nov 11, 2021
-
-
Stefan Sperling authored
Empty lines sent by the CVS server in rlog output were being stripped by our custom cvs client implementation. Unfortunately, this resulted in empty lines being stripped from CVS log messages, which is fixed with this commit. The rsync access method already preserved log messages properly, and now the pserver access method does the same.
-
- Nov 09, 2021
-
-
Stefan Sperling authored
Newer CVS clients tag commits with a commit ID which allows us to correctly convert commits which changed several RCS files at once. The rsync access method based on cvs2gitdump was already taking advantage of this. To ensure that conversions over the pserver protocol yield the same result as conversions over rsync we need to add commit ID support to rlog.py. Add two new test cases which convert the same repository over rsync and pserver respectively, and ensure that they yield the same result. Without commit ID support conversion over pserver produces a different result for this particular test repository. With feedback about coding style from vlorentz.
-
Stefan Sperling authored
CVS repositories may contain RCS history in file,v as well as a corresponding Attic/file,v where each file contains separate events that occurred in history. The Attic version of the file results from file deletion events. The rsync access method already uses history found in the Attic. However, a CVS server will only return RCS files from the Attic if we request them explicitly. If we do not request them then our converted history may end up missing deletion events for some files. Unfortunately, we cannot tell which RCS files have a corresponding file in the Attic, so we need to search all Attic directories by running the equivalent of 'cvs rlog' in each directory. This slows down pserver access considerably (and it was already quite slow compared to rsync). But we need to pay this price in order to obtain a valid conversion result. This patch contains related fixes to cvsroot path handling, which was broken for the pserver case. Without these fixes we cannot create the correct paths for Attic directories to search. Problem found while comparing conversion results of rsync and pserver access methods for the GNU dino CVS repository at cvs.savannah.gnu.org/sources/dino. Add two new test cases based on RCS files from this repository. Without this fix in place history would diverge at this commit: 8891a63 | larsl | Removed the MIDIEvent class | 04 May 2006, 01:11 UTC, because the files midievent.cpp and midievent.hpp would not get deleted when converting this commit via the pserver protocol.
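To illustrate the file,v versus Attic/file,v layout mentioned above, here is a small sketch that maps an RCS file path to its Attic counterpart and lists the Attic directories that would have to be searched. The helper names and the example directory `dino/src` are hypothetical; the actual rlog requests made by the loader are not shown.

```python
import posixpath
from typing import Iterable, List


def attic_path(rcs_path: str) -> str:
    """Return the Attic counterpart of an RCS file path."""
    directory, filename = posixpath.split(rcs_path)
    return posixpath.join(directory, "Attic", filename)


def attic_dirs(directories: Iterable[str]) -> List[str]:
    """Return the Attic directory to search for each module directory.

    Since we cannot know in advance which directories contain deleted
    files, every directory gets a candidate Attic path.
    """
    return [posixpath.join(d, "Attic") for d in directories]


if __name__ == "__main__":
    print(attic_path("dino/src/midievent.cpp,v"))
    print(attic_dirs(["dino", "dino/src"]))
```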
-
Stefan Sperling authored
Make an existing test case run over pserver as well. This access method uses a different way of detecting file additions and deletions and should be tested separately. Add new tests to cover the re-addition of a file after it was deleted.
-
Stefan Sperling authored
-
Stefan Sperling authored
We can simply ask the CVS server to expand keywords for us, instead of forcing binary file mode with the -kb option. The CVS repository contains per-file keyword expansion defaults the server will use. Files checked out by cvsclient.py should now match what a regular CVS client would check out by default. Add test cases which verify that we create the same snapshot ID for a repository which uses the Id keyword in a file, regardless of whether this repository is accessed via rsync or pserver.
-
- Nov 05, 2021
-
-
vlorentz authored
-
- Nov 03, 2021
- Oct 27, 2021
-
-
Stefan Sperling authored
This test reproduces the bug fixed in commit d3b3344b where our custom cvs client would fail to check out a file which lacks a trailing newline from a remote CVS server. The error triggered by the test without the fix in place is: CVSProtocolError: Overlong response from CVS server: b'delta with no trailing eolok\n'
-
Stefan Sperling authored
The CVS commit ID is an optional attribute which is only generated by relatively recent releases of CVS clients. Our rlog parser was skipping such commits because it failed to match on them due to an error in a regular expression. This resulted in an incomplete import of CVS revision history. Here is a sample line from cvs rlog output which carries a commit ID and was not matched because the regex lacked the trailing semicolon: date: 2007-07-17 15:02:50 +0200; author: larsl; state: Exp; lines: +619 -285; commitid: oju0x8tTc9aUB7qs; Found while testing ingestion of the GNU dino repository from cvs.savannah.gnu.org/sources/dino
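A sketch of a regular expression that accepts the sample line above, with the commit ID and its trailing semicolon treated as optional. The pattern `REVISION_INFO_RE` and the function `parse_revision_info` are hypothetical illustrations and not the expression used in rlog.py.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the "date: ...; author: ...;" line of rlog output.
# The "lines" and "commitid" fields are optional, as is the semicolon that
# follows "lines" when no commit ID is present.
REVISION_INFO_RE = re.compile(
    r"^date:\s+(?P<date>[^;]+);"
    r"\s+author:\s+(?P<author>[^;]+);"
    r"\s+state:\s+(?P<state>[^;]+);"
    r"(?:\s+lines:\s+(?P<lines>[^;]+);?)?"
    r"(?:\s+commitid:\s+(?P<commitid>[^;]+);)?"
    r"\s*$"
)


def parse_revision_info(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the fields of a revision info line, or None if it does not match."""
    match = REVISION_INFO_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "date: 2007-07-17 15:02:50 +0200; author: larsl; state: Exp; "
        "lines: +619 -285; commitid: oju0x8tTc9aUB7qs;"
    )
    print(parse_revision_info(sample))
```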
-
Stefan Sperling authored
The rlog parser was only fetching a single file revision because some lines of code had the wrong indentation. These lines were supposed to be part of a loop body but were only executed once. Also rename a function which had a misleading name and docstring. This function does in fact process the entire RCS revision history of a given file, as opposed to just one entry of RCS revision history. Found while testing ingestion of the GNU dino repository from cvs.savannah.gnu.org/sources/dino
-
Stefan Sperling authored
-
Stefan Sperling authored
While checking out files the server sends messages to the CVS client which provide information about the state of file paths. Our custom CVS client implementation needs to recognize a few additional responses the server may send while checking out a different version of a file which was already checked out earlier. Otherwise our client will error out. We can simply ignore these messages (and their two path arguments separated by \n) because we do not manage an actual CVS working copy. Found while testing ingestion of the GNU dino repository at cvs.savannah.gnu.org/sources/dino
-
Stefan Sperling authored
CVS uses \n as a protocol message separator, which forces us to read protocol messages line by line. File content sent by the server has a known length and is transmitted in bytes. The server appends a final "ok\n" message (or perhaps an error message) when it is done sending file contents. Properly handle the case where this final message gets buffered along with file contents and is not delimited from file contents by \n because the file lacks a trailing newline. Previously, the final protocol message ended up being written out to file contents in this case. Found while testing ingestion of the GNU dino CVS repository from cvs.savannah.gnu.org/sources/dino.
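The situation described above can be sketched with an in-memory stream. The idea is to consume exactly the announced number of bytes as file content and only then read the next protocol line, instead of splitting on \n. The function `read_file_response` is a hypothetical illustration, not the code in cvsclient.py, and the sample bytes are taken from the error message quoted in the Oct 27 test commit.

```python
import io
from typing import BinaryIO, Tuple


def read_file_response(stream: BinaryIO, length: int) -> Tuple[bytes, bytes]:
    """Read `length` bytes of file content, then the final status line.

    Reading by length keeps the trailing protocol message (e.g. b"ok\n")
    separate even if the file content does not end with a newline.
    """
    content = stream.read(length)
    if len(content) != length:
        raise EOFError("stream ended before the announced file length")
    status = stream.readline()
    return content, status


if __name__ == "__main__":
    payload = b"delta with no trailing eol"
    stream = io.BytesIO(payload + b"ok\n")
    print(read_file_response(stream, len(payload)))
```

A purely line-based reader would instead return the content and the "ok" message fused into a single line, which matches the overlong response reported by the test.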
-
- Oct 04, 2021
-
-
Stefan Sperling authored
-
- Oct 01, 2021
-
-
Stefan Sperling authored
This table becomes unreadable after the black formatter inserts a newline between all the table entries.
-
Stefan Sperling authored
Testing against cvs.savannah.gnu.org revealed that the CVS module name should not be included in the authentication request.
-
Stefan Sperling authored
The CVS loader's URL argument is mandatory so it should not be marked as optional.
-
- Sep 22, 2021
-
-
Stefan Sperling authored
Factor out code which is specific to rcsparse and cvsclient into separate functions and pass a parameter to process_cvs_changesets() so it can decide which of the two needs to be used. This supersedes the function process_cvs_rlog_changesets() which duplicated the looping code also contained in process_cvs_changesets().
-
- Sep 21, 2021
-
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
They are in this location when installing in development mode.
-
vlorentz authored
E.g. self.log.exception already includes the exception (and its traceback), so it does not need to be part of the message.
-
vlorentz authored
Even if the revision was already loaded; it might have been for a different origin.
-
- Sep 17, 2021
-
-
Stefan Sperling authored
-
Stefan Sperling authored
Suggested by vlorentz and zack.
-