  1. Jan 07, 2022
  2. Jan 06, 2022
    • validate input paths in the CVS loader · 238c9c03
      Stefan Sperling authored
      The CVS loader creates files on the local file system based on
      paths which were read from a local copy of a CVS repository or
      sent by a CVS server as part of its "cvs rlog" response.
      
      Ensure that such paths will not be able to escape the temporary
      directory which stores checked out versions of files.
      v0.1.0
      238c9c03
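A minimal sketch of the kind of containment check described above, assuming a hypothetical helper name (`is_path_inside`); the loader's actual validation code may differ. It resolves the candidate path relative to the temporary directory and rejects anything that ends up outside of it.

```python
import os


def is_path_inside(base_dir: str, candidate: str) -> bool:
    """Return True if candidate, joined onto base_dir, stays inside base_dir.

    Hypothetical helper for illustration only.
    """
    base = os.path.realpath(base_dir)
    # os.path.join discards base when candidate is absolute, so absolute
    # paths are resolved as-is and rejected by the comparison below.
    target = os.path.realpath(os.path.join(base, candidate))
    return os.path.commonpath([base, target]) == base
```

A caller would refuse to create a file whenever this returns False, for example for a path such as `../../etc/passwd` taken from rlog output.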
  3. Dec 16, 2021
  4. Dec 15, 2021
  5. Dec 13, 2021
  6. Dec 09, 2021
    • fix Log keyword expansion with trailing whitespace in prefix · a66c6b49
      Stefan Sperling authored
      Our expansion of the Log keyword was slightly wrong. We need to
      trim trailing whitespace from the "prefix" line content which
      precedes the Log keyword when we write out line content which
      followed the Log keyword. Update the Log expansion example given
      in a comment to document this (see there for details; this behaviour
      of CVS is hard to explain without illustration).
      
      Found while testing conversion of the OpenBSD CVS repository.
      Add a new test which uses an RCS file from this repository to
      reproduce this problem.
      a66c6b49
    • support custom keywords during rsync:// conversion · dcb895ca
      Stefan Sperling authored
      CVS supports the definition of custom keywords. A common use case
      for custom keywords is to use the project name as a keyword. This
      avoids confusion when files are copied between projects using CVS,
      in case files contain a keyword that is in use by both projects.
      In other words, a file will retain its expanded custom keyword from
      project A, allowing the initial file version to be traced back to its
      origin, after the file was copied into project B's CVS repository.
      
      This feature is in active use by OpenBSD and NetBSD, for example.
      Existing conversions of their CVS repositories to Git expand
      the corresponding custom keywords as well, and so should we.
      Historically, X11 and FreeBSD were also using custom keywords.
      
      During conversion via rsync:// we copy the CVSROOT directory and the
      desired CVS module from the rsync server. The file CVSROOT/config
      contains directives which configure the use of custom keywords.
      Parse this file and expand keywords accordingly when checking out
      versions of files from our local copy of the CVS repository.
      
      For now, we only support custom keywords which correspond to the
      Id keyword since this is known to be in common use by projects.
      The latest releases of CVS (1.12.x) have optional support for arbitrary
      keyword aliases via custom keywords. Support for this could be added
      later, should there be a need to do so. In any case, the pserver access
      method already supports arbitrary custom keywords because such keywords
      will be expanded by the CVS server when we check out files from it.
      
      While here, optimize our use of rsync a bit.
      Fetch only CVSROOT and the desired CVS module over rsync, rather
      than fetching the entire CVS repository directory, which may contain
      unrelated CVS modules that require disk space but will not be used.
      dcb895ca
  7. Dec 08, 2021
    • fix the top-level directory path of imported CVS modules · 965629d6
      Stefan Sperling authored
      CVS modules were imported with a top-level directory which
      matched the module name. For a CVS origin such as
      rsync://cvs.savannah.gnu.org/sources/dino/dino
      the top-level directory contained a single directory called "dino"
      with all expected files and directories residing inside this directory.
      E.g. the dino project's top-level README file would be stored at
      the path "dino/README" instead of just "/README".
      
      Import project files directly into the top-level directory, as expected.
      Adjust test expectations accordingly.
      965629d6
  8. Dec 07, 2021
    • update test suite documentation · 9e8f931e
      Stefan Sperling authored
      Mention that cvs is a required dependency for running the tests.
      
      Document that some protocol schemes are not fully covered by
      the test suite (as suggested by vlorentz in D6678).
      v0.0.1
      9e8f931e
    • make CVS loader create one snapshot per visit · 5298a8f9
      Stefan Sperling authored
      The CVS loader used to create one snapshot per loaded revision.
      As pointed out by ardumont in D6745, this is wrong; other loaders
      create only one snapshot per visit.
      Fix this issue and adjust test expectations accordingly.
      
      While here, show SWH IDs of loaded revisions and snapshots in regular
      "info" log output, rather than only in "debug" log output. Previously,
      only CVS-related data was shown at the "info" log level. Showing both
      CVS and SWH data in log output is more informative.
      5298a8f9
    • fix expansion of the Log keyword with rsync origins · 099959bb
      Stefan Sperling authored
      Align our expansion of Log keywords with the behaviour of a real
      CVS server. With this, such keywords expand the same way over
      the pserver and rsync access methods.
      
      This is the last change required to consistently ingest CVS's own
      CVS repository over both pserver and rsync. Without it we get commit
      hash mismatches due to differently expanded Log keywords.
      099959bb
  9. Dec 04, 2021
  10. Nov 29, 2021
    • fix expansion of multiple RCS keywords on a line via rsync · 939dd546
      Stefan Sperling authored
      The function RcsKeywords.expand_keyword() is used to expand keywords
      when fetching an origin over rsync. This function failed to process
      multiple keywords on a single line, even though the existing code
      already keeps looping in an attempt to expand multiple keywords.
      
      For example, consider this line from a file in the ccvs CVS repository:
      
        #ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $Id$"
      
      Here, a regular CVS server expands both keywords on this line.
      
      The Name keyword is special; it expands only if an explicit tag name was
      given on the CVS command line. This keyword always expands to an empty
      string for now, until perhaps one day the CVS loader learns about tags.
      
      Our regular expression which attempts to match keywords on a line splits
      the above example into two match groups:
      
        1: #ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $
        2: Id$
      
      The Name keyword was then expanded as expected, but the Id keyword was missed.
      To fix this, attempt another match starting from the terminating character of
      the previous match, such that we match the following two strings:
      
        1: #ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $
        2: $Id$
      
      Now our CVS loader expands both keywords like the CVS server does.
      Add new test data to confirm that this works as intended.
      939dd546
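The matching problem can be reproduced in a few lines. This is a sketch with a simplified, hypothetical regular expression (`KEYWORD_RE`) that only knows the Name and Id keywords; it is not the loader's actual pattern. A non-overlapping scan consumes the `$` that ends the first keyword, so the second keyword is never seen. Resuming the search at that terminating `$` finds both.

```python
import re
from typing import List

# Simplified pattern for illustration; the real loader handles more keywords.
KEYWORD_RE = re.compile(r"\$(Name|Id)(?::[^$\n]*)?\$")


def find_keywords_naive(line: str) -> List[str]:
    # Non-overlapping scan: the "$" closing one keyword cannot open the next.
    return [m.group(1) for m in KEYWORD_RE.finditer(line)]


def find_keywords(line: str) -> List[str]:
    found = []
    pos = 0
    while True:
        m = KEYWORD_RE.search(line, pos)
        if m is None:
            break
        found.append(m.group(1))
        # Resume at the terminating "$" of this match so that it may also
        # serve as the opening "$" of the next keyword.
        pos = m.end() - 1
    return found


line = '#ident\t"@(#)cvs/contrib/pcl-cvs:$Name:  $Id$"'
```

With the example line from the ccvs repository, `find_keywords_naive(line)` reports only the Name keyword, while `find_keywords(line)` reports both Name and Id.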
  11. Nov 26, 2021
  12. Nov 23, 2021
    • attempt to avoid content differences due to paths in keywords · 5539ccb6
      Stefan Sperling authored
      Some RCS keywords, such as "Header", contain absolute file paths
      derived from the on-disk filesystem path of the CVS repository.
      
      When we fetch files over the pserver protocol such keywords are
      expanded by the CVS server. But when using the rsync protocol we
      will first copy the CVS repository to local disk and the path to
      this local copy will correspond to some temporary directory.
      
      Try to avoid file content differences between pserver and rsync
      access methods by deriving a likely server-side path from path
      information found in the rsync:// origin URL.
      This will work as expected as long as the CVS server-side setup
      exposes the same path to the CVS repository over both access
      methods, which is the case for GNU savannah for example.
      
      In general, we should recommend treating pserver and rsync as distinct
      origins and not relying on them being interchangeable and always
      producing the same conversion result. But we can still try our best to
      avoid needless differences in content hashes.
      5539ccb6
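As a rough illustration of deriving a server-side path from the origin URL, the sketch below assumes that the last component of the rsync:// URL path is the CVS module and that everything before it is the server-side CVS root. The function names are hypothetical and the loader's actual derivation may differ.

```python
import posixpath
from urllib.parse import urlparse


def guess_server_paths(origin_url):
    """Split an rsync:// origin URL path into (cvsroot, module).

    Assumption for illustration: the final path component is the module.
    """
    path = urlparse(origin_url).path.rstrip("/")
    cvsroot, module = posixpath.split(path)
    return cvsroot, module


def guess_rcs_path(origin_url, relative_file):
    """Guess the server-side path of an RCS file, as a path-carrying
    keyword such as Header would show it."""
    cvsroot, module = guess_server_paths(origin_url)
    return posixpath.join(cvsroot, module, relative_file + ",v")
```

For the origin `rsync://cvs.savannah.gnu.org/sources/dino/dino` and the file `README`, this yields `/sources/dino/dino/README,v` rather than a path below the local temporary directory.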
  13. Nov 11, 2021
    • preserve empty lines in CVS log messages over pserver · 34f46486
      Stefan Sperling authored
      Empty lines sent by the CVS server in rlog output were being stripped
      by our custom cvs client implementation. Unfortunately, this resulted
      in empty lines being stripped from CVS log messages, which is fixed
      with this commit. The rsync access method already preserved log
      messages properly, and now the pserver access method does the same.
      34f46486
  14. Nov 09, 2021
    • add CVS commit ID support to rlog.py · f5b974a0
      Stefan Sperling authored
      Newer CVS clients tag commits with a commit ID which allows us to
      correctly convert commits which changed several RCS files at once.
      The rsync access method based on cvs2gitdump was already taking
      advantage of this. To ensure that conversions over the pserver
      protocol yield the same result as conversions over rsync we need
      to add commit ID support to rlog.py.
      
      Add two new test cases which convert the same repository over
      rsync and pserver respectively, and ensure that they yield the
      same result. Without commit ID support conversion over pserver
      produces a different result for this particular test repository.
      
      With feedback about coding style from vlorentz.
      f5b974a0
    • handle Attic-only RCS files over CVS pserver · d28a4b21
      Stefan Sperling authored
      CVS repositories may contain RCS history in file,v as well as
      a corresponding Attic/file,v where each file contains separate
      events that occurred in history. The Attic version of the file
      results from file deletion events.
      
      The rsync access method already uses history found in the Attic.
      However, a CVS server will only return RCS files from the Attic
      if we request them explicitly. If we do not request them then our
      converted history may end up missing deletion events for some files.
      Unfortunately, we cannot tell which RCS files have a corresponding
      file in the Attic, so we need to search all Attic directories by
      running the equivalent of 'cvs rlog' in each directory. This slows
      down pserver access considerably (and it was already quite slow
      compared to rsync). But we need to pay this price in order to
      obtain a valid conversion result.
      
      This patch contains related fixes to cvsroot path handling, which
      was broken for the pserver case. Without these fixes we cannot
      create the correct paths for Attic directories to search.
      
      Problem found while comparing conversion results of rsync and
      pserver access methods for the GNU dino CVS repository at
      cvs.savannah.gnu.org/sources/dino
      Add two new test cases based on RCS files from this repository.
      
      Without this fix in place, history would diverge at this commit:
        8891a63 | larsl | Removed the MIDIEvent class | 04 May 2006, 01:11 UTC
      because the files midievent.cpp and midievent.hpp would not get deleted
      when converting this commit via the pserver protocol.
      d28a4b21
    • improve test coverage of file additions and deletions · d72f15f2
      Stefan Sperling authored
      Make an existing test case run over pserver as well.
      This access method uses a different way of detecting file
      additions and deletions and should be tested separately.
      
      Add new tests to cover the re-addition of a file after it
      was deleted.
      d72f15f2
    • Stefan Sperling's avatar
      ca23bc13
    • add support for RCS keyword expansion over pserver protocol · f52f0e45
      Stefan Sperling authored
      We can simply ask the CVS server to expand keywords for us, instead
      of forcing binary file mode with the -kb option. The CVS repository
      contains per-file keyword expansion defaults the server will use.
      Files checked out by cvsclient.py should now match what a regular
      CVS client would check out by default.
      
      Add test cases which verify that we create the same snapshot ID
      for a repository which uses the Id keyword in a file, regardless
      of whether this repository is accessed via rsync or pserver.
      f52f0e45
  15. Nov 05, 2021
  16. Nov 03, 2021
  17. Oct 27, 2021
    • test checkout of file lacking trailing \n over pserver protocol · beb7fc8a
      Stefan Sperling authored
      This test reproduces the bug fixed in commit d3b3344b where our
      custom cvs client would fail to check out a file which lacks a
      trailing newline from a remote CVS server.
      
      The error triggered by the test without the fix in place is:
      
      CVSProtocolError: Overlong response from CVS server:
      b'delta with no trailing eolok\n'
      beb7fc8a
    • rlog: fix loading of CVS commits which have a commit ID · 509ac801
      Stefan Sperling authored
      The CVS commit ID is an optional attribute which is only generated
      by relatively recent releases of CVS clients. Our rlog parser was
      skipping such commits because it failed to match on them due to an
      error in a regular expression.
      This resulted in an incomplete import of CVS revision history.
      
      Here is a sample line from cvs rlog output which carries a
      commit ID and was not matched because the regex lacked the
      trailing semicolon:
      date: 2007-07-17 15:02:50 +0200;  author: larsl;  state: Exp;  lines: +619 -285;  commitid: oju0x8tTc9aUB7qs;
      
      Found while testing ingestion of the GNU dino repository from
      cvs.savannah.gnu.org/sources/dino
      509ac801
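The following sketch shows one way to match such a line, including the optional commit ID and its trailing semicolon. The pattern and the name `REVISION_INFO_RE` are illustrative assumptions, not the expression used in rlog.py.

```python
import re

# Illustrative pattern: "lines" and "commitid" are optional, and the
# commitid field is terminated by a semicolon.
REVISION_INFO_RE = re.compile(
    r"^date:\s+(?P<date>[^;]+);"
    r"\s+author:\s+(?P<author>[^;]+);"
    r"\s+state:\s+(?P<state>[^;]+);"
    r"(?:\s+lines:\s+\+(?P<added>\d+)\s+-(?P<removed>\d+);?)?"
    r"(?:\s+commitid:\s+(?P<commitid>[^;]+);)?\s*$"
)

sample = (
    "date: 2007-07-17 15:02:50 +0200;  author: larsl;  state: Exp;  "
    "lines: +619 -285;  commitid: oju0x8tTc9aUB7qs;"
)
match = REVISION_INFO_RE.match(sample)
```

Because the commit ID group is optional, the same pattern still matches revisions written by older CVS clients, in which case `match.group("commitid")` is `None`.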
    • rlog: fix parsing of multiple file revisions · 0829dc33
      Stefan Sperling authored
      The rlog parser was only fetching a single file revision because
      some lines of code had the wrong indentation. These lines were
      supposed to be part of a loop body but were only executed once.
      
      Also rename a function which had a misleading name and docstring.
      This function does in fact process the entire RCS revision history
      of a given file, as opposed to just one entry of RCS revision history.
      
      Found while testing ingestion of the GNU dino repository from
      cvs.savannah.gnu.org/sources/dino
      0829dc33
    • cvsclient: handle additional responses sent by server · 3a2f06b3
      Stefan Sperling authored
      While checking out files the server sends messages to the CVS
      client which provide information about the state of file paths.
      
      Our custom CVS client implementation needs to recognize a few
      additional responses the server may send while checking out a
      different version of a file which was already checked out earlier.
      Otherwise our client will error out. We can simply ignore these
      messages (and their two path arguments separated by \n) because
      we do not manage an actual CVS working copy.
      
      Found while testing ingestion of the GNU dino repository at
      cvs.savannah.gnu.org/sources/dino
      3a2f06b3
    • cvsclient: handle files which lack a trailing newline · d3b3344b
      Stefan Sperling authored
      CVS uses \n as a protocol message separator, which forces us
      to read protocol messages line by line. File content sent by
      the server has a known length and is transmitted as bytes.
      The server appends a final "ok\n" message (or perhaps an error
      message) when it is done sending file contents.
      
      Properly handle the case where this final message gets buffered
      along with file contents and is not delimited from file contents
      by \n because the file lacks a trailing newline. Previously, the
      final protocol message ended up being written out to file contents
      in this case.
      
      Found while testing ingestion of the GNU dino CVS repository from
      cvs.savannah.gnu.org/sources/dino.
      d3b3344b
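A minimal sketch of the idea, assuming the content length has already been read from the server and using an in-memory stream in place of a socket. The function name `read_file_and_status` is hypothetical and cvsclient.py is structured differently. Reading exactly the announced number of bytes keeps the final protocol message out of the file contents, even when no newline separates the two.

```python
import io


def read_file_and_status(stream, length):
    """Read exactly `length` bytes of file content, then the status line."""
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("short read while receiving file contents")
    # Whatever follows the file content is a protocol message, even if the
    # content itself did not end with a newline.
    status = stream.readline()
    return data, status


content = b"delta with no trailing eol"
stream = io.BytesIO(content + b"ok\n")
data, status = read_file_and_status(stream, len(content))
```

Here `data` holds only the file contents and `status` holds the `ok` message, instead of the two being read as the single overlong line `delta with no trailing eolok`.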
  18. Oct 04, 2021
  19. Oct 01, 2021