  1. Oct 17, 2022
      rlog: Skip rlog entry with missing header in RlogConv.parse_rlog · 734207ba
      Antoine Lambert authored
      CVS rlog for a given module sent by server is a concatenation of
      rlog entries. Each entry has a header containing the path to a
      RCS file plus other info.
      
      There are cases where an rlog entry header is empty, which makes
      rlog parsing fail.
      
      So instead of stopping rlog parsing by raising an exception, prefer
      to skip that entry and process the next one.
      
      Closes T4629
      734207ba
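A minimal sketch of the skip-instead-of-raise behaviour described above. It is not the actual `RlogConv.parse_rlog` code: the function names are hypothetical, and it assumes each entry ends with a line made only of `=` characters and that a valid header starts with `RCS file: `.

```python
from typing import Dict, Iterable, List

HEADER_PREFIX = "RCS file: "


def _store(entries: Dict[str, List[str]], entry_lines: List[str]) -> None:
    """Record one rlog entry, or silently skip it when its header is missing."""
    header = next((l for l in entry_lines if l.startswith(HEADER_PREFIX)), None)
    if header is None:
        return  # empty or missing header: skip this entry, keep parsing
    entries[header[len(HEADER_PREFIX):]] = entry_lines


def split_rlog_entries(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group concatenated rlog output into entries keyed by RCS file path."""
    entries: Dict[str, List[str]] = {}
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: a line consisting only of "=" closes the current entry.
        if line and set(line) == {"="}:
            _store(entries, current)
            current = []
        else:
            current.append(line)
    _store(entries, current)
    return entries
```

An entry without a header is simply dropped, so the entries that follow it are still processed.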
  2. Oct 14, 2022
      loader, cvsclient: Read files line by line to reduce memory consumption · cfe7507a
      Antoine Lambert authored
      Instead of using the readlines method on file objects, which retrieves
      all lines of a file and stores them in memory, prefer to read files line
      by line using the lazy line iterator provided by file objects.
      
      This significantly reduces loader memory consumption when processing
      a large rlog output stored in a file.
      cfe7507a
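To illustrate the difference, here is a small sketch that walks a file object lazily instead of calling `readlines()`. The helper name `count_revisions` is hypothetical and only stands in for whatever per-line processing the loader does.

```python
import io
from typing import IO


def count_revisions(fp: IO[bytes]) -> int:
    """Count lines starting with b"revision " without loading the whole file."""
    count = 0
    # Iterating over the file object yields one line at a time, so memory use
    # stays bounded by the longest line rather than by the file size.
    for line in fp:
        if line.startswith(b"revision "):
            count += 1
    return count


# Usage with an in-memory stand-in for a large rlog file.
sample = io.BytesIO(b"revision 1.2\ndate: 2022/10/14\nrevision 1.1\n")
total = count_revisions(sample)
```

By contrast, `fp.readlines()` would build a list holding every line before the loop even starts.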
  3. Oct 13, 2022
      loader: Raise NotFound for missing CVS module when using pserver or ssh · 965c3de4
      Antoine Lambert authored
      That case was handled when using rsync protocol but not when using pserver
      or ssh protocol.
      
      Closes T4631
      965c3de4
      cvsclient: Handle error in fetch_rlog when path does not exist · 356dfa27
      Antoine Lambert authored
      When attempting to fetch the rlog for a path that does not exist in
      the repository, the CVS server will respond with the following lines:
      
      E cvs rlog: could not read RCS file for <path>
      ok
      
      That error case was not handled in fetch_rlog so ensure it returns None
      when encountering it.
      
      The issue was spotted when the loader attempts to fetch more rlog data from
      Attic directories. The paths of these Attic directories are computed from
      those of the files in the repositories, but there are cases where those
      directories do not exist.
      356dfa27
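The error handling described above can be sketched as follows. This is a simplified illustration, not the real `fetch_rlog`: the function name `read_rlog_response` is hypothetical, and it assumes the response has already been split into lines, with regular output carried on lines prefixed by `M ` and terminated by `ok`.

```python
from typing import Iterable, List, Optional

MISSING_PATH_ERROR = b"E cvs rlog: could not read RCS file for "


def read_rlog_response(lines: Iterable[bytes]) -> Optional[List[bytes]]:
    """Collect rlog lines from a server response, or return None on a missing path."""
    rlog: List[bytes] = []
    for line in lines:
        if line.startswith(MISSING_PATH_ERROR):
            # The requested path does not exist in the repository.
            return None
        if line.rstrip(b"\n") == b"ok":
            break  # end of the server response
        if line.startswith(b"M "):
            rlog.append(line[2:])  # strip the message prefix
    return rlog
```

Returning `None` lets the caller treat a missing Attic directory as "no extra rlog data" rather than as a fatal error.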
  4. Sep 15, 2022
  5. Jul 11, 2022
  6. Jul 08, 2022
  7. Jul 07, 2022
  8. Jul 06, 2022
      Fix loading of CVS repositories with non valid UTF-8 paths · d89f8d13
      Antoine Lambert authored
      Some CVS repositories have paths that are not valid UTF-8 (typically
      ISO-8859-1 ones), but the loader implementation assumed all paths could
      be safely encoded to UTF-8 and raised UnicodeEncodeError when
      attempting to encode non-UTF-8 paths.
      
      This commit modifies the way CVS paths are handled by the loader by
      using their raw bytes representation instead of their UTF-8 decoded
      string representation.
      
      Also, the rcsparse.rcsfile constructor has been modified to take a bytes
      path as argument instead of a unicode one, in order to be able to
      successfully open non-UTF-8 paths.
      
      Such CVS repositories can now be successfully loaded, either using rsync
      or pserver protocol.
      
      Related to T3980
      d89f8d13
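A short sketch of why raw bytes paths avoid the failure. The helper names are hypothetical and the exact code path that raised in the loader is an assumption. The example only shows that a path holding an ISO-8859-1 byte cannot be re-encoded to UTF-8 once it has passed through `str`, whereas bytes paths can be joined untouched.

```python
import os


def utf8_roundtrip_fails(name: bytes) -> bool:
    """Return True if `name`, once turned into str, cannot be encoded as UTF-8."""
    # Undecodable bytes become lone surrogates with the surrogateescape handler.
    as_str = name.decode("utf-8", "surrogateescape")
    try:
        as_str.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def join_cvs_path(root: bytes, name: bytes) -> bytes:
    """Join path components while keeping the raw bytes as-is."""
    return os.path.join(root, name)


latin1_name = b"caf\xe9.c,v"  # "café.c,v" encoded as ISO-8859-1
full_path = join_cvs_path(b"/tmp/cvs/module", latin1_name)
```

Keeping paths as `bytes` end to end means no encoding step can fail, whatever charset the repository used.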
  9. Jun 17, 2022
  10. May 20, 2022
  11. May 10, 2022
  12. May 09, 2022
  13. May 02, 2022
  14. Apr 27, 2022
      tasks: Simplify implementation and add tests for listed origins · 4fc65233
      Antoine Lambert authored
      Recent changes in swh-scheduler add new parameters to the celery tasks
      produced from swh.scheduler.model.ListedOrigin instances.
      
      So make sure any new parameters are handled by not hardcoding the
      expected ones in task signatures.
      
      Remove unsafe use of unnamed task parameters.
      
      Add new tests checking that task parameters produced from ListedOrigin
      instances do not raise an error when attempting to create a cvs loader.
      
      Related to T4187
      v0.2.2
      4fc65233
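The idea of not hardcoding task parameters can be sketched as below. Everything here is hypothetical: `FakeCvsLoader` and `load_cvs` are stand-ins, not the real swh classes or Celery tasks, and the sketch assumes the loader constructor tolerates extra keyword arguments.

```python
from typing import Any, Dict


class FakeCvsLoader:
    """Stand-in loader that accepts extra keyword arguments it does not know."""

    def __init__(self, url: str, **kwargs: Any) -> None:
        self.url = url
        self.extra = kwargs  # parameters added later by the scheduler land here

    def load(self) -> Dict[str, str]:
        return {"status": "uneventful"}


def load_cvs(**kwargs: Any) -> Dict[str, str]:
    """Task body: keyword-only, forwards whatever parameters it receives."""
    return FakeCvsLoader(**kwargs).load()


result = load_cvs(
    url="rsync://example.org/cvs/module",
    some_new_scheduler_param="value",  # unknown parameter, still accepted
)
```

Because the task accepts only named parameters, passing positional arguments raises `TypeError`, which matches the removal of unnamed task parameters.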
  15. Apr 26, 2022
  16. Apr 25, 2022
  17. Apr 22, 2022
  18. Apr 21, 2022
  19. Apr 14, 2022
      loader: Fix rsync failures by retrying associated commands · 9f5456b6
      Antoine Lambert authored
      Fetching CVS repository data using rsync often fails with the following error
      (especially with archived repositories hosted on sourceforge):
      
      rsync error: some files/attrs were not transferred (see previous errors) (code 23)
      
      It seems the only way to mitigate that issue is to retry the rsync command
      until it succeeds.
      
      So add an rsync_retry decorator and apply it to a new method in the loader
      wrapping the call to subprocess.run executing the rsync command.
      
      Also use the rsync option that compresses file data during the transfer.
      v0.1.1
      9f5456b6
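A minimal sketch of such a retry decorator, written with the standard library only. The real `rsync_retry` may be built differently (for instance on top of a retry library). The `max_attempts` and `delay` parameters and the `run_rsync` wrapper are assumptions made for illustration.

```python
import functools
import subprocess
import time


def rsync_retry(max_attempts: int = 5, delay: float = 1.0):
    """Retry the wrapped callable while it raises CalledProcessError."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except subprocess.CalledProcessError:
                    if attempt == max_attempts:
                        raise  # give up after the last attempt
                    time.sleep(delay)

        return wrapper

    return decorator


@rsync_retry(max_attempts=5, delay=1.0)
def run_rsync(url: str, dest: str) -> subprocess.CompletedProcess:
    # -a: archive mode, -z: compress file data during the transfer.
    # check=True turns a non-zero exit code (such as 23) into an exception.
    return subprocess.run(["rsync", "-az", url, dest], check=True)
```

Wrapping only the `subprocess.run` call keeps the retry scope narrow, so other loader errors are not retried by accident.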
  20. Apr 08, 2022
  21. Apr 06, 2022
  22. Mar 22, 2022
      pytest: Exclude build directory for tests discovery · ecb447a3
      Antoine Lambert authored
      Because setuptools copies test modules into subdirectories of the
      build directory, pytest fails with ImportPathMismatchError
      exceptions when invoked from the root directory of the module.
      
      So ignore the build folder during test discovery.
      ecb447a3
  23. Feb 18, 2022
  24. Feb 10, 2022
  25. Feb 07, 2022
  26. Jan 07, 2022
  27. Jan 06, 2022
      validate input paths in the CVS loader · 238c9c03
      Stefan Sperling authored
      The CVS loader creates files on the local file system based on
      paths which were read from a local copy of a CVS repository or
      sent by a CVS server as part of its "cvs rlog" response.
      
      Ensure that such paths will not be able to escape the temporary
      directory which stores checked out versions of files.
      v0.1.0
      238c9c03
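One common way to perform this kind of check is sketched below. It is not necessarily how the loader does it. The function name `is_within_directory` is hypothetical. It resolves the candidate path against the base directory and verifies that the result still lies under it.

```python
import os


def is_within_directory(base_dir: bytes, path: bytes) -> bool:
    """Return True if `path`, resolved relative to `base_dir`, stays inside it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and resolves symlinks; an absolute
    # `path` replaces `base` entirely in os.path.join and is rejected below.
    target = os.path.realpath(os.path.join(base, path))
    return os.path.commonpath([base, target]) == base


checkout_dir = b"/tmp/swh-cvs-checkout"  # hypothetical temporary directory
safe = is_within_directory(checkout_dir, b"src/main.c")
escaping = is_within_directory(checkout_dir, b"../../etc/passwd")
```

Paths that fail the check should be rejected before any file is created on the local file system.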
  28. Dec 16, 2021
  29. Dec 13, 2021
  30. Dec 09, 2021
      fix Log keyword expansion with trailing whitespace in prefix · a66c6b49
      Stefan Sperling authored
      Our expansion of the Log keyword was slightly wrong. We need to
      trim trailing whitespace from the "prefix" line content which
      precedes the Log keyword when we write out line content which
      followed the Log keyword. Update the Log expansion example given
      in a comment to document this (see there for details; this behaviour
      of CVS is hard to explain without illustration).
      
      Found while testing conversion of the OpenBSD CVS repository.
      Add a new test which uses an RCS file from this repository to
      reproduce this problem.
      a66c6b49
      support custom keywords during rsync:// conversion · dcb895ca
      Stefan Sperling authored
      CVS supports the definition of custom keywords. A common use case
      for custom keywords is to use the project name as a keyword. This
      avoids confusion when files are copied between projects using CVS,
      in case files contain a keyword that is in use by both projects.
      In other words, a file will retain its expanded custom keyword from
      project A, allowing the initial file version to be traced back to its
      origin after the file was copied into project B's CVS repository.
      
      This feature is in active use by OpenBSD and NetBSD, for example.
      Existing conversions of their CVS repositories to Git expand
      the corresponding custom keywords as well, and so should we.
      Historically, X11 and FreeBSD were also using custom keywords.
      
      During conversion via rsync:// we copy the CVSROOT directory and the
      desired CVS module from the rsync server. The file CVSROOT/config
      contains directives which configure the use of custom keywords.
      Parse this file and expand keywords accordingly when checking out
      versions of files from our local copy of the CVS repository.
      
      For now, we only support custom keywords which correspond to the
      Id keyword since this is known to be in common use by projects.
      The latest releases of CVS (1.12.x) have optional support for arbitrary
      keyword aliases via custom keywords. Support for this could be added
      later, should there be a need to do so. In any case, the pserver access
      method already supports arbitrary custom keywords because such keywords
      will be expanded by the CVS server when we check out files from it.
      
      While here, optimize our use of rsync a bit.
      Fetch only CVSROOT and the desired CVS module over rsync, rather
      than fetching the entire CVS repository directory, which may contain
      unrelated CVS modules that require disk space but will not be used.
      dcb895ca
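A rough sketch of reading a custom keyword name from CVSROOT/config. The directive syntax is an assumption here: the sketch supposes a line of the form `tag=NAME` (optionally `tag=NAME=...`), which may not match every CVS variant, and the function name `find_custom_id_keyword` is hypothetical.

```python
from typing import Iterable, Optional

TAG_DIRECTIVE = b"tag="  # assumed directive introducing a custom Id-like keyword


def find_custom_id_keyword(config_lines: Iterable[bytes]) -> Optional[bytes]:
    """Return the custom keyword name declared in CVSROOT/config, if any."""
    for line in config_lines:
        line = line.strip()
        if not line or line.startswith(b"#"):
            continue  # skip blank lines and comments
        if line.startswith(TAG_DIRECTIVE):
            # Keep only the keyword name, dropping any "=..." suffix.
            return line[len(TAG_DIRECTIVE):].split(b"=", 1)[0]
    return None


keyword = find_custom_id_keyword([b"# Set custom keyword\n", b"tag=OpenBSD\n"])
```

The returned name would then be expanded the same way as the Id keyword when checking out file versions from the local copy.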