  1. Oct 17, 2022
    • loader: Yield only modified objects in process_cvs_changesets · c23d4250
      Antoine Lambert authored
      Previously, after each revision replay, all files and directories of the
      CVS repository being loaded were collected and sent to the storage.
      This is a real bottleneck in terms of loading performance, as it delegates
      the filtering of new objects to archive to the storage filtering proxy.
      
      As we know exactly the set of paths that have been modified in a CVS
      revision, prefer to do that filtering on the loader side and only
      send modified objects to storage instead of the whole set of contents
      and directories from the reconstructed filesystem.
      
      This should greatly improve loading performance for large repositories
      and also reduce loader memory consumption.
      v0.5.0
      c23d4250
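      The loader-side filtering described above can be sketched as follows.
      This is a minimal illustration, not the loader's actual code: the
      function modified_objects and the dict-based filesystem are
      hypothetical stand-ins for the real process_cvs_changesets logic.

```python
from typing import Dict, Iterable, Iterator, Tuple


def modified_objects(
    filesystem: Dict[bytes, bytes],
    modified_paths: Iterable[bytes],
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (path, data) only for paths touched by the current changeset.

    `filesystem` is a hypothetical in-memory view of the reconstructed
    repository; untouched entries are never visited.
    """
    for path in modified_paths:
        if path in filesystem:  # a deleted file has no content to send
            yield path, filesystem[path]


# Hypothetical data: three files exist, one was modified, one was deleted.
fs = {b"a.c": b"int a;", b"b.c": b"int b;", b"dir/c.c": b"int c;"}
changed = [b"b.c", b"gone.c"]
result = list(modified_objects(fs, changed))
```

      The work done per revision then scales with the number of modified
      paths rather than with the size of the whole repository.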
    • loader: Reconstruct repo filesystem incrementally at each revision · b976aa6a
      Antoine Lambert authored
      Instead of creating a from_disk.Directory instance after each replayed
      CVS revision by recursively scanning all directories of the repository,
      prefer to keep a single instance as a class member, synchronized with
      the reconstructed filesystem after each revision replay.
      
      This should improve loader performance, especially when dealing with
      large repositories.
      b976aa6a
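      The idea of keeping one tree in sync instead of rescanning can be
      sketched as below. The class IncrementalTree and its nested-dict
      layout are assumptions made for illustration; they do not reproduce
      the from_disk.Directory API.

```python
from typing import Any, Dict, Optional


class IncrementalTree:
    """Hypothetical directory tree kept up to date across revisions."""

    def __init__(self) -> None:
        # Nested dicts model directories; bytes values model file contents.
        self.root: Dict[bytes, Any] = {}

    def apply(self, path: bytes, data: Optional[bytes]) -> None:
        """Add or update the file at `path`, or remove it if `data` is None."""
        *parents, name = path.split(b"/")
        node = self.root
        for part in parents:
            node = node.setdefault(part, {})
        if data is None:
            # Pruning of directories left empty is omitted for brevity.
            node.pop(name, None)
        else:
            node[name] = data


tree = IncrementalTree()
tree.apply(b"src/a.c", b"1")   # revision 1 adds a.c
tree.apply(b"src/b.c", b"2")   # revision 2 adds b.c
tree.apply(b"src/a.c", None)   # revision 3 removes a.c
```

      Each revision only touches the nodes on the paths it changes, so no
      recursive scan of the repository is needed between revisions.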
    • rlog: Skip rlog entry with missing header in RlogConv.parse_rlog · 734207ba
      Antoine Lambert authored
      The CVS rlog sent by the server for a given module is a concatenation
      of rlog entries. Each entry has a header containing the path to an
      RCS file plus other info.
      
      There are cases where an rlog entry header is empty, which makes the
      rlog parsing fail.
      
      So instead of stopping rlog parsing by raising an exception, prefer
      to skip that entry and process the next one.
      
      Closes T4629
      734207ba
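      A skip-instead-of-raise parser might look like the sketch below. It
      is not the RlogConv.parse_rlog implementation: the helper names are
      hypothetical, and it assumes that entries end with a line made only
      of "=" characters and that the header line starts with "RCS file: ".

```python
from typing import Iterable, Iterator, List

HEADER_PREFIX = "RCS file: "  # assumed start of an entry header


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Split rlog output into entries on lines made only of '=' characters."""
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line and set(line) == {"="}:
            yield entry
            entry = []
        else:
            entry.append(line)
    if entry:
        yield entry


def parse_paths(lines: Iterable[str]) -> List[str]:
    """Return the RCS file path of each entry, skipping headerless entries."""
    paths = []
    for entry in iter_entries(lines):
        header = next((l for l in entry if l.startswith(HEADER_PREFIX)), None)
        if header is None:
            continue  # skip the malformed entry and process the next one
        paths.append(header[len(HEADER_PREFIX):])
    return paths


sample = [
    "RCS file: /cvs/mod/a.c,v", "head: 1.1", "=" * 77,
    "head: 1.2", "=" * 77,  # entry with a missing header
    "RCS file: /cvs/mod/b.c,v", "=" * 77,
]
parsed = parse_paths(sample)
```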
  2. Oct 14, 2022
    • loader, cvsclient: Read files line by line to reduce memory consumption · cfe7507a
      Antoine Lambert authored
      Instead of using the readlines method on file objects, which retrieves
      all lines of a file and stores them in memory, prefer to read files
      line by line by using the lazy generator of lines from file objects.
      
      This significantly reduces loader memory consumption when processing
      a large rlog output stored in a file.
      cfe7507a
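      The difference can be illustrated with a small sketch. The function
      count_revisions and the sample file content are hypothetical; only
      the iteration pattern matters here.

```python
import os
import tempfile


def count_revisions(path: str) -> int:
    """Count lines starting with b"revision " without loading the whole file."""
    count = 0
    with open(path, "rb") as rlog_file:
        # Iterating over the file object yields one line at a time,
        # whereas rlog_file.readlines() would build a list of every line.
        for line in rlog_file:
            if line.startswith(b"revision "):
                count += 1
    return count


with tempfile.TemporaryDirectory() as tmp_dir:
    rlog_path = os.path.join(tmp_dir, "rlog.txt")
    with open(rlog_path, "wb") as out:
        out.write(b"RCS file: /cvs/mod/a.c,v\nrevision 1.2\nrevision 1.1\n")
    revision_count = count_revisions(rlog_path)
```

      With lazy iteration, peak memory depends on the longest line rather
      than on the size of the file.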
  3. Oct 13, 2022
    • loader: Raise NotFound for missing CVS module when using pserver or ssh · 965c3de4
      Antoine Lambert authored
      That case was handled when using the rsync protocol but not when using
      the pserver or ssh protocol.
      
      Closes T4631
      965c3de4
    • cvsclient: Handle error in fetch_rlog when path does not exist · 356dfa27
      Antoine Lambert authored
      When attempting to fetch the rlog for a path that does not exist in
      the repository, the CVS server will respond with the following lines:
      
      E cvs rlog: could not read RCS file for <path>
      ok
      
      That error case was not handled in fetch_rlog so ensure it returns None
      when encountering it.
      
      The issue was spotted when the loader attempted to fetch more rlog data
      from Attic directories. The paths of these Attic directories are computed
      from those of the files in the repository, but there are cases where
      those directories do not exist.
      356dfa27
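      The error handling can be sketched on a list of already-received
      response lines. The function read_rlog_response is a hypothetical
      simplification; the real fetch_rlog talks to a CVS server and is
      not reproduced here.

```python
from typing import Iterable, List, Optional

# Error line quoted in the commit message, without the trailing path.
ERROR_PREFIX = b"E cvs rlog: could not read RCS file for "


def read_rlog_response(lines: Iterable[bytes]) -> Optional[List[bytes]]:
    """Collect response lines until b"ok"; return None if the path is missing."""
    collected: List[bytes] = []
    for line in lines:
        line = line.rstrip(b"\n")
        if line.startswith(ERROR_PREFIX):
            return None  # the requested path does not exist on the server
        if line == b"ok":
            break
        collected.append(line)
    return collected


missing = read_rlog_response(
    [b"E cvs rlog: could not read RCS file for module/Attic\n", b"ok\n"]
)
present = read_rlog_response([b"RCS file: /cvs/module/a.c,v\n", b"ok\n"])
```

      Returning None lets a caller tell a missing path apart from an
      existing path with an empty log.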
  4. Sep 19, 2022
  5. Sep 15, 2022
  6. Jul 11, 2022
  7. Jul 08, 2022
  8. Jul 07, 2022
  9. Jul 06, 2022
    • Fix loading of CVS repositories with non valid UTF-8 paths · d89f8d13
      Antoine Lambert authored
      Some CVS repositories have paths which are not valid UTF-8 (typically
      ISO-8859-1 ones) but the loader implementation assumed all paths could
      be safely encoded to UTF-8 and was raising UnicodeEncodeError when
      attempting to encode non UTF-8 paths.
      
      This commit modifies the way CVS paths are handled by the loader by
      using their raw bytes representation instead of their UTF-8 decoded
      string representation.
      
      The rcsparse.rcsfile constructor has also been modified to take a bytes
      path as argument instead of a unicode one in order to be able to
      successfully open non UTF-8 paths.
      
      Such CVS repositories can now be successfully loaded, using either the
      rsync or pserver protocol.
      
      Related to T3980
      d89f8d13
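      The sketch below shows why raw bytes are the safer representation.
      The file name is made up, and the surrogateescape round trip is only
      one way such a UnicodeEncodeError can arise; it is not claimed to be
      the exact code path of the loader.

```python
import posixpath

# Hypothetical ISO-8859-1 encoded file name: b"caf\xe9.c,v"
raw_name = "caf\u00e9.c,v".encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Decoding with surrogateescape keeps the odd byte as a lone surrogate,
# which a later strict UTF-8 encode refuses.
as_text = raw_name.decode("utf-8", "surrogateescape")
try:
    as_text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Staying in bytes avoids the problem: path operations accept bytes too.
raw_path = posixpath.join(b"module", b"src", raw_name)
```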
  10. Jun 17, 2022
  11. May 20, 2022
  12. May 11, 2022
  13. May 10, 2022
  14. May 09, 2022
  15. May 02, 2022
  16. Apr 29, 2022
  17. Apr 27, 2022
    • tasks: Simplify implementation and add tests for listed origins · 4fc65233
      Antoine Lambert authored
      Recent changes in swh-scheduler add new parameters to the celery tasks
      produced from swh.scheduler.model.ListedOrigin instances.
      
      So ensure any new parameters are handled by not hardcoding the expected
      ones in task signatures.
      
      Remove unsafe use of unnamed task parameters.
      
      Add new tests checking that task parameters produced from ListedOrigin
      instances do not raise errors when attempting to create a cvs loader.
      
      Related to T4187
      v0.2.2
      4fc65233
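      The pattern of not hardcoding parameters can be sketched without
      celery. FakeCvsLoader and load_cvs are hypothetical stand-ins, and
      the parameter names in the call are invented for the example.

```python
from typing import Any, Dict


class FakeCvsLoader:
    """Hypothetical loader: one required parameter, any number of extras."""

    def __init__(self, url: str, **extra: Any) -> None:
        self.url = url
        self.extra = extra

    def load(self) -> Dict[str, Any]:
        return {"url": self.url, "extra_params": sorted(self.extra)}


def load_cvs(**kwargs: Any) -> Dict[str, Any]:
    """Task body sketch: forward named parameters only, none positional."""
    return FakeCvsLoader(**kwargs).load()


outcome = load_cvs(url="rsync://example.org/cvs/module", some_new_parameter=True)
```

      Because the task accepts keyword arguments only and forwards them
      unchanged, a parameter added upstream does not require a change to
      the task signature.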
  18. Apr 26, 2022
  19. Apr 25, 2022
  20. Apr 22, 2022
  21. Apr 21, 2022
  22. Apr 14, 2022
  23. Apr 08, 2022
  24. Apr 06, 2022
  25. Mar 22, 2022
    • pytest: Exclude build directory for tests discovery · ecb447a3
      Antoine Lambert authored
      Because test modules are copied into subdirectories of the build
      directory by setuptools, pytest fails with ImportPathMismatchError
      exceptions when invoked from the root directory of the module.
      
      So ignore the build folder during test discovery.
      ecb447a3
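      One way to exclude a directory from pytest collection is the
      collect_ignore variable in a conftest.py file at the repository
      root. This is an assumption about a possible setup; the commit may
      instead rely on a configuration file option.

```python
# conftest.py at the repository root (hypothetical placement).
# pytest skips the listed paths, relative to this file, during collection.
collect_ignore = ["build"]
```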
  26. Feb 18, 2022