  1. May 26, 2023
    • README.md: Update content and fix issues · 0c771e07
      Antoine Lambert authored
      Add jenkins badge for master branch build status.
      
      Rephrase introduction sentence.
      
      Remove leftovers from when the file was written in reStructuredText.
      
      Add syntax highlighting to code blocks.
      v1.7.0
      0c771e07
    • svn_repo: Optimize export_temporary performances significantly · 82eef54f
      Antoine Lambert authored
      The export_temporary method of the SvnRepo class exports the content of
      a subversion repository at a given revision to a temporary directory.
      
      As we also export the externals that might be associated with some paths
      in the repository, we first need to get all the svn:externals property
      values in order to determine whether there are recursive or relative
      externals and adjust some export parameters accordingly.
      
      While that operation is fast when the subversion repository is hosted
      locally, it is very slow when the repository is hosted on a remote
      server. Indeed, a recursive propget operation on a remote server sends
      many network requests, which slows the process down considerably,
      especially with large repositories.
      
      To improve performance, the previous implementation performed a full
      checkout of the repository to the local filesystem and read the
      svn:externals property values from it. Nevertheless, that process is
      time consuming for large repositories and can consume a lot of disk space.
      
      In order to remove that bottleneck and improve overall performance when
      getting all property values, introduce a C++ extension module for Python
      that implements a fast way to crawl all paths of a repository and their
      associated properties. Unlike the "svn ls --depth infinity" or
      "svn propget -R" commands, it performs only one SVN request over the
      network, hence saving time, especially with large repositories.
      The code is freely inspired by the fast-svn-crawler project by Dmitry
      Pavlenko (https://sourceforge.net/projects/fastsvncrawler/).
      
      The obtained speedup is significant: on a large remote repository,
      listing all paths using "svn ls --depth infinity" or getting all
      svn:externals property values using "svn propget -R" takes around one
      hour, while it takes only a couple of minutes using the approach
      implemented in the C++ extension module. That approach also saves disk
      space as we no longer need to perform a full checkout of the repository.
      
      This change should greatly improve performance when reloading an svn
      repository already visited by Software Heritage. Indeed, before
      archiving any new commits issued since the last visit, the loader checks
      that a repository has not been altered by calling the export_temporary
      method using the remote repository URL.
      82eef54f
  2. May 23, 2023
  3. May 03, 2023
    • utils: Fix parsing of external path between single quotes · 8509bb2a
      Antoine Lambert authored
      The official subversion documentation only mentions that paths containing
      spaces must be surrounded by double quotes, but some external definitions
      found in the wild have paths surrounded by single quotes.
      Those are properly handled by the official subversion client, so we must
      do the same when parsing externals.
      8509bb2a
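
The quoting behaviour described in the commit above can be illustrated with Python's standard `shlex` module, which treats single and double quotes alike. This is a minimal sketch and not the loader's actual parser: `split_external_definition` is a hypothetical helper name and the sample definitions are made up for illustration.

```python
import shlex
from typing import List


def split_external_definition(definition: str) -> List[str]:
    """Split one svn:externals line into tokens, honouring both quote styles."""
    # In POSIX mode shlex.split handles '...' and "..." the same way, so a
    # path containing spaces survives as a single token in both cases.
    return shlex.split(definition)


# Hypothetical external definitions differing only in the quote style used.
double_quoted = split_external_definition('^/vendor/lib "third party/lib"')
single_quoted = split_external_definition("^/vendor/lib 'third party/lib'")
print(double_quoted)
print(single_quoted)
```

Both calls yield the same two tokens, which is the property the fix relies on.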
  4. Apr 17, 2023
    • replay: Filter externals to copy in copyfrom operations · 644fbb1d
      Antoine Lambert authored
      When a directory is copied from another one in a previous revision,
      externals must be copied only if they have been defined in a revision
      greater than or equal to the revision the directory is copied from.
      
      So store the revision number in which an external is defined and use it
      to filter externals when performing copyfrom operations.
      644fbb1d
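
The filtering rule can be sketched as below. It applies the condition exactly as worded in the commit message above; the `External` type, the `externals_to_copy` helper and the sample data are all hypothetical and do not come from the replay module.

```python
from typing import Dict, NamedTuple


class External(NamedTuple):
    url: str
    defined_in_rev: int  # revision in which the external was defined


def externals_to_copy(
    externals: Dict[str, External], copyfrom_rev: int
) -> Dict[str, External]:
    """Keep only externals matching the revision condition from the commit message."""
    return {
        path: ext
        for path, ext in externals.items()
        if ext.defined_in_rev >= copyfrom_rev
    }


# Made-up externals attached to a directory that is copied from revision 10.
externals = {
    "vendor/a": External("svn://example.org/a", 5),
    "vendor/b": External("svn://example.org/b", 12),
}
kept = externals_to_copy(externals, copyfrom_rev=10)
print(sorted(kept))
```

Storing the defining revision next to each external is what makes the filter a simple comparison.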
  5. Apr 04, 2023
  6. Mar 02, 2023
  7. Mar 01, 2023
  8. Feb 17, 2023
  9. Feb 16, 2023
  10. Feb 02, 2023
  11. Jan 20, 2023
  12. Jan 18, 2023
  13. Jan 17, 2023
  14. Dec 19, 2022
  15. Dec 08, 2022
  16. Dec 07, 2022
    • replay: Copy dir states and external paths in copy_from operations · b016f654
      Antoine Lambert authored
      Copied directories might have externals, so we also need to copy states
      and update external paths in case the externals list is later modified.
      b016f654
    • svn: Use urllib.parse.quote to percent encode svn URLs · fc78f574
      Antoine Lambert authored
      In order to detect all ASCII characters that must be percent-encoded
      in svn URLs, add a brute-force test and use urllib.parse.quote in the
      quote_svn_url function.
      fc78f574
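
A minimal sketch of the idea in the commit above, using only the standard library. `quote_url_path` is a hypothetical stand-in and not the loader's `quote_svn_url`; the URL is made up, and encoding only the path component is an assumption of this example.

```python
import string
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url: str) -> str:
    """Percent-encode the path component of a URL, leaving '/' untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


# Brute-force check: collect the printable ASCII characters that quote() rewrites.
encoded = [c for c in string.printable if quote(c) != c]
print(encoded)

quoted_url = quote_url_path("svn://example.org/repo/trunk/my dir")
print(quoted_url)
```

Enumerating every character, rather than maintaining a hand-written list of special cases, is the point of the brute-force test the commit mentions.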
    • utils: Raise ValueError when external definition could not be parsed · f377e9f7
      Antoine Lambert authored
      Such a case can happen when an external definition is malformed.
      
      Previously, the malformed external was added to the directory state
      with an empty external URL, which could lead to unexpected side effects
      such as removing all previously exported valid externals.
      f377e9f7
    • replay: Simplify FileEditor implementation · 301b31e9
      Antoine Lambert authored
      Instead of maintaining file state based on svn properties across revision
      replays and trying to reconstruct the same file as an svn export operation
      would after applying text deltas, simply export the file from the currently
      processed revision when closing the associated file editor.
      
      This greatly simplifies the replay module implementation while keeping
      approximately the same performance as before.
      
      Also add a test that would fail without these changes.
      
      Related to T4673
      301b31e9
  17. Dec 05, 2022
  18. Nov 25, 2022
    • replay: Add more debug logs · e35f800a
      Antoine Lambert authored
      Add more debug logs to the replay module to ease detection of issues.
      Nevertheless, as those are quite verbose, only display them when the
      debug parameter of the loader is set to True.
      e35f800a
  19. Nov 23, 2022
  20. Nov 22, 2022
    • loader: Add logs displaying path differences after revision replay · a843858b
      Antoine Lambert authored
      When a tree computation divergence is detected after replaying a revision,
      add debug logs displaying the paths that differ or are missing between the
      reconstructed repository filesystem and the exported one at that specific
      revision.
      
      This should save time when debugging such issues.
      a843858b
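
One way to find paths missing from either tree is a set difference over relative paths, sketched below. This is an illustration only: `relative_paths` and `missing_paths` are hypothetical helpers, the sketch does not compare file contents, and the loader's own comparison may work differently.

```python
import os
import tempfile
from typing import Set, Tuple


def relative_paths(root: str) -> Set[str]:
    """Return every file and directory path under root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def missing_paths(reconstructed_root: str, exported_root: str) -> Tuple[Set[str], Set[str]]:
    """Paths found in only one tree: (only in reconstructed, only in exported)."""
    reconstructed = relative_paths(reconstructed_root)
    exported = relative_paths(exported_root)
    return reconstructed - exported, exported - reconstructed


# Two throwaway trees: the "exported" one has an extra LICENSE file.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    open(os.path.join(a, "README"), "w").close()
    open(os.path.join(b, "README"), "w").close()
    open(os.path.join(b, "LICENSE"), "w").close()
    only_reconstructed, only_exported = missing_paths(a, b)

print(only_reconstructed, only_exported)
```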
  21. Oct 31, 2022
    • replay: Ensure copyfrom operations are properly handled · 04566a7f
      Antoine Lambert authored
      A subversion revision can contain new directories and files copied from
      ancestor revisions, but those were not always handled correctly in the
      commit editor used to reconstruct the repository filesystem when replaying
      revisions.
      
      In particular, the previous implementation could not handle the case where
      a path copied from an ancestor revision is replaced in the same commit (for
      instance replacing a directory by a file with the same name).
      
      These changes ensure that the source path and source revision from which
      a path is copied are passed to the commit editor methods as parameters,
      so that they can handle the copies, and that replace operations are
      replayed correctly.
      
      It also prevents the OS error "Too many open files" when a very large
      file tree is copied from an ancestor revision.
      v1.4.0
      04566a7f
    • loader: Compress dump file and rework truncated dump handling · d24ba1a5
      Antoine Lambert authored
      When dumping a subversion repository to a file before loading it, compress
      that file using gzip while producing it. This saves significant disk
      space when dumping a large repository.
      
      Also rework how a truncated dump is handled now that the dump file is
      compressed, by providing the expected max revision number to be loaded
      by svnadmin. If the number of loaded revisions matches, we can safely
      continue the partial loading of the repository.
      d24ba1a5
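
Compressing a dump while it is being produced can be sketched with the standard `gzip` module, as below. This is a minimal illustration under assumptions: `compress_stream` is a hypothetical helper, and an in-memory stream stands in for the output of a dump process, which the loader would read from a subprocess instead.

```python
import gzip
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def compress_stream(source: BinaryIO, dest_path: str, chunk_size: int = 64 * 1024) -> None:
    """Copy a binary stream to dest_path, gzip-compressing it chunk by chunk."""
    with gzip.open(dest_path, "wb") as compressed:
        # The uncompressed data is never written to disk in full.
        shutil.copyfileobj(source, compressed, chunk_size)


# Stand-in payload; a real dump would be streamed from the dump command's stdout.
payload = b"SVN-fs-dump-format-version: 3\n" * 1000
with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "repo.dump.gz")
    compress_stream(io.BytesIO(payload), dump_path)
    compressed_size = os.path.getsize(dump_path)
    with gzip.open(dump_path, "rb") as f:
        restored = f.read()

print(len(payload), compressed_size)
```

Because the copy is streamed, disk usage is bounded by the compressed size rather than by the size of the raw dump.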
  22. Oct 28, 2022