  1. Feb 20, 2024
  2. Feb 05, 2024
  3. Jan 16, 2024
  4. Jan 03, 2024
  5. Dec 21, 2023
  6. Dec 19, 2023
  7. Dec 05, 2023
  8. Dec 03, 2023
  9. Nov 28, 2023
  10. Nov 22, 2023
  11. Nov 20, 2023
  12. Nov 16, 2023
  13. Nov 15, 2023
  14. Oct 26, 2023
  15. Oct 19, 2023
  16. Jun 09, 2023
  17. Jun 05, 2023
  18. Jun 01, 2023
  19. May 31, 2023
    • svn_repo: Handle anonymous credentials in get_svn_repo · 2491efd9
      Antoine Lambert authored
      Some remote subversion repositories in the wild require anonymous
      credentials to grant read access. In most cases, those credentials are
      either 'anonymous/' or 'anonymous/anonymous'.
      
      Previously, these credentials had to be added to the origin URL using the
      basic authentication syntax. While such a URL can be submitted to the Save
      Code Now service, this is not possible when svn URLs come from a lister.
      
      To work around this, make the get_svn_repo function retry the connection
      with these anonymous credentials when a connection error occurs.
      
      This should also simplify Save Code Now requests for subversion origins
      that require anonymous credentials.
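The fallback described in this commit can be sketched in Python as follows. This is a minimal illustration, not the actual get_svn_repo code: the `connect` callable, the `SvnConnectionError` exception and the `connect_with_anonymous_fallback` name are hypothetical stand-ins for whatever the real subversion bindings provide. The sketch first tries without credentials, then the two anonymous variants mentioned above, and re-raises the last error if every attempt fails.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Credentials tried in order: none at all, then the two anonymous variants
# ('anonymous/' and 'anonymous/anonymous') mentioned in the commit message.
ANONYMOUS_CREDENTIALS: Sequence[Optional[Tuple[str, str]]] = (
    None,
    ("anonymous", ""),
    ("anonymous", "anonymous"),
)


class SvnConnectionError(Exception):
    """Hypothetical error raised by the connect callable on failure."""


def connect_with_anonymous_fallback(
    url: str,
    connect: Callable[[str, Optional[str], Optional[str]], T],
) -> T:
    """Open a connection to url, falling back to anonymous credentials.

    connect is a hypothetical callable taking (url, username, password) and
    raising SvnConnectionError when the server refuses the connection.
    """
    last_error: Optional[SvnConnectionError] = None
    for credentials in ANONYMOUS_CREDENTIALS:
        username, password = credentials if credentials else (None, None)
        try:
            return connect(url, username, password)
        except SvnConnectionError as error:
            # Remember the failure and try the next credentials.
            last_error = error
    # Every attempt failed: surface the last connection error.
    assert last_error is not None
    raise last_error
```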
    • svn_repo: Optimize export of a remote subversion sub-path · 2a073bc8
      Antoine Lambert authored
      Previously, when exporting a sub-path of a remote subversion repository
      over the network, the full repository was exported and the local path
      targeting the sub-path was returned. This is not optimal in terms of
      network bandwidth when the repository filesystem is large, but it was
      implemented this way to ensure that all tests related to sub-path export
      passed regardless of the subversion loader class used: either SvnLoader
      or SvnLoaderFromRemoteDump.
      
      After some analysis, it turned out that it was possible to export only
      the requested sub-path instead of the full repository when using the
      SvnLoader class. So modify the SvnRepo class to implement that behavior
      and save network bandwidth when dealing with a large repository.
      
      These changes in the SvnRepo class require some adjustments in the replay
      module to keep all tests passing, and they also make it possible to remove
      an optional class constructor parameter that is no longer needed.
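To illustrate what changes in terms of data fetched, here is a minimal sketch contrasting the two strategies. Both helper names are hypothetical and only compute which URL would be exported and which local path would be returned; the actual SvnRepo changes are not shown in this log. Percent-encoding the sub-path with `urllib.parse.quote` is an assumption about how a path containing spaces would be turned into a URL.

```python
import os
from typing import Tuple
from urllib.parse import quote


def full_export_plan(root_url: str, sub_path: str, local_dir: str) -> Tuple[str, str]:
    """Previous behavior: export the whole repository, return the sub-path in it."""
    # The full repository is fetched over the network...
    export_url = root_url
    # ...and only the local directory matching the sub-path is used afterwards.
    return export_url, os.path.join(local_dir, sub_path.strip("/"))


def sub_path_export_plan(root_url: str, sub_path: str, local_dir: str) -> Tuple[str, str]:
    """Optimized behavior: export only the requested sub-path."""
    # Build the sub-path URL, percent-encoding characters such as spaces.
    export_url = root_url.rstrip("/") + "/" + quote(sub_path.strip("/"), safe="/")
    # The export destination directly holds the sub-path content.
    return export_url, local_dir
```

With the first plan, bandwidth grows with the size of the whole repository; with the second, only the sub-path tree is transferred.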
  20. May 30, 2023
  21. May 26, 2023
    • README.md: Update content and fix issues · 0c771e07
      Antoine Lambert authored
      Add Jenkins badge for master branch build status.
      
      Rephrase introduction sentence.
      
      Remove leftovers from when the file was written in reStructuredText.
      
      Add syntax highlighting to code blocks.
    • svn_repo: Optimize export_temporary performances significantly · 82eef54f
      Antoine Lambert authored
      The export_temporary method of the SvnRepo class exports the content of
      a subversion repository at a given revision in a temporary directory.
      
      As we also export the externals that might be associated with some paths
      in the repository, we first need to get all the svn:externals property
      values in order to determine whether there are recursive or relative
      externals and adjust some export parameters accordingly.
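For the relative case, the check can be sketched as below. It relies on the relative URL forms that Subversion (1.5 and later) documents for svn:externals: `../`, `^/`, `//` and `/`. The function names are hypothetical, the input is assumed to be URLs already extracted from definition lines, and detecting recursive externals is not covered, since this log does not say how the loader identifies them.

```python
from typing import Iterable, Optional, Tuple

# Relative URL forms accepted in svn:externals definitions since Subversion 1.5.
# "//" must be tested before "/" because it also starts with a slash.
RELATIVE_EXTERNAL_PREFIXES: Tuple[Tuple[str, str], ...] = (
    ("../", "relative to the directory holding the property"),
    ("^/", "relative to the repository root"),
    ("//", "relative to the URL scheme"),
    ("/", "relative to the server root"),
)


def relative_external_kind(url: str) -> Optional[str]:
    """Return how an external URL is relative, or None if it is absolute."""
    for prefix, kind in RELATIVE_EXTERNAL_PREFIXES:
        if url.startswith(prefix):
            return kind
    return None


def has_relative_externals(urls: Iterable[str]) -> bool:
    """Return True if at least one external URL uses a relative form."""
    return any(relative_external_kind(url) is not None for url in urls)
```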
      
      While that operation is fast when the subversion repository is hosted
      locally, it is terribly slow when the repository is hosted on a remote
      server. Indeed, a recursive propget operation on a remote server sends
      many network requests, which slows the process down considerably,
      especially with large repositories.
      
      To improve performance, the previous implementation performed a full
      checkout of the repository to the local filesystem and read the
      svn:externals property values from it. However, that process is time
      consuming for large repositories and can consume a lot of disk space.
      
      In order to remove that bottleneck and improve the overall performance of
      getting all property values, introduce a C++ extension module for Python
      that implements a fast way to crawl all paths of a repository and their
      associated properties. Unlike the "svn ls --depth infinity" or
      "svn propget -R" commands, it performs only one SVN request over the
      network, hence saving time, especially with large repositories.
      The code is loosely inspired by the fast-svn-crawler project by Dmitry
      Pavlenko (https://sourceforge.net/projects/fastsvncrawler/).
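Once a single crawl has returned every path with its properties, the svn:externals values can be filtered locally without any further network round trip. The sketch below assumes, hypothetically, that the crawl output is an iterable of `(path, properties)` pairs; the real interface of the C++ extension module is not described in this log. It splits each property value into definition lines, skipping blank lines and lines starting with `#`, which Subversion treats as comments.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def collect_externals(
    crawl: Iterable[Tuple[str, Mapping[str, str]]],
) -> Dict[str, List[str]]:
    """Map each path to its svn:externals definition lines.

    crawl is assumed to yield (path, properties) pairs obtained from a single
    network request; no further request is issued here.
    """
    externals: Dict[str, List[str]] = {}
    for path, properties in crawl:
        value = properties.get("svn:externals")
        if value is None:
            continue
        # Keep non-empty lines that are not comments.
        definitions = [
            line.strip()
            for line in value.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]
        if definitions:
            externals[path] = definitions
    return externals
```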
      
      The obtained speedup is impressive: on a large remote repository, listing
      all paths using "svn ls --depth infinity" or getting all svn:externals
      property values using "svn propget -R" takes around one hour, while it
      takes only a couple of minutes using the approach implemented in the C++
      extension module. This approach also saves disk space, as a full checkout
      of the repository is no longer needed.
      
      This change should greatly improve performance when reloading a svn
      repository already visited by Software Heritage. Indeed, before archiving
      any new commits made since the last visit, the loader checks that the
      repository has not been altered by calling the export_temporary method
      with the remote repository URL.
  22. May 23, 2023
  23. May 03, 2023
    • utils: Fix parsing of external path between single quotes · 8509bb2a
      Antoine Lambert authored
      The official subversion documentation only mentions that paths containing
      spaces must be surrounded by double quotes, but some external definitions
      found in the wild have paths surrounded by single quotes. Those are
      properly handled by the official subversion client, so we must do the
      same when parsing externals.
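A quote-aware tokenizer is enough to accept both quoting styles. The sketch below swaps in Python's `shlex.split`, which applies POSIX shell quoting rules and therefore strips single and double quotes alike; it is an illustration, not the loader's actual parser. The function name is hypothetical, and it only handles the newer `[-r REV] URL[@PEG] LOCALPATH` definition format, not the legacy `LOCALPATH [-r REV] URL` one. Shell rules also interpret backslashes, which may differ from what the subversion client does.

```python
import shlex
from typing import Optional, Tuple


def parse_external_definition(line: str) -> Tuple[str, str, Optional[str]]:
    """Parse a "[-r REV] URL[@PEG] LOCALPATH" line into (url, local_path, rev).

    shlex.split (POSIX mode) removes single and double quotes alike, so
    'third party' and "third party" both yield the path: third party.
    """
    tokens = shlex.split(line)
    revision: Optional[str] = None
    if tokens and tokens[0] == "-r":
        # "-r REV URL LOCALPATH": the revision is the next token.
        revision, tokens = (tokens[1] if len(tokens) > 1 else None), tokens[2:]
    elif tokens and tokens[0].startswith("-r"):
        # "-rREV URL LOCALPATH": the revision is glued to the option.
        revision, tokens = tokens[0][2:], tokens[1:]
    if len(tokens) != 2:
        raise ValueError(f"unsupported external definition: {line!r}")
    url, local_path = tokens
    return url, local_path, revision
```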
  24. Apr 17, 2023
    • replay: Filter externals to copy in copyfrom operations · 644fbb1d
      Antoine Lambert authored
      When a directory is copied from another one in a previous revision,
      externals must be copied only if they have been defined in a revision
      greater than or equal to the revision the directory is copied from.

      So store the revision number in which an external is defined and use it
      to filter externals when performing copyfrom operations.
  25. Apr 04, 2023
  26. Mar 02, 2023
  27. Mar 01, 2023
    • utils/init_svn_repo_from_dump: Improve svnadmin load performance · 15280d5d
      Antoine Lambert authored
      "svnadmin load" has a --no-flush-to-disk option that enables faster loads
      at the cost of being unsafe on power loss. This drawback is not an issue
      for the subversion loader, so use that option to significantly improve
      the performance of loading a repository from a dump file into a
      directory on the local filesystem.
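For reference, a minimal sketch of such a load using Python's subprocess module. The helper names are hypothetical; the repository at `repo_path` is assumed to have been created beforehand (for instance with `svnadmin create`), and `--quiet` is only added to keep the output short.

```python
import subprocess
from typing import List


def svnadmin_load_command(repo_path: str) -> List[str]:
    """Build the "svnadmin load" command line for repo_path."""
    return [
        "svnadmin",
        "load",
        "--quiet",  # do not print progress for every loaded revision
        # Skip flushing to disk: faster, but unsafe if the machine loses power.
        "--no-flush-to-disk",
        repo_path,
    ]


def load_dump(repo_path: str, dump_path: str) -> None:
    """Load the dump file at dump_path into the existing repository at repo_path."""
    with open(dump_path, "rb") as dump_file:
        # svnadmin load reads the dump stream from standard input.
        subprocess.run(svnadmin_load_command(repo_path), stdin=dump_file, check=True)
```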