  1. Feb 10, 2025
  2. Jan 22, 2025
  3. Dec 11, 2024
  4. Nov 07, 2024
  5. Oct 29, 2024
  6. Oct 28, 2024
  7. Oct 24, 2024
  8. Oct 14, 2024
  9. Sep 05, 2024
    • sourceforge: Also skip ConnectionError when fetching project info · 927aebbd
      Antoine Lambert authored
      The sourceforge lister sends various HTTP requests to get info about a
      project, for instance to get the branch name of a Bazaar project.
      
      If HTTP errors occurred during these steps, they were discarded so that
      the listing could continue, but connection errors were not, and the
      listing failed whenever one was raised.
      
      Currently, the legacy Bazaar project hosted on sourceforge seems to be
      down and connection errors are raised when attempting to fetch branch
      names, so the lister crashes mid-run and does not process all projects.
      v6.8.0
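
      The error handling described above can be sketched as follows. This is a
      minimal illustration, not the lister's actual code: it assumes the HTTP
      requests go through the requests library, and the names
      fetch_project_info, list_projects and the fetch callable are
      hypothetical. A failing project is simply left out here, which may
      differ from what the real lister does.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

import requests

logger = logging.getLogger(__name__)

ProjectInfo = Dict[str, str]


def fetch_project_info(
    project: str, fetch: Callable[[str], ProjectInfo]
) -> Optional[ProjectInfo]:
    """Return the info for a project, or None when the request fails.

    `fetch` is a hypothetical callable performing the HTTP request(s).
    """
    try:
        return fetch(project)
    except (
        requests.exceptions.HTTPError,
        # Catching connection errors as well keeps one unreachable host
        # from aborting the whole listing.
        requests.exceptions.ConnectionError,
    ) as exc:
        logger.warning("Skipping project %s: %s", project, exc)
        return None


def list_projects(
    projects: Iterable[str], fetch: Callable[[str], ProjectInfo]
) -> Dict[str, ProjectInfo]:
    """Collect info for every project whose requests succeeded."""
    listed: Dict[str, ProjectInfo] = {}
    for project in projects:
        info = fetch_project_info(project, fetch)
        if info is not None:
            listed[project] = info
    return listed
```

      Any other exception type still propagates, so genuine bugs are not
      silently hidden.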
  10. Sep 04, 2024
    • Add save-bulk lister to check origins prior to their insertion in database · af24960b
      Antoine Lambert authored
      This new, special-purpose lister verifies a list of origins to archive
      provided by users (for instance through the Web API).
      
      Its purpose is to avoid polluting the scheduler database with origins that
      cannot be loaded into the archive.
      
      Each origin is identified by a URL and a visit type. For a given visit
      type, the lister checks whether the origin URL can be found and whether
      the visit type is valid.
      
      The supported visit types are those for VCS (bzr, cvs, hg, git and svn),
      plus the one for loading tarball content into the archive.
      
      Accepted origins are inserted or upserted in the scheduler database.
      
      Rejected origins are stored in the lister state.
      
      Related to #4709
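
      The accept/reject flow described above can be sketched as follows. This
      is a simplified illustration under stated assumptions: the names
      SaveBulkState, check_origins and url_exists are hypothetical, the
      tarball visit type name is assumed, and the validity check is reduced to
      a membership test on the visit type name, whereas the real lister is
      described as validating the visit type against the origin itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

VCS_VISIT_TYPES = {"bzr", "cvs", "hg", "git", "svn"}
TARBALL_VISIT_TYPE = "tarball-directory"  # assumed name, not from the commit
SUPPORTED_VISIT_TYPES = VCS_VISIT_TYPES | {TARBALL_VISIT_TYPE}

Origin = Tuple[str, str]  # (origin URL, visit type)


@dataclass
class SaveBulkState:
    """Hypothetical lister state keeping track of rejected origins."""

    rejected: List[Tuple[str, str, str]] = field(default_factory=list)


def check_origins(
    origins: Iterable[Origin],
    url_exists: Callable[[str], bool],
    state: SaveBulkState,
) -> List[Origin]:
    """Return accepted origins; record rejected ones in the state."""
    accepted: List[Origin] = []
    for url, visit_type in origins:
        if visit_type not in SUPPORTED_VISIT_TYPES:
            state.rejected.append((url, visit_type, "unsupported visit type"))
        elif not url_exists(url):
            state.rejected.append((url, visit_type, "origin URL not found"))
        else:
            # Only these origins would be sent to the scheduler database.
            accepted.append((url, visit_type))
    return accepted
```

      Passing url_exists as a callable keeps the sketch free of network
      access; a real check would issue a request against the origin URL.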
  11. Sep 02, 2024
  12. Aug 27, 2024
  13. Jul 18, 2024
  14. Jun 28, 2024
  15. Jun 05, 2024
    • gitea, gogs: Ensure query parameters are not duplicated in API URLs · 323e2774
      Antoine Lambert authored
      The Gitea API returns the next pagination link with all the query
      parameters that were provided to the API request.
      
      As we were also passing a dict of fixed query parameters to the
      page_request method, some query parameters ended up appearing multiple
      times in the URL used to fetch a new page of repository data. Each time a
      new page was requested, new instances of these parameters were appended
      to the URL. With a high number of pages to retrieve, this could produce a
      very long URL and make the request fail.
      
      Also remove a debug log that is already emitted by the http_request method.
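
      One possible way to avoid the duplication is sketched below, using only
      the standard library. The helper name merge_query_params and the example
      URLs are hypothetical, and this is not necessarily how the commit solves
      the problem: the sketch adds a fixed parameter only when the URL does
      not already carry it.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def merge_query_params(url: str, params: Dict[str, str]) -> str:
    """Add fixed query parameters to a URL without duplicating any.

    Parameters already present in the URL (for instance because the API
    echoed them back in its "next page" link) are kept as they are.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    for key, value in params.items():
        query.setdefault(key, value)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

      Applying the helper repeatedly to its own output leaves the URL
      unchanged, so the URL length no longer grows with the number of pages.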
  16. May 22, 2024
  17. Apr 24, 2024
  18. Apr 16, 2024
    • Use beautifulsoup4 CSS selectors to simplify code and type checking · 41407e0e
      Antoine Lambert authored
      Because the types-beautifulsoup4 package gets installed in the swh
      virtualenv (it is a swh-scanner test dependency), mypy reported some
      errors related to beautifulsoup4 typing.
      
      The find method of bs4 returns the following union:
      Tag | NavigableString | None, so isinstance calls must be used to ensure
      proper typing, which is not great.
      
      So prefer the select_one method instead: it returns Optional[Tag], so a
      simple None check is enough to ensure typing is correct.
      In a similar manner, replace uses of the find_all method with the select
      method.
      
      This also simplifies the code.
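
      The pattern can be illustrated with a short sketch. The HTML snippet,
      the selectors and the function names first_repo_url and all_repo_urls
      are made up for the example; only select_one and select come from
      beautifulsoup4.

```python
from typing import List, Optional

from bs4 import BeautifulSoup

HTML = """
<ul id="projects">
  <li><a class="repo" href="https://example.org/a.git">a</a></li>
  <li><a class="repo" href="https://example.org/b.git">b</a></li>
</ul>
"""


def first_repo_url(html: str) -> Optional[str]:
    """Return the href of the first repository link, if any."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns a Tag or None: a plain None check is enough,
    # no isinstance() call is needed to satisfy the type checker.
    link = soup.select_one("ul#projects a.repo[href]")
    if link is None:
        return None
    return str(link["href"])


def all_repo_urls(html: str) -> List[str]:
    """Return the href of every repository link."""
    soup = BeautifulSoup(html, "html.parser")
    # select returns a list of Tag objects, replacing find_all.
    return [str(a["href"]) for a in soup.select("ul#projects a.repo[href]")]
```

      The [href] part of the selector guarantees the attribute exists before
      it is read.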
  19. Mar 29, 2024
  20. Mar 14, 2024
  21. Mar 13, 2024
  22. Feb 05, 2024
  23. Jan 18, 2024
  24. Jan 17, 2024
  25. Jan 10, 2024
  26. Jan 09, 2024
    • Elm stateful lister · 82ee0951
      Franck Bret authored
      Use another API endpoint that helps the lister to be stateful.
      The endpoint takes a ``since`` value that represents a sequential
      index in the history.
      The ``all_packages_count`` state stores a count that is used as the
      ``since`` argument on the next run.
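
      The stateful behaviour can be sketched as follows. Only the names
      ``all_packages_count`` and ``since`` come from the commit message; the
      ElmListerState class, the list_new_packages function and the fetch_since
      callable standing in for the API request are hypothetical, as is the
      assumption that the endpoint returns only the entries added after the
      given index.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ElmListerState:
    """State persisted between two runs of the lister."""

    all_packages_count: int = 0


def list_new_packages(
    state: ElmListerState, fetch_since: Callable[[int], List[str]]
) -> List[str]:
    """Fetch the entries added since the last run and update the state.

    `fetch_since(n)` stands in for the API request; it is assumed to
    return the history entries that come after index n.
    """
    new_entries = fetch_since(state.all_packages_count)
    # The stored count becomes the ``since`` argument of the next run.
    state.all_packages_count += len(new_entries)
    return new_entries
```

      On a second run with no new packages, the call returns an empty list and
      the state is left unchanged.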
    • Adapt and rebase · 4b1f49ac
      Franck Bret authored
      'url' and 'instance' are mandatory
      Add elm lister entry to pyproject.toml
    • Elm Lister · 3a1beae3
      Franck Bret authored
      The Elm Lister lists Elm package origins from the Elm
      lang registry.
      It uses an HTTP API endpoint to list package origins.
      Origins are GitHub repositories; releases take advantage
      of the GitHub release API.
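
      Since origins are GitHub repositories, the mapping from a registry entry
      to an origin URL can be sketched as below. The function name
      elm_package_origin is hypothetical, and the sketch assumes package names
      have the form "author/project" and match the repository path on GitHub.

```python
def elm_package_origin(package_name: str) -> str:
    """Build a GitHub origin URL from an "author/project" package name.

    The name format is an assumption made for this sketch.
    """
    author, project = package_name.split("/", 1)
    return f"https://github.com/{author}/{project}"
```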
  27. Jan 08, 2024