  1. Aug 27, 2024
    • crates: Speedup listing by processing crates in batch · 42e76ee6
      Antoine Lambert authored
      Instead of returning a single crate and its version info per page,
      return up to 1000 crates per page to significantly speed up
      the listing process.
      42e76ee6
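The batching idea can be sketched as follows. This is only an illustration, not the lister's actual code: the `batched` helper and the sample crate names are hypothetical, and only the page size of 1000 comes from the commit message.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


# Hypothetical crate names standing in for real registry data.
crates = [f"crate-{i}" for i in range(2500)]
pages = list(batched(crates, size=1000))
page_sizes = [len(page) for page in pages]
```

With 2500 items this produces three pages instead of 2500 single-crate pages, which is where the speedup would come from.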
    • crates: Record lister state only if all crates were processed · c6aa490f
      Antoine Lambert authored
      Previously, the lister state was recorded even when errors occurred
      while listing crates, because the finalize method is called whether or
      not an exception was raised during listing.

      As a consequence, some crates could be missed, because the incremental
      listing restarts from the dump date of the last processed crates database.

      So ensure all crates have been processed by the lister before recording
      its state.
      c6aa490f
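A minimal sketch of the pattern described above, under stated assumptions: `SketchLister`, its attributes, and the dump date string are hypothetical and do not reproduce the real lister class. It only shows a finalize step that runs unconditionally but records state solely after a complete run.

```python
from typing import Callable, Iterable, Optional


class SketchLister:
    """Hypothetical lister that records its state only after a complete run."""

    def __init__(self) -> None:
        self.all_processed = False
        self.recorded_state: Optional[str] = None

    def run(
        self,
        crates: Iterable[str],
        process: Callable[[str], None],
        dump_date: str,
    ) -> None:
        try:
            for crate in crates:
                process(crate)  # may raise and interrupt the listing
            self.all_processed = True
        finally:
            # Runs even when an exception was raised during listing.
            self.finalize(dump_date)

    def finalize(self, dump_date: str) -> None:
        # Guard: skip recording when the listing did not complete.
        if self.all_processed:
            self.recorded_state = dump_date


def failing(crate: str) -> None:
    if crate == "bad":
        raise RuntimeError("listing failed")


ok = SketchLister()
ok.run(["a", "b"], lambda crate: None, "2024-08-27")  # placeholder date

broken = SketchLister()
try:
    broken.run(["a", "bad"], failing, "2024-08-27")
except RuntimeError:
    pass
```

After the failed run, `broken.recorded_state` stays `None`, so a later incremental run would start again from the previously recorded state rather than skipping crates.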
    • crates: Use looseversion.LooseVersion2 to parse crate versions · aafaebd5
      Antoine Lambert authored
      packaging.version.parse is dedicated to parsing Python package version
      numbers, but crate versions do not necessarily follow Python version
      number conventions, so some crate versions cannot be parsed.

      Prefer looseversion.LooseVersion2 instead, which is a drop-in
      replacement for the deprecated distutils.version.LooseVersion and can
      parse all kinds of version numbers.
      aafaebd5
    • crates: Bump csv field size limit · b2ece7ca
      Antoine Lambert authored
      A field size limit of 1000000 was not enough to properly process
      all CSV crates data, so bump it to a higher value.
      b2ece7ca
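For illustration, the effect of the limit can be reproduced with the standard library's `csv.field_size_limit`. The 1000000 limit comes from the commit message; the raised value of 10000000, the `parse` helper, and the sample CSV content are assumptions made for this sketch, not the values used by the lister.

```python
import csv
import io


def parse(text: str, limit: int) -> list:
    """Parse CSV text after setting the module-wide field size limit."""
    csv.field_size_limit(limit)
    return list(csv.reader(io.StringIO(text)))


# One field of 2,000,000 characters, larger than the previous limit.
big_field = "x" * 2_000_000
data = f"name,readme\nexample,{big_field}\n"

try:
    parse(data, 1_000_000)
    failed_with_old_limit = False
except csv.Error:
    # The csv module rejects fields larger than the configured limit.
    failed_with_old_limit = True

rows = parse(data, 10_000_000)  # illustrative higher limit
```

Note that `csv.field_size_limit` sets a module-wide value, so it affects every reader created afterwards in the same process.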
  2. Jul 18, 2024
  3. Jun 28, 2024
  4. Jun 05, 2024
    • gitea, gogs: Ensure query parameters are not duplicated in API URLs · 323e2774
      Antoine Lambert authored
      The Gitea API returns a next-page pagination link that includes all
      query parameters provided to the API request.

      As we were also passing a dict of fixed query parameters to the page_request
      method, some query parameters appeared multiple times in the URL used
      to fetch a new page of repository data. Each time a new page was
      requested, new instances of these parameters were appended to the URL, which
      could produce a very long URL when many pages are retrieved
      and make the request fail.

      Also remove a debug log already emitted by the http_request method.
      323e2774
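One possible way to avoid such duplication is sketched below with `urllib.parse`. This is not necessarily how the commit fixes it: `merge_query_params`, the example host, and the parameter names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def merge_query_params(url: str, params: Dict[str, str]) -> str:
    """Return `url` with `params` applied, keeping one value per parameter."""
    parts = urlsplit(url)
    # dict() collapses repeated parameters already present in the URL.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    for key, value in params.items():
        # A value already carried by the pagination link takes precedence.
        query.setdefault(key, value)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical fixed parameters and pagination link.
fixed_params = {"sort": "id", "order": "asc", "limit": "50"}
next_link = (
    "https://gitea.example.org/api/v1/repos/search"
    "?sort=id&order=asc&limit=50&page=2"
)
merged = merge_query_params(next_link, fixed_params)
```

Because the link already carries every fixed parameter, `merged` is identical to `next_link`, so the URL no longer grows from page to page.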
  5. May 22, 2024
  6. Apr 24, 2024
  7. Apr 16, 2024
    • Use beautifulsoup4 CSS selectors to simplify code and type checking · 41407e0e
      Antoine Lambert authored
      As the types-beautifulsoup4 package gets installed in the swh virtualenv
      because it is a swh-scanner test dependency, mypy reported some errors
      related to beautifulsoup4 typing.

      The find method of bs4 returns the following union:
      Tag | NavigableString | None, so isinstance calls must be used to ensure
      proper typing, which is not great.

      So prefer the select_one method instead: it returns Optional[Tag], so
      a simple None check is enough to ensure typing is correct.
      In a similar manner, replace uses of the find_all method with the select method.

      This also has the advantage of simplifying the code.
      41407e0e
  8. Mar 29, 2024
  9. Mar 14, 2024
  10. Mar 13, 2024
  11. Feb 05, 2024
  12. Jan 18, 2024
  13. Jan 17, 2024
  14. Jan 10, 2024
  15. Jan 09, 2024
    • Elm stateful lister · 82ee0951
      Franck Bret authored
      Use another API endpoint that allows the lister to be stateful.
      This API endpoint requires a ``since`` value that represents a
      sequential index in the history.
      The ``all_packages_count`` state stores a count that is
      used as the ``since`` argument on the next run.
      82ee0951
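The stateful behaviour can be sketched as follows. Only the `all_packages_count` name and the `since` argument come from the commit message; `ElmListerState`, `list_new_packages`, the fake fetch function, and the package names are hypothetical stand-ins for the real lister and API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ElmListerState:
    """Hypothetical state object; only the field name comes from the commit."""

    all_packages_count: int = 0


def list_new_packages(
    state: ElmListerState, fetch_since: Callable[[int], List[str]]
) -> List[str]:
    # Ask only for entries added after the index reached on the previous run.
    new_entries = fetch_since(state.all_packages_count)
    # Store the updated count so the next run resumes from there.
    state.all_packages_count += len(new_entries)
    return new_entries


# Fake history standing in for the registry; real data would come over HTTP.
history = ["author/pkg-a@1.0.0", "author/pkg-b@1.0.0", "author/pkg-a@1.0.1"]


def fake_fetch_since(since: int) -> List[str]:
    return history[since:]


state = ElmListerState()
first_run = list_new_packages(state, fake_fetch_since)
history.append("author/pkg-c@1.0.0")
second_run = list_new_packages(state, fake_fetch_since)
```

The second run only sees the entry appended after the first run, because the stored count is passed back as `since`.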
    • Adapt and rebase · 4b1f49ac
      Franck Bret authored
      'url' and 'instance' are mandatory
      Add elm lister entry to pyproject.toml
      4b1f49ac
    • Elm Lister · 3a1beae3
      Franck Bret authored
      The Elm lister lists Elm package origins from the Elm
      lang registry.
      It uses an HTTP API endpoint to list package origins.
      Origins are GitHub repositories; releases take advantage
      of the GitHub release API.
      3a1beae3
  16. Jan 08, 2024
  17. Dec 18, 2023
    • Stateful Julia lister · 99bbd9d6
      Franck Bret authored
      Add a state to the lister to store the ``last_seen_commit`` as a Git
      commit hash.
      
      Use Dulwich to retrieve a Git commit walker starting from
      ``last_seen_commit``, if any.
      For each commit, detect whether it adds a new package or a new package
      version, and return its origin with the commit date as
      last_update.
      99bbd9d6
  18. Dec 05, 2023
  19. Dec 03, 2023
  20. Dec 01, 2023
  21. Nov 29, 2023
  22. Nov 16, 2023
  23. Nov 15, 2023
  24. Nov 14, 2023
  25. Nov 07, 2023
  26. Oct 18, 2023
  27. Oct 12, 2023
  28. Oct 09, 2023