  1. Feb 17, 2025
  2. Sep 09, 2024
  3. Aug 30, 2024
  4. Aug 27, 2024
  5. May 15, 2024
    • concurrent-queries: enable the feature by default · 93914d20
      Pierre-Yves David authored
      This seems a useful feature.
      93914d20
    • concurrent-queries: add the option to issue some request concurrently · 60abb446
      Pierre-Yves David authored
      Some large calls might be sliced into multiple HTTP requests. Since we
      know in advance that we will send all these requests, issuing them in
      parallel will run faster.
      
      The new rate limiting machinery will ensure that we do not issue
      requests faster than the server intends us to.
      
      The "known" method uses such chunking and uses this new approach when
      the parameter is set.
      
      The default concurrency of 20 has been picked arbitrarily; it seems
      large enough to provide a significant speedup and small enough not to
      hammer the server too much.
      60abb446
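A minimal sketch of the idea described in the commit above, assuming the call has already been sliced into chunks. The names `run_chunks_concurrently` and `send_chunk` are hypothetical and not taken from the client; only the default of 20 workers comes from the commit message. Rate limiting is deliberately left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, Sequence


def run_chunks_concurrently(
    chunks: Iterable[Sequence[Hashable]],
    send_chunk: Callable[[Sequence[Hashable]], Dict[Hashable, bool]],
    max_workers: int = 20,  # default concurrency mentioned in the commit
) -> Dict[Hashable, bool]:
    """Send every chunk through `send_chunk` in parallel and merge the results.

    `send_chunk` is a hypothetical callable standing in for one HTTP request.
    """
    merged: Dict[Hashable, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the submitted chunks.
        for partial in pool.map(send_chunk, chunks):
            merged.update(partial)
    return merged
```

Because all chunks are known up front, a thread pool is enough; nothing here depends on the result of an earlier chunk.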
  6. Apr 18, 2024
    • rate-limit: grant an initial portion of "free token" · 07c872e0
      Pierre-Yves David authored
      To avoid slowing down clients that only need to do a few requests, we
      grant them a small percentage of free initial requests and rate limit
      the remaining ones.
      
      See inline documentation for details.
      07c872e0
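One way to picture the "free token" idea is to pre-load the semaphore with a share of the rate limit budget. This is only a sketch: the function name `make_token_semaphore` and the 10 percent default are assumptions for illustration, since the actual percentage lives in the inline documentation that is not shown here.

```python
import threading


def make_token_semaphore(rate_limit: int, free_percent: int = 10) -> threading.Semaphore:
    """Return a semaphore pre-loaded with a share of the rate limit budget.

    `free_percent` is a hypothetical parameter; the real value is not stated
    in the commit message.
    """
    free_tokens = rate_limit * free_percent // 100
    # Requests can acquire these tokens immediately, without waiting for
    # the background thread to issue new ones.
    return threading.Semaphore(free_tokens)
```

With such a starting value, a client doing only a handful of requests never blocks, while a long series of requests falls back to the regular token rate once the free tokens are used up.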
    • client: more accurate and thread safe rate limiting · 6d78a444
      Pierre-Yves David authored
      This commit reworks the rate limiting logic to use a more accurate and
      thread-compatible method.
      
      In short, we now have a daemon thread that processes rate limit
      information from completed requests and slowly issues "available-request"
      tokens to a `threading.Semaphore` instance at the appropriate rate.
      These "available-request" tokens are consumed by requests, effectively
      reducing the rate of requests.
      
      See inline documentation for more details.
      
      Note that the new rate limiting implementation is no longer "progressive".
      The previous implementation added no delay between requests initially and
      gradually increased the delay between requests as the rate limit budget
      got low. However, that implementation had multiple issues:
      
      - The rate limit was enforced through an explicit delay before each
        request, slowing down operation regardless of the actual time between
        requests.
      
      - The approach was oblivious to threading, so having multiple threads
        issue requests in parallel would increase the pace proportionally. To
        counterbalance this, the "progressive" curve of the delay was "heavy
        handed", adding exponentially more delay than necessary to
        counterbalance the threading effect.
      
      The new approach behaves much better in both regards:
      
      - Requests are not delayed unless there is no "available-request"
        token. If some tokens have been accumulated, a request can proceed
        without any delay. If no tokens are available, the
        `Semaphore.acquire()` call will simply block until a "token" is
        generated. Since tokens are generated in the background at a stable
        rate, any time spent in other code between requests will not add up
        with the rate limit delay.
      
      - The logic is fully compatible with threads. Tokens are produced at a
        stable pace regardless of the number of threads using a WebAPIClient. In
        the same way, each token can only be consumed by a single thread, so
        requests will be issued at the intended rate regardless of the number of
        threads trying to issue them.
      
      Regardless of the advantages of the new method, the lost ability to
      initially issue requests at a faster rate is still something useful. We
      will re-introduce a solution for that in the next commit.
      6d78a444
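The mechanism described in this commit can be sketched as follows. This is a simplified illustration, not the client's code: the class name `TokenIssuer` and its methods are hypothetical, and the rate is fixed at construction time, whereas the commit describes a thread that adjusts itself from the rate limit information of completed requests.

```python
import threading
import time


class TokenIssuer:
    """Issue "available-request" tokens at a stable rate from a daemon thread.

    Hypothetical sketch: the rate is a constant here instead of being
    derived from server responses.
    """

    def __init__(self, tokens_per_second: float, initial_tokens: int = 0) -> None:
        self._interval = 1.0 / tokens_per_second
        self._tokens = threading.Semaphore(initial_tokens)
        # A daemon thread does not prevent the interpreter from exiting.
        thread = threading.Thread(target=self._issue_forever, daemon=True)
        thread.start()

    def _issue_forever(self) -> None:
        while True:
            time.sleep(self._interval)
            self._tokens.release()  # make one more request possible

    def wait_for_token(self) -> None:
        """Block until a token is available, then consume it."""
        self._tokens.acquire()
```

Each caller would invoke `wait_for_token()` right before sending a request. Since a token can be acquired by only one thread, the overall pace stays the same whether one thread or twenty are issuing requests, and time spent elsewhere lets tokens accumulate instead of being added to a fixed delay. Note that this sketch lets tokens accumulate without bound, which a real implementation would probably want to cap.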
  7. Mar 29, 2024
  8. Feb 14, 2024
  9. Feb 05, 2024
  10. Jan 15, 2024
  11. Jan 12, 2024
  12. Jan 09, 2024
    • WebAPIClient: add some debug logging · 0ea158fe
      Pierre-Yves David authored
      We add some basic debug output to help monitor the HTTP exchanges, rate
      limit information, and associated request delays.
      
      This comes from the SWH scanner Client too.
      0ea158fe
    • WebAPIClient: slow down request according to rate limit · 182a2bb9
      Pierre-Yves David authored
      This is mostly a port of the basic logic that exists in the SWH scanner
      Client.
      
      We can improve and adjust that logic in the future, but this is out of scope for
      this series.
      182a2bb9
    • WebAPIClient: gather rate limit information · 5770837f
      Pierre-Yves David authored
      We gather rate limiting information from response headers and keep the
      most useful parts. We do not do anything with it yet, but this will come
      soon.
      5770837f
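As an illustration of gathering such information, the sketch below reads three headers from a response. The header names `X-RateLimit-Limit`, `X-RateLimit-Remaining` and `X-RateLimit-Reset` are an assumption (they are a common convention), and `RateLimitInfo` and `parse_rate_limit` are hypothetical names, not the client's API.

```python
from typing import Mapping, NamedTuple, Optional


class RateLimitInfo(NamedTuple):
    limit: int      # total number of requests allowed in the window
    remaining: int  # requests still available in the window
    reset: int      # timestamp at which the window resets


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Extract rate limit data from headers, or None if absent or malformed.

    Header names are assumed, not confirmed by the commit message.
    """
    try:
        return RateLimitInfo(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

The lookup is case-sensitive when given a plain `dict`; the header mapping exposed by `requests` responses is case-insensitive, so that would not be a concern there.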
    • WebAPIClient: create a session to optimize request · 5dca0b14
      Pierre-Yves David authored
      The `requests` package offers a simple way to reuse connections over
      multiple HTTP requests, so let us use it for free.
      5dca0b14
    • WebAPIClient: retry on failed request · ba8a9190
      Pierre-Yves David authored
      Detect bad 429 replies and retry on them. This is useful to avoid
      aborting in the middle of a large series of requests.
      
      This is also a behavior imported from the SWH scanner version of the
      Client.
      ba8a9190
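A retry loop of this kind might look like the sketch below. Everything here is hypothetical: `call_with_retry`, the `send` callable returning a `(status, body)` pair, and the retry count and delay are illustration choices, not values from the client.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,                        # assumed value
    delay: float = 1.0,                          # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` again while it answers 429, up to `max_retries` retries.

    Returns the last (status, body) pair, which may still be a 429 if
    every attempt was rejected.
    """
    status, body = send()
    for _ in range(max_retries):
        if status != 429:
            break
        sleep(delay)  # give the server some room before retrying
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.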
    • known: comply with maximum request size · 957a00b1
      Pierre-Yves David authored
      The maximum number of SWHIDs included in a single `known` API call is
      limited, so we introduce a way to automatically slice larger calls
      into smaller requests.
      
      We also make sure the constant is publicly available to help clients
      adjust their strategy.
      
      Such automatic slicing was first introduced in the SWH Scanner version
      of the Web API Client. It is both useful and required for feature
      parity.
      
      We take this as an opportunity to automate part of the tests for
      the `known` query to do larger queries based on a common set of
      generated ids.
      957a00b1
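The slicing described above can be sketched with a small generator. The constant name `KNOWN_QUERY_LIMIT`, its value of 1000, and the helper `slice_ids` are assumptions for illustration; the commit only states that a public constant exists.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical name and value for the per-request maximum.
KNOWN_QUERY_LIMIT = 1000


def slice_ids(swhids: Iterable[str], size: int = KNOWN_QUERY_LIMIT) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` ids from `swhids`."""
    iterator = iter(swhids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Working from `iter(swhids)` means the helper accepts lists, sets and generators alike, and never needs the whole input in memory at once.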
    • WebAPIClient: add a `get_origin` method · a9b96fc3
      Pierre-Yves David authored
      This method is weird and will probably not survive long as is. However,
      that method is copied from the SWH scanner version of the web client.
      Since having two versions of the web API client seems silly, I am adding
      the missing pieces (whatever value these pieces have) to the more generic
      version.
      
      With this addition the scanner is now ready to switch to the
      `swh.web.client` version of the web client.
      
      Further work is needed to add parallel requests and rate limiting
      compliance to this code. However, this work will not affect the public
      API of the object.
      a9b96fc3
    • typing: set known input at Iterable · 8279a8df
      Pierre-Yves David authored
      The previous value was Iterator, which is much more restrictive and
      prevents passing a list as an argument. Since lists are useful, we
      update the function signature.
      8279a8df
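The difference between the two annotations can be checked at runtime with the abstract base classes from `collections.abc`. The function `count_ids` below is a hypothetical stand-in for a method taking a collection of ids, not the client's `known` method.

```python
from collections.abc import Iterable as IterableABC, Iterator as IteratorABC
from typing import Iterable


def count_ids(swhids: Iterable[str]) -> int:
    """Accept any iterable of ids: list, set, tuple, generator, and so on."""
    return sum(1 for _ in swhids)


ids = ["id-1", "id-2", "id-3"]

# A list can be iterated over, but it is not itself an iterator
# because it has no __next__ method.
list_is_iterable = isinstance(ids, IterableABC)
list_is_iterator = isinstance(ids, IteratorABC)

# Calling iter() on the list produces an object that satisfies both.
iterator_is_iterator = isinstance(iter(ids), IteratorABC)
```

A type checker applies the same distinction statically: with an `Iterator[str]` parameter, passing `ids` directly would be flagged, while `Iterable[str]` accepts both `ids` and `iter(ids)`.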
  13. Dec 06, 2023
  14. Dec 04, 2023
  15. Dec 03, 2023
  16. Nov 29, 2023
  17. Nov 24, 2023
  18. Jun 07, 2023