
save_bulk: Add new endpoints to manage bulk archival of origins

Merged Antoine Lambert requested to merge anlambert/swh-web:bulk-save-api-endpoint-post into master
  1. Jul 04, 2024
    • save_bulk: Add new endpoints to manage bulk archival of origins · 8ac56399
      Antoine Lambert authored
      Add a new Django application to manage bulk archival of origins.
      
      As a first step, add a Web API endpoint to submit a list of origins to
      archive, and a second endpoint to retrieve that list in a paginated
      way.
      
      The submission endpoint lets a user with a specific permission submit a
      list of origin URLs and their visit types through a POST request. It
      performs basic checks on the received origin data, verifying that origin
      URLs are well formed and that the provided visit types are supported.
      
      If the provided origin data are valid, the request is accepted and a
      oneshot scheduler task is created to execute the bulk-save lister. That
      lister consumes the list of origins and performs additional, more costly
      checks on them; the origins it validates are then scheduled for loading
      into the archive.
      
      If some of the origin data are invalid, the request is rejected and the
      Web API response lists the invalid origins along with the reasons for
      their rejection.
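
      The accept/reject behaviour described above can be sketched as follows.
      This is only an illustration, not the code from this merge request: the
      function names, the set of supported visit types and the shape of the
      returned dictionary are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical set of supported visit types; the commit message does not
# list the real ones.
SUPPORTED_VISIT_TYPES = {"git", "hg", "svn"}


def check_origin(origin_url, visit_type):
    """Return the list of rejection reasons (empty if the origin is valid)."""
    reasons = []
    try:
        parsed = urlparse(origin_url)
        well_formed = bool(parsed.scheme and parsed.netloc)
    except ValueError:
        # urlparse raises ValueError on some malformed inputs
        well_formed = False
    if not well_formed:
        reasons.append("malformed origin URL")
    if visit_type not in SUPPORTED_VISIT_TYPES:
        reasons.append(f"unsupported visit type: {visit_type}")
    return reasons


def check_submission(origins):
    """Accept a submission only if every (origin_url, visit_type) pair is valid.

    The returned dictionary is an assumed response shape, for illustration.
    """
    rejected = []
    for origin_url, visit_type in origins:
        reasons = check_origin(origin_url, visit_type)
        if reasons:
            rejected.append(
                {
                    "origin_url": origin_url,
                    "visit_type": visit_type,
                    "reasons": reasons,
                }
            )
    if rejected:
        # A single invalid origin is enough to reject the whole request.
        return {"status": "rejected", "rejected_origins": rejected}
    return {"status": "accepted"}
```

      Only cheap, local checks appear here; the more costly checks are left to
      the bulk-save lister, as the commit message describes.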
      
      The other endpoint is intended for the bulk-save lister, which uses it to
      retrieve the submitted list of origins in a paginated way. Storing the
      submitted origin data on the webapp side avoids bloating the scheduler
      database with large JSON documents when a user submits a big list of
      origins.
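
      A minimal sketch of how a stored list of origins could be served page by
      page and consumed by a client such as the lister. The page-number scheme,
      the "results" and "next_page" keys and both function names are
      hypothetical; the commit message does not specify the actual pagination
      format.

```python
def paginate(origins, page=1, per_page=1000):
    """Return one page of `origins` plus the next page number, if any."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    end = start + per_page
    return {
        "results": origins[start:end],
        # None signals that the last page has been reached
        "next_page": page + 1 if end < len(origins) else None,
    }


def iter_all_origins(fetch_page):
    """Yield every origin by following `next_page` until it is None.

    `fetch_page` is any callable mapping a page number to a page dictionary,
    for instance a thin wrapper around an HTTP GET.
    """
    page = 1
    while page is not None:
        data = fetch_page(page)
        yield from data["results"]
        page = data["next_page"]
```

      With this shape the consumer never has to hold more than one page of
      origins in memory at a time, whatever the size of the submission.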
      
      Related to #4802.
    • 48c6fd33