save_bulk: Add new endpoints to manage bulk archival of origins
Jul 04, 2024
Antoine Lambert authored
Add a new Django application to manage bulk archival of origins.

As a first step, add a new Web API endpoint to submit a list of origins to archive, and a new endpoint to retrieve that list of origins in a paginated way.

The submission endpoint enables a user with a specific permission to submit a list of origin URLs and their visit types through a POST request. The endpoint performs some basic checks on the received origins data to verify that origin URLs are well formed and that the provided visit types are supported. If the origins data are valid, the request is accepted and a oneshot scheduler task is created to execute the bulk-save lister. That lister consumes the list of origins and performs some additional, more costly checks on them; the origins it validates are then scheduled for loading into the archive. If some origins data are not valid, the request is rejected and a list of the bogus origins, along with the reasons for their rejection, is returned to the user in the Web API response.

The other endpoint is intended to be consumed by the bulk-save lister to retrieve the submitted origins list in a paginated way. Submitted origins data are stored on the webapp side to avoid bloating the scheduler database with large JSON documents when a big list of origins is submitted by a user.

Related to #4802.
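The two behaviors described above (basic validation of submitted origins, then paginated retrieval by the lister) can be sketched roughly as follows. This is a minimal illustration, not the swh-web implementation: the dictionary keys `origin_url` and `visit_type`, the `SUPPORTED_VISIT_TYPES` set, the restriction to http(s) URLs, and the functions `validate_origins` and `get_origins_page` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# Hypothetical set of supported visit types; the real list lives in the webapp.
SUPPORTED_VISIT_TYPES = {"git", "hg", "svn", "tarball-directory"}


@dataclass
class RejectedOrigin:
    """A submitted origin that failed the basic checks, with the reasons why."""

    origin: Dict[str, str]
    reasons: List[str]


def validate_origins(origins: List[Dict[str, str]]) -> List[RejectedOrigin]:
    """Run cheap checks on submitted origins; an empty result means accept."""
    rejected = []
    for origin in origins:
        reasons = []
        url = origin.get("origin_url", "")
        parsed = urlparse(url)
        # Assumption: a well-formed origin URL is an http(s) URL with a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            reasons.append(f"Origin URL is not well formed: {url!r}")
        visit_type = origin.get("visit_type", "")
        if visit_type not in SUPPORTED_VISIT_TYPES:
            reasons.append(f"Visit type is not supported: {visit_type!r}")
        if reasons:
            rejected.append(RejectedOrigin(origin=origin, reasons=reasons))
    return rejected


def get_origins_page(
    origins: Sequence[T], page: int, page_size: int
) -> Tuple[List[T], Optional[int]]:
    """Return one page of stored origins and the next page number (or None)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    items = list(origins[start : start + page_size])
    next_page = page + 1 if start + page_size < len(origins) else None
    return items, next_page
```

In this sketch, a non-empty result from `validate_origins` would map to rejecting the POST request and returning the bogus origins with their reasons, while an empty result would correspond to accepting the request and creating the oneshot task. Offset-based paging is used only for simplicity; the actual endpoint may paginate differently.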