
save_bulk: Add new endpoints to manage bulk archival of origins

Add a new Django application to manage bulk archival of origins.

As a first step, add a new Web API endpoint to submit a list of origins to archive, and a second endpoint to retrieve that list of origins in a paginated way.

The submission endpoint lets a user with a specific permission submit a list of origin URLs and their visit types through a POST request. The endpoint performs some basic checks on the received origin data, inserts the origins into the webapp database if they are valid, and creates a one-shot scheduler task to execute the bulk-save lister. That lister consumes the list of origins, and those it validates are then scheduled for loading into the archive.
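To illustrate the kind of basic checks described above, here is a minimal sketch in plain Python. It is not the code from this merge request: the field names `origin_url` and `visit_type`, the set of accepted visit types, and the function name `validate_origins` are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical set of accepted visit types; the real list is not
# given in this description.
ASSUMED_VISIT_TYPES = {"git", "hg", "svn"}


def validate_origins(origins):
    """Split submitted entries into (valid, rejected) lists.

    Each rejected item is a (entry, reason) tuple.
    """
    valid, rejected = [], []
    for entry in origins:
        if not isinstance(entry, dict):
            rejected.append((entry, "entry is not an object"))
            continue
        url = entry.get("origin_url")  # assumed field name
        visit_type = entry.get("visit_type")  # assumed field name
        if not isinstance(url, str) or not isinstance(visit_type, str):
            rejected.append((entry, "missing origin_url or visit_type"))
            continue
        try:
            parsed = urlparse(url)
        except ValueError:
            # urlparse raises ValueError on some malformed URLs
            rejected.append((entry, "malformed URL"))
            continue
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            rejected.append((entry, "unsupported or incomplete URL"))
            continue
        if visit_type not in ASSUMED_VISIT_TYPES:
            rejected.append((entry, "unknown visit type"))
            continue
        valid.append({"origin_url": url, "visit_type": visit_type})
    return valid, rejected
```

In a sketch like this, only the `valid` entries would be stored in the webapp database before the one-shot scheduler task is created; how rejected entries are reported back to the user is left open here.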

The other endpoint is meant to be consumed by the bulk-save lister, which uses it to retrieve the submitted list of origins in a paginated way. Storing the submitted origin data on the webapp side avoids bloating the scheduler database with large JSON documents when a user submits a big list of origins.
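The paginated retrieval can be pictured with the following sketch, again in plain Python rather than the actual endpoint code. The response keys (`count`, `results`, `next_page`), the default page size, and both function names are hypothetical and chosen only for illustration.

```python
def paginate_origins(origins, page=1, per_page=1000):
    """Return one page of a stored origins list (pages start at 1)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    has_next = start + per_page < len(origins)
    return {
        "count": len(origins),
        "results": origins[start:start + per_page],
        # None signals that the last page has been reached
        "next_page": page + 1 if has_next else None,
    }


def iter_all_origins(fetch_page):
    """Consume every page, as a lister-like client might do.

    fetch_page is any callable taking a page number and returning
    a dict shaped like the one built by paginate_origins.
    """
    page = 1
    while page is not None:
        data = fetch_page(page)
        yield from data["results"]
        page = data["next_page"]
```

With this shape, the scheduler task only needs to carry a reference to the submitted list, and the consumer pulls the origins page by page instead of receiving one large JSON document.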

Related to #4802.
