Rework deposit checker implementation
This MR contains two commits:
api/private: Add endpoint to get download links of uploaded tarballs
Add a new private Web API endpoint to get a list of URLs for downloading
the tarballs uploaded with a deposit.
In development mode, the tarballs are stored on the local filesystem
and served by Django.
In production mode, the tarballs are stored in Azure Blob Storage,
and temporary download links with a shared access signature are
generated when the endpoint is requested.
This makes it possible to move the costly operations of downloading
and processing tarballs to celery workers instead of having the
deposit server perform those tasks.
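
As an illustration of how a worker might consume such an endpoint, here is a minimal sketch. It assumes the endpoint returns a JSON list of download URLs; the response format, the `download_tarballs` helper, and its `opener` parameter are hypothetical and not taken from the actual implementation.

```python
import json
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def download_tarballs(list_url, dest_dir, opener=urlopen):
    """Fetch the list of tarball URLs, then download each one to dest_dir.

    Assumption: the endpoint answers with a JSON array of URLs.
    `opener` is injectable so the function can be exercised offline.
    """
    with opener(list_url) as resp:
        urls = json.load(resp)

    paths = []
    for url in urls:
        # Drop the query string (e.g. a shared access signature) from the name.
        name = Path(urlparse(url).path).name
        target = Path(dest_dir) / name
        with opener(url) as resp, open(target, "wb") as out:
            # Stream to disk so large archives are not held in memory.
            shutil.copyfileobj(resp, out)
        paths.append(target)
    return paths
```

Since temporary links expire, a worker would presumably request the list right before downloading rather than caching it.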
checker: Remove private API endpoint and do checks on celery worker
Checking deposit archives can be a costly operation as the checker
must download the archives to list their content.
It has been observed in production that when a large archive has been
uploaded with a deposit, requesting the check endpoint of the private
deposit API can result in the gunicorn worker being killed because
the time needed to download the archive exceeds the worker timeout.
So instead of having the private API endpoint perform the checks,
move these operations to the celery worker executing the
check-deposit task.
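
For illustration only, a worker-side check on a downloaded archive could look like the sketch below. The `check_archive` helper and its two rules (the archive must be readable and non-empty) are assumptions made for the example; the checks actually performed by the deposit checker may differ.

```python
import tarfile


def check_archive(path):
    """Return (ok, detail) for a tarball already downloaded by the worker.

    Hypothetical rules: the archive must be readable and must contain
    at least one member.
    """
    try:
        with tarfile.open(path) as tar:
            # Listing members requires reading the whole archive index,
            # which is the costly part moved off the web server.
            names = tar.getnames()
    except (tarfile.TarError, OSError) as exc:
        return False, f"unreadable archive: {exc}"

    if not names:
        return False, "archive is empty"
    return True, f"{len(names)} member(s)"
```

Running this inside the celery task keeps the long download and listing outside the gunicorn request cycle, so the worker timeout no longer applies to it.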
Related to #4657 (closed).
Fixes #4658 (closed).
These changes have been plugged into: