
Rework deposit checker implementation

Merged Antoine Lambert requested to merge anlambert/swh-deposit:deposit-upload-urls into master

This MR contains two commits:

api/private: Add endpoint to get download links of uploaded tarballs
    
Add a new private Web API endpoint to get a list of URLs for downloading
the tarballs uploaded with a deposit.

In development mode, the tarballs are stored on the local filesystem
and served by Django.

In production mode, the tarballs are stored in Azure Blob Storage,
and temporary download links with a shared access signature are
generated when the endpoint is requested.

This makes it possible to move costly operations related to downloading
and processing tarballs to celery workers instead of having the deposit
server perform those tasks.
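
To illustrate the idea of a temporary download link mentioned above, here is a minimal sketch using only the Python standard library. It is not the Azure shared access signature format and not the code from this MR: it signs a path and an expiry time with HMAC, which is the same general principle. The names `SECRET_KEY`, `make_temporary_url`, `is_url_valid` and the example host are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key; a real deployment would rely on the storage
# account credentials rather than a hard-coded value.
SECRET_KEY = b"hypothetical-secret"


def _sign(path, expires):
    """Compute an HMAC-SHA256 signature over the path and expiry time."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_temporary_url(base_url, path, ttl_seconds, now=None):
    """Build a download URL that stops being valid after ttl_seconds."""
    now = int(time.time() if now is None else now)
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"{base_url}{path}?{query}"


def is_url_valid(url, now=None):
    """Check the signature and the expiry time of a temporary URL."""
    now = int(time.time() if now is None else now)
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires)
    return hmac.compare_digest(signature, expected) and now <= expires
```

Because the expiry time is part of the signed message, a client cannot extend the lifetime of a link by editing the query string.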

checker: Remove private API endpoint and do checks on celery worker
    
Checking deposit archives can be a costly operation as the checker
must download the archives to list their content.

It has been observed in production that, when a large archive has been
uploaded with a deposit, requesting the check endpoint of the private
deposit API can result in the gunicorn worker being killed because the
time to download the archive exceeds the worker timeout.

So instead of having the private API endpoint perform the checks,
move these operations to the celery worker executing the
check-deposit task.
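
The worker-side flow described above (download the archive from a URL, then list its content) could look roughly like the following sketch. It uses only the standard library and is not the actual checker code: the function names and the "archive must not be empty" rule are assumptions made for illustration.

```python
import os
import shutil
import tarfile
import tempfile
import urllib.request
import zipfile


def list_archive_members(archive_path):
    """Return the member names of a tar or zip archive."""
    if tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as tar:
            return tar.getnames()
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return zf.namelist()
    raise ValueError(f"unsupported archive format: {archive_path}")


def check_archive_url(url):
    """Download one archive and apply a hypothetical "not empty" rule.

    Returns a (is_valid, member_names) tuple.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        archive_path = os.path.join(tmp_dir, "archive")
        # Stream the download to disk so that a large archive is never
        # held entirely in memory.
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
            shutil.copyfileobj(response, out)
        members = list_archive_members(archive_path)
    return len(members) > 0, members
```

Running this inside a background task means a slow download only occupies that task, not a web worker subject to a request timeout.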

Related to #4657 (closed).

Fixes #4658 (closed).

These changes have been plugged into :

Edited by Antoine Lambert


Activity

  • Bravo for incorporating azurite into the tests! Made a small suggestion on routing & statics in a dev context.

  • added 1 commit

    • 03dba0d8 - api/private: Add endpoint to get download links of uploaded tarballs

  • Author Maintainer

    ^ Enable view serving uploaded tarballs only when azure storage backend is not used.

  • Antoine Lambert resolved all threads

  • Jenkins job DDEP/gitlab-builds #231 succeeded in 2 min 47 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

  • mentioned in merge request swh-loader-core!542 (merged)

  • Antoine Lambert added 2 commits

    • b8fe2c91 - api/private: Add endpoint to get download links of uploaded tarballs
    • 81b73c6e - checker: Remove private API endpoint and do checks on celery worker

  • Antoine Lambert added 2 commits

    • 9c5de159 - api/private: Add endpoint to get download links of uploaded tarballs
    • 6be468f4 - checker: Remove private API endpoint and do checks on celery worker

  • Jenkins job DDEP/gitlab-builds #232 failed in 2 min 52 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

  • Antoine Lambert changed title from api/private: Add endpoint to get download links of uploaded tarballs to Rework deposit checker implementation

  • Antoine Lambert changed the description

  • Jenkins job DDEP/gitlab-builds #233 failed in 2 min 45 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

  • added 1 commit

    • 41f079a0 - checker: Remove private API endpoint and do checks on celery worker

  • Jenkins job DDEP/gitlab-builds #235 failed in 2 min 47 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.
