
loader: Check pack size of non-archived GitHub origin prior to fetching it

For each repository, the GitHub API provides the size, in kibibytes, of the pack file corresponding to a full clone.
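The snippet below is a minimal sketch of reading that size from the raw metadata bytes. It assumes the bytes are the JSON body of the GitHub repository endpoint and that the size sits in a top-level `size` field expressed in kibibytes, as stated above. The helper name `pack_size_bytes` is hypothetical and is not taken from the loader code.

```python
import json
from typing import Optional


def pack_size_bytes(raw_metadata: bytes) -> Optional[int]:
    """Return the full-clone pack size in bytes, or None if unavailable.

    Hypothetical helper: assumes raw_metadata is the JSON body returned by
    the GitHub repository endpoint, whose top-level "size" field is read
    here as a number of kibibytes (KiB).
    """
    try:
        # json.loads accepts bytes directly (UTF-8/16/32 are auto-detected).
        metadata = json.loads(raw_metadata)
    except ValueError:
        # Covers both json.JSONDecodeError and UnicodeDecodeError.
        return None
    if not isinstance(metadata, dict):
        return None
    size_kib = metadata.get("size")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(size_kib, int) or isinstance(size_kib, bool):
        return None
    return size_kib * 1024


# Usage: a repository reporting 2048 KiB yields 2 MiB.
print(pack_size_bytes(b'{"full_name": "octo/repo", "size": 2048}'))  # 2097152
```

Returning `None` instead of raising lets the caller fall back to a normal fetch when metadata is missing or malformed.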

Metadata for a GitHub repository is fetched at the beginning of the loading process (currently only for origins discovered by the GitHub lister). Parse its raw JSON bytes and store the pack file size as a loader attribute. Then, before fetching the pack file for a GitHub origin that has no base snapshot in the archive, check that the pack file size does not exceed the threshold defined by the loader. If it does, abort the loading to save network bandwidth.
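The decision described above can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual code: the class `GitHubOriginLoaderSketch`, its attributes `pack_size_limit`, `github_pack_size` and `has_base_snapshot`, and the exception `PackTooLarge` are all hypothetical names. Sizes are in bytes, and a size equal to the threshold is allowed, matching "not greater than".

```python
from dataclasses import dataclass
from typing import Optional


class PackTooLarge(Exception):
    """Hypothetical exception used to abort loading before any fetch."""


@dataclass
class GitHubOriginLoaderSketch:
    # Hypothetical threshold defined by the loader, in bytes.
    pack_size_limit: int
    # Full-clone pack size parsed from GitHub metadata, in bytes (None if unknown).
    github_pack_size: Optional[int] = None
    # Whether the archive already holds a snapshot to fetch incrementally from.
    has_base_snapshot: bool = False

    def check_pack_size_before_fetch(self) -> None:
        """Raise PackTooLarge if a full clone would exceed the threshold."""
        if self.has_base_snapshot:
            # Incremental fetch: the full-clone size does not apply.
            return
        if self.github_pack_size is None:
            # No usable metadata: let the regular fetch proceed.
            return
        if self.github_pack_size > self.pack_size_limit:
            raise PackTooLarge(
                f"pack size {self.github_pack_size} bytes exceeds "
                f"limit of {self.pack_size_limit} bytes"
            )


# Usage: a 5 GiB repository against a 4 GiB limit is rejected up front.
loader = GitHubOriginLoaderSketch(pack_size_limit=4 * 1024**3,
                                  github_pack_size=5 * 1024**3)
try:
    loader.check_pack_size_before_fetch()
except PackTooLarge as exc:
    print("aborting:", exc)
```

Skipping the check when a base snapshot exists reflects the point above: the GitHub figure describes a full clone, not the smaller pack an incremental fetch would transfer.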

Related to #3652

Some code (the JSON parsing of metadata) could be simplified once swh-loader-metadata#3 is implemented.

Edited by Antoine Lambert
