Typically, an SCN request that targets a git repository large enough to hit the maximum packfile size should report the reason for the failure back to the user.
We should also list and categorize all the reasons that can cause an SCN request to fail, and make sure a comprehensive error is reported to the SCN user.
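To make the categorization idea a bit more concrete, here is a minimal sketch of how a raw loader error message could be mapped to a user-facing reason. All names here (`SaveFailureReason`, `categorize_failure`, the matching pattern) are hypothetical and not existing swh code; the only category filled in is the packfile one, since the other failure reasons still have to be listed.

```python
from enum import Enum


class SaveFailureReason(Enum):
    """Hypothetical user-facing failure categories for an SCN request."""

    PACKFILE_TOO_BIG = "The repository exceeds the maximum packfile size."
    UNKNOWN = "The request failed for an unidentified reason."


# Assumption: categories are recognized by a substring of the error message.
# Only the packfile case is known so far; other entries would be added here.
_PATTERNS = [
    ("pack file too big", SaveFailureReason.PACKFILE_TOO_BIG),
]


def categorize_failure(error_message: str) -> SaveFailureReason:
    """Return the first category whose pattern occurs in the error message."""
    lowered = error_message.lower()
    for needle, reason in _PATTERNS:
        if needle in lowered:
            return reason
    return SaveFailureReason.UNKNOWN
```

Anything that does not match a known pattern falls back to `UNKNOWN`, so the user always gets some message even for failures that have not been categorized yet.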
One possible way to get more details about why an SCN request failed would be to query that information through the Sentry REST API. Using the details of a failed SCN request, I managed to craft a query that extracts the error info:
$ curl "https://sentry.softwareheritage.org/api/0/organizations/swh/events/?field=title&field=timestamp&query=(swh.loader.origin_url:https://github.com/hagezi/dns-blocklists)&start=2024-08-29T11:13:05.343297&end=2024-08-29T11:43:05.343297" \
    -H "Authorization: Bearer $SENTRY_TOKEN" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   547  100   547    0     0   2680      0 --:--:-- --:--:-- --:--:--  2694
{
  "data": [
    {
      "title": "OSError: Pack file too big for repository https://github.com/hagezi/dns-blocklists, limit is 4294967296 by...",
      "timestamp": "2024-08-29T11:18:07+00:00",
      "id": "b5b2ff6fad7d427290084c7fc4361059",
      "project.name": "swh-loader-git"
    }
  ],
  "meta": {
    "fields": {
      "title": "string",
      "timestamp": "date",
      "id": "string",
      "project.name": "string"
    },
    "units": {
      "title": null,
      "timestamp": null,
      "id": null,
      "project.name": null
    },
    "isMetricsData": false,
    "isMetricsExtractedData": false,
    "tips": {
      "query": null,
      "columns": null
    },
    "datasetReason": "unchanged",
    "dataset": "discover"
  }
}
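For reference, the same query could be issued from Python roughly as follows. This is only a sketch using the standard library: the endpoint, fields and query syntax are copied from the curl command above, while the function names and the way the token is passed in are my own assumptions.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
SENTRY_EVENTS_URL = (
    "https://sentry.softwareheritage.org/api/0/organizations/swh/events/"
)


def build_events_url(origin_url: str, start: str, end: str) -> str:
    """Build the events query URL for a given origin and time window."""
    params = [
        ("field", "title"),
        ("field", "timestamp"),
        ("query", f"(swh.loader.origin_url:{origin_url})"),
        ("start", start),
        ("end", end),
    ]
    return SENTRY_EVENTS_URL + "?" + urllib.parse.urlencode(params)


def extract_error_titles(payload: dict) -> list:
    """Return the event titles found in a decoded events response."""
    return [event["title"] for event in payload.get("data", [])]


def fetch_error_titles(origin_url: str, start: str, end: str, token: str) -> list:
    """Query Sentry (network call) and return the matching event titles."""
    request = urllib.request.Request(
        build_events_url(origin_url, start, end),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_error_titles(json.load(response))
```

The time window would presumably be derived from the SCN request's own timestamps, as I did by hand for the request above.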
I suppose both approaches could work, but I think it is simpler, in terms of general architecture, not to depend on Sentry for UI-related features.