Consider dropping pull request references from the git loader ingestion
The git loader currently filters out references considered uninteresting [1]:
- auto-merged GitHub pull requests (reference names starting with refs/pull/ and ending with /merge)
- peeled refs (reference names ending with ^{})
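As a rough illustration, the two rules above amount to a predicate over reference names. This is a hypothetical sketch (the name is_ignored_ref is ours) that assumes reference names are handled as bytes. It mirrors the rules as described here and is not the code linked in [1]. Note that the pull request head refs are not matched by either rule:

```python
def is_ignored_ref(ref_name: bytes) -> bool:
    """Return True if a reference matches one of the two filtering rules.

    Hypothetical sketch of the rules described in this issue, not the
    actual swh-loader-git implementation.
    """
    # Peeled refs, e.g. b"refs/tags/v1.0^{}".
    if ref_name.endswith(b"^{}"):
        return True
    # Auto-merged GitHub pull requests, e.g. b"refs/pull/42/merge".
    if ref_name.startswith(b"refs/pull/") and ref_name.endswith(b"/merge"):
        return True
    return False


refs = [
    b"refs/heads/master",
    b"refs/tags/v1.0^{}",
    b"refs/pull/42/merge",
    b"refs/pull/42/head",
]
kept = [r for r in refs if not is_ignored_ref(r)]
print(kept)  # [b'refs/heads/master', b'refs/pull/42/head']
```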
Nonetheless, the git loader still loads the remaining pull request references (such as refs/pull/N/head), and there can be many of them depending on the repository. See, for example, a recent snapshot of the torvalds/linux repository [2].
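To get a sense of how much of a snapshot these references take up, one could tally branch names by prefix. The sketch below is hypothetical: classify_branches is our own helper, and it assumes the branch names have already been collected as strings, for instance from the branches of a snapshot like [2] (fetching and paginating them is left out):

```python
from collections import Counter


def classify_branches(branch_names):
    """Count branch names by kind (hypothetical helper).

    branch_names: iterable of str, e.g. the branch names of a snapshot.
    """
    counts = Counter()
    for name in branch_names:
        if name.startswith("refs/pull/"):
            counts["pull"] += 1
        elif name.startswith("refs/heads/"):
            counts["heads"] += 1
        elif name.startswith("refs/tags/"):
            counts["tags"] += 1
        else:
            # HEAD, aliases, and any other namespace.
            counts["other"] += 1
    return counts


sample = [
    "HEAD",
    "refs/heads/master",
    "refs/tags/v6.0",
    "refs/pull/1/head",
    "refs/pull/2/head",
]
print(dict(classify_branches(sample)))
```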
We should consider whether it is still relevant to ingest those references.
The webapp already considers them noise and has filtered them out of the browsing views since v0.0.288 [3], which argues for ignoring them during ingestion as well.
Dropping them should also help with other ongoing concerns [4].
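If the proposal were adopted, the filter could simply skip every reference under refs/pull/ instead of only the /merge ones. The following is a hypothetical sketch of that change (should_ignore_ref is an illustrative name, and reference names are assumed to be bytes), not a patch against the actual loader:

```python
def should_ignore_ref(ref_name: bytes) -> bool:
    """Hypothetical filter for the proposed behavior."""
    # Peeled refs are still skipped, as they are today.
    if ref_name.endswith(b"^{}"):
        return True
    # Proposed: skip every GitHub pull request ref, head and merge alike.
    return ref_name.startswith(b"refs/pull/")


refs = [b"refs/heads/master", b"refs/pull/42/head", b"refs/pull/42/merge"]
print([r for r in refs if not should_ignore_ref(r)])  # [b'refs/heads/master']
```

Keeping the check a pure prefix test means new pull request ref suffixes would be skipped too, which matches the idea that the whole refs/pull/ namespace is noise.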
- [1] https://forge.softwareheritage.org/source/swh-loader-git/browse/master/swh/loader/git/utils.py$89-90
- [2] https://archive.softwareheritage.org/api/1/snapshot/c2847dfd741eae21606027cf29250d1ebcd63fb4/
- [3] rDWAPPScc652d5240
- [4] #3625 (closed)
Migrated from T3627