Optimize the number of HTTP requests sent by the cgit lister
The current implementation of the cgit lister requests the main HTML page of each listed repository to extract its git clone URL.
However, sending many requests in a short amount of time is not friendly to a cgit server.
We could avoid requesting each repository page by constructing the git clone URL of a repository from the cgit server home URL and a clone URL prefix passed as a lister parameter.
Repository home page URLs have the form `<cgit_server_url>/<path_to_repo>` and are extracted from the HTML pages listing the hosted repositories.
Repository clone URLs will then have the form `<cgit_clone_prefix_url>/<path_to_repo>`.
Examples based on the cgit instances listed in #1835, plus a couple of others found in the wild:
| cgit_server_url | git_clone_prefix_url |
|---|---|
| https://git.kernel.org/ | https://git.kernel.org/ |
| https://gitweb.torproject.org/ | https://gitweb.torproject.org/ |
| https://fedorapeople.org/cgit/ | https://fedorapeople.org/cgit/ |
| https://git.openembedded.org/ | https://git.openembedded.org/ |
| https://git.zx2c4.com/ | https://git.zx2c4.com/ |
| http://git.gnu.org.ua/cgit/ | http://git.gnu.org.ua/repo/ |
| https://git.alpinelinux.org/ | https://git.alpinelinux.org/ |
| https://git.baserock.org/cgit/ | https://git.baserock.org/git/ |
| https://code.qt.io/cgit/ | http://code.qt.io/ |
| http://git.yoctoproject.org/clean/cgit.cgi/ | https://git.yoctoproject.org/git/ |
| https://forge.frm2.tum.de/cgit/cgit.cgi/ | https://forge.frm2.tum.de/review/ |
| https://git.eclipse.org/c/ | https://git.eclipse.org/r/ |
| http://hdiff.luite.com/cgit/ | http://hdiff.luite.com/cgit/ |
| https://git.systemreboot.net/ | https://git.systemreboot.net/ |
| https://git.netfilter.org/ | git://git.netfilter.org/ |
| https://jff.email/cgit/ | git://jff.email/opt/git/ |
| https://inqlab.net/git/ | https://inqlab.net/git/ |
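The URL mapping proposed above could look roughly like the following Python sketch. It assumes that every repository home page URL starts with the configured server URL, as in the table rows; `build_clone_url` and its parameter names are hypothetical and not part of the existing lister. Trailing slashes are normalized because most URLs in the table end with `/`, and naive concatenation would otherwise produce `//`.

```python
def build_clone_url(repo_home_url: str, cgit_server_url: str, clone_prefix_url: str) -> str:
    """Derive a clone URL from a repository home page URL without fetching it.

    Hypothetical helper. Assumes:
      repo_home_url == <cgit_server_url>/<path_to_repo>[/]
      clone URL     == <clone_prefix_url>/<path_to_repo>
    """
    # Normalize the server URL so it always ends with exactly one slash.
    server = cgit_server_url.rstrip("/") + "/"
    if not repo_home_url.startswith(server):
        raise ValueError(f"{repo_home_url!r} is not under {cgit_server_url!r}")
    # Keep only <path_to_repo>, dropping leading/trailing slashes.
    path_to_repo = repo_home_url[len(server):].strip("/")
    if not path_to_repo:
        raise ValueError(f"no repository path in {repo_home_url!r}")
    # Only the path is reused, so the scheme may differ (http, https, git).
    return clone_prefix_url.rstrip("/") + "/" + path_to_repo


if __name__ == "__main__":
    # "example.git" is a made-up repository name used for illustration.
    print(build_clone_url(
        "http://git.gnu.org.ua/cgit/example.git/",
        "http://git.gnu.org.ua/cgit/",
        "http://git.gnu.org.ua/repo/",
    ))
```

Run as a script, this prints `http://git.gnu.org.ua/repo/example.git`. Because only string manipulation is involved, the lister would need just the listing pages and no request per repository.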
While testing some clone URLs, I remembered that some cgit instances do not offer any clone URL supporting the git smart transfer protocol. For those instances, we will be able to list the repositories but not to load them into the archive, as dulwich does not support the git dumb transfer protocol (swh-loader-git#2489, related sentry issue).
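One way to tell smart from dumb HTTP clone URLs is the discovery request described in git's HTTP protocol documentation: a smart server answers `GET <url>/info/refs?service=git-upload-pack` with the content type `application/x-git-upload-pack-advertisement`, while a dumb server returns a plain file. The sketch below (hypothetical helper names, standard library only) assumes that check is sufficient; it treats `git://` URLs as smart because git daemon only speaks the native pack protocol. Since a probe costs one request, it could presumably be run once per cgit instance rather than once per repository.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit

# Content type a smart HTTP server must send for upload-pack ref discovery.
SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(clone_url: str) -> str:
    """Build the ref discovery URL that smart-capable clients request first."""
    return clone_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_http(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header denotes a smart HTTP response."""
    if content_type is None:
        return False
    # Ignore parameters such as "; charset=..." and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == SMART_CONTENT_TYPE


def probe_smart_http(clone_url: str, timeout: float = 10.0) -> bool:
    """Hypothetical probe: does this clone URL support the smart protocol?"""
    scheme = urlsplit(clone_url).scheme
    if scheme == "git":
        # git daemon only serves the native (smart) pack protocol.
        return True
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme in {clone_url!r}")
    # The User-Agent value is an arbitrary illustrative choice.
    request = urllib.request.Request(
        info_refs_url(clone_url), headers={"User-Agent": "git/2.0"}
    )
    # HTTP errors (e.g. 404) propagate as urllib.error.HTTPError.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_smart_http(response.headers.get("Content-Type"))
```

Repositories whose instance probes as dumb-only could then be skipped or flagged at listing time instead of failing later in the loader.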
Migrated from Phabricator task T2999.