Git origin without smart transfer protocol support cannot be loaded
Git supports two transfer protocols to exchange data between two repositories: the dumb protocol and the smart protocol.
Nowadays, the smart protocol is a common method of transferring data because it is more efficient.
Smart protocol support can be checked by sending a GET request to the /info/refs endpoint of the git server, with the service=git-upload-pack query parameter, and inspecting the Content-Type HTTP header of the response. When the server supports the smart protocol, the content type starts with application/x-git-.
antoine@guggenheim:/tmp$ curl -i https://forge.softwareheritage.org/source/swh-loader-git/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Date: Thu, 09 Jul 2020 17:54:57 GMT
Server: Apache
X-Frame-Options: Deny
Strict-Transport-Security: max-age=0; includeSubdomains; preload
Content-Security-Policy: default-src 'self' https://forge.softwareheritage.org; img-src 'self' https://forge.softwareheritage.org data:; style-src 'self' https://forge.softwareheritage.org 'unsafe-inline'; script-src 'self' https://forge.softwareheritage.org; connect-src 'self'; frame-src 'self'; frame-ancestors 'none'; object-src 'none'; form-action 'self'; base-uri 'none'
Referrer-Policy: no-referrer
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Content-Type: application/x-git-upload-pack-advertisement
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
Nevertheless, some git servers do not seem to support the smart protocol, in which case the official git client falls back to the dumb protocol. This is notably the case for numerous cgit instances in the wild (for instance here or here).
antoine@guggenheim:/tmp$ curl -i https://git.systemreboot.net/guile-xapian/info/refs?service=git-upload-pack
HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Thu, 09 Jul 2020 18:04:56 GMT
Content-Type: text/plain; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Content-Disposition: inline; filename="info/refs"
Last-Modified: Thu, 09 Jul 2020 18:04:56 GMT
Expires: Thu, 09 Jul 2020 18:09:56 GMT
Strict-Transport-Security: max-age=63072000; includeSubdomains; preload
612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/heads/master
59b340eb869c4966c5eb92309df6293ceb4db6ea refs/tags/v0.1.0
612317c1335fc5b9fe64eb09e3dc2bb508d100c6 refs/tags/v0.1.0^{}
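The check done by hand with curl above could be automated along the following lines. This is only a sketch: the function names `is_smart_content_type` and `supports_smart_protocol` are hypothetical and not part of the git loader or dulwich. The strict comparison against `application/x-<service>-advertisement` follows git's smart HTTP documentation; the looser `application/x-git-` prefix check mentioned above would also distinguish the two responses shown.

```python
from urllib.request import urlopen


def is_smart_content_type(content_type: str, service: str = "git-upload-pack") -> bool:
    """Return True if the Content-Type is a smart protocol ref advertisement."""
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == f"application/x-{service}-advertisement"


def supports_smart_protocol(repo_url: str, service: str = "git-upload-pack") -> bool:
    """Send the same GET request as the curl commands above (needs network access)."""
    url = f"{repo_url.rstrip('/')}/info/refs?service={service}"
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
    return is_smart_content_type(content_type, service)
```

If the servers still answer as in the transcripts above, `supports_smart_protocol` would be expected to return True for the forge.softwareheritage.org repository and False for the cgit one.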
The git loader relies on the dulwich package to fetch git pack data, but unfortunately the current dulwich implementation only supports the smart protocol for that operation.
Related sentry bug report: https://sentry.softwareheritage.org/organizations/swh/issues/2343/events/latest/?project=8
As a workaround, to load git origins without smart transfer protocol support, we could fetch the git objects through the dumb transfer protocol and build the pack file on the client side using the dulwich pack API.
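To make the dumb-protocol side of this workaround more concrete, here is a rough sketch using only the Python standard library. It assumes the objects are available as loose objects under objects/ on the server; objects stored in server-side packs (listed in objects/info/packs) are not handled. All function names are hypothetical, and the pack-building step with dulwich is deliberately left out since this issue does not specify which dulwich calls would be used.

```python
import hashlib
import zlib


def parse_dumb_refs(body: str) -> dict:
    """Map ref names to object ids from a dumb protocol info/refs body."""
    refs = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        # Each line is "<object id><whitespace><ref name>".
        sha, name = line.split(None, 1)
        refs[name.strip()] = sha
    return refs


def loose_object_url(repo_url: str, sha: str) -> str:
    """URL of a loose object: objects/<first 2 hex digits>/<remaining 38>."""
    return f"{repo_url.rstrip('/')}/objects/{sha[:2]}/{sha[2:]}"


def decode_loose_object(compressed: bytes):
    """Decompress a loose object and return (type, content, object id)."""
    raw = zlib.decompress(compressed)
    # A loose object is "<type> <size>\0<content>", zlib-compressed.
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    # The object id is the SHA-1 of the uncompressed header and content.
    return obj_type.decode("ascii"), content, hashlib.sha1(raw).hexdigest()
```

A loader following this approach would start from the ids returned by `parse_dumb_refs`, download each object from `loose_object_url`, decode it, and walk the commits and trees it references before handing the collected objects to the pack-building code.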
Migrated from T2489 (view on Phabricator)