git loader: fails to ingest our own hello world repository
In the dogfooding category, it would be nice if we could ingest our self-hosted Git repositories without relying on the fact that they are also on GitHub :-)
Unfortunately, trying to run the git loader on, e.g., the hello world repo, fails like this:
2018-09-14 17:00:54,400 25707 Creating git origin for https://forge.softwareheritage.org/source/helloworld.git
2018-09-14 17:00:54,404 25707 Starting new HTTP connection (1): localhost
2018-09-14 17:00:54,408 25707 http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
2018-09-14 17:00:54,408 25707 Done creating git origin for https://forge.softwareheritage.org/source/helloworld.git
2018-09-14 17:00:54,409 25707 Creating origin_visit for origin 2 at time 2018-09-14 15:00:54.400801+00:00
2018-09-14 17:00:54,411 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,415 25707 http://localhost:5002 "POST /origin/visit/add HTTP/1.1" 200 16
2018-09-14 17:00:54,415 25707 Done Creating origin_visit for origin 2 at time 2018-09-14 15:00:54.400801+00:00
2018-09-14 17:00:54,417 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,420 25707 http://localhost:5002 "POST /fetch_history/start HTTP/1.1" 200 1
2018-09-14 17:00:54,422 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,425 25707 http://localhost:5002 "POST /snapshot/latest HTTP/1.1" 200 1
2018-09-14 17:00:54,427 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,431 25707 http://localhost:5002 "POST /snapshot/latest HTTP/1.1" 200 1
2018-09-14 17:00:54,432 25707 Starting new HTTPS connection (1): forge.softwareheritage.org
2018-09-14 17:00:54,760 25707 https://forge.softwareheritage.org:443 "GET /source/helloworld.git/info/refs?service=git-upload-pack HTTP/1.1" 200 None
2018-09-14 17:00:54,762 25707 Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 200, in read_pkt_line
    sizestr = read(4)
  File "/usr/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 889, in load
    more_data_to_fetch = self.fetch_data()
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-git/swh/loader/git/updater.py", line 260, in fetch_data
    do_progress)
  File "/home/zack/dati/projects/sw-heritage/git/swh-environment/swh-loader-git/swh/loader/git/updater.py", line 202, in fetch_pack_from_origin
    progress=do_activity)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1544, in fetch_pack
    b"git-upload-pack", url)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1449, in _discover_references
    [pkt] = list(proto.read_pkt_seq())
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 254, in read_pkt_seq
    pkt = self.read_pkt_line()
  File "/usr/lib/python3/dist-packages/dulwich/protocol.py", line 212, in read_pkt_line
    raise GitProtocolError(e)
dulwich.errors.GitProtocolError: Not a gzipped file (b'00')
2018-09-14 17:00:54,771 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,779 25707 http://localhost:5002 "POST /fetch_history/end HTTP/1.1" 200 1
2018-09-14 17:00:54,781 25707 Updating origin_visit for origin 2 with status partial
2018-09-14 17:00:54,785 25707 Resetting dropped connection: localhost
2018-09-14 17:00:54,793 25707 http://localhost:5002 "POST /origin/visit/update HTTP/1.1" 200 1
2018-09-14 17:00:54,795 25707 Done updating origin_visit for origin 2 with status partial
Running git clone on the same URL works just fine. I suspect this affects all our repos hosted on forge.softwareheritage.org, but I haven't tested any others.
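The inner exception in the traceback above can be reproduced without any network access. The sketch below is only an illustration of the symptom, not a confirmed diagnosis: it assumes the response body reaching dulwich was plain pkt-line data (starting with the hex length prefix "001e") while the code path still tried to gunzip it. The sample body and the helper names read_as_gzip and failure_message are made up for this example.

```python
import gzip
import io

# First pkt-line of a smart-HTTP ref advertisement, sent as plain bytes.
# "001e" is the hex length prefix (30 bytes, including the prefix itself).
# This sample body is an assumption; the report does not show the real one.
PLAIN_BODY = b"001e# service=git-upload-pack\n0000"


def read_as_gzip(body: bytes, size: int = 4) -> bytes:
    """Read `size` bytes from `body` as if it were gzip-compressed."""
    return gzip.GzipFile(fileobj=io.BytesIO(body)).read(size)


def failure_message(body: bytes) -> str:
    """Return the error text raised when gunzipping `body`, or '' on success."""
    try:
        read_as_gzip(body)
    except OSError as exc:  # gzip.BadGzipFile subclasses OSError on Python >= 3.8
        return str(exc)
    return ""


if __name__ == "__main__":
    # Plain pkt-line data is rejected because it lacks the gzip magic bytes.
    print(failure_message(PLAIN_BODY))
    # The same data, actually compressed, reads back fine.
    print(read_as_gzip(gzip.compress(PLAIN_BODY)))
```

If this reading is right, the message "Not a gzipped file (b'00')" just reflects the first two bytes of a pkt-line length prefix, which would suggest looking at how the Content-Encoding of the forge's response is handled on the client side.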
Migrated from T1195 (view on Phabricator)