Some git repositories fail to be ingested because of a MemoryError.
Even on worker17, which has somewhat beefier hardware (64 GiB of memory), the load gets OOM-killed [2] (possibly related to the sentry issue referenced in [1]):
swhworker@worker17:~$ swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 556997, done.
Counting objects: 100% (2700/2700), done.
Compressing objects: 100% (2219/2219), done.
Total 556997 (delta 589), reused 2436 (delta 457), pack-reused 554297
INFO:swh.loader.git.loader.GitLoader:Listed 19843 refs for repo https://github.com/keybase/client
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 339, in load
    self.store_data()
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 463, in store_data
    for release in self.get_releases():
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 349, in get_releases
    for raw_obj in self.iter_objects(b"tag"):
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 315, in iter_objects
    PackData.from_file(self.pack_buffer, self.pack_size)
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1337, in _walk_all_chains
    for result in self._follow_chain(offset, type_num, None):
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1393, in _follow_chain
    unpacked = self._resolve_object(offset, obj_type_num, base_chunks)
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1385, in _resolve_object
    unpacked.decomp_chunks)
MemoryError
{'status': 'failed'}
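To reproduce the MemoryError on a smaller budget, without waiting for the host's OOM killer, one option might be to cap the address space of the loader process. The following is only a minimal sketch, assuming Linux and Python's standard `resource` module; the helper name `run_with_memory_cap` and the 8 GiB figure in the comment are hypothetical and not part of the loader.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes):
    """Run argv in a child process whose address space is capped.

    Hypothetical helper: only the soft RLIMIT_AS limit is lowered, so
    allocations beyond the cap fail inside the child (a MemoryError in
    Python) instead of the kernel killing the process.
    """

    def limit_address_space():
        # Runs in the child just before exec; keep the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(
        argv,
        preexec_fn=limit_address_space,
        capture_output=True,
        text=True,
    )


# Assumed usage with the command from this report and an arbitrary 8 GiB cap:
# run_with_memory_cap(
#     ["swh", "loader", "-C", "/etc/softwareheritage/loader_oneshot.yml",
#      "run", "git", "https://github.com/keybase/client"],
#     8 * 1024**3,
# )
```

The returned `CompletedProcess` exposes `returncode` and `stderr`, so the traceback can be inspected after the run.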
Noticed through swh/meta$1107 (similar sentry issue [1])
This echoes a previous task [3].
Versions currently running: git loader 0.10 and dulwich 0.19.11 [4].
- [1] https://sentry.softwareheritage.org/share/issue/175ffd5551644b8b8171beaf627e105a/
- [2] https://grafana.softwareheritage.org/goto/jqkSiFM7z?orgId=1
- [4]
root@pergamon:~# clush -b -w @swh-workers "dpkg -l python3-dulwich python3-swh.loader.git" | grep ii
ii python3-dulwich 0.19.11-2 amd64 Python Git library - Python3 module
ii python3-swh.loader.git 0.10.0-1~swh1~bpo10+1 all Software Heritage Git loader
Migrated from T3457 (view on Phabricator)