
Gitorious import: loose object parsing error when a corrupted object file is empty

This error seems to be related to a parsing error on a legacy (loose) object within the dulwich dependency: the object file on disk is empty, so there is no header byte to read.

To reproduce, use the following repository /srv/storage/space/mirrors/gitorious.org/mnt/repositories/sortix/sortix-gitorious-wiki.git:

repo = 'sortix-gitorious-wiki.git'
origin_url = 'http://foo/bar/git'

import logging
logging.basicConfig(level=logging.DEBUG)

from swh.loader.git.tasks import LoadDiskGitRepository

t = LoadDiskGitRepository()
t.run(origin_url=origin_url, directory=repo, date='2016-05-03T15:16:32+00:00')

Output:

DEBUG:swh.scheduler.task.LoadDiskGitRepository:Creating git origin for http://foo/bar/sortix-gitorious-wiki.git
DEBUG:swh.scheduler.task.LoadDiskGitRepository:Done creating git origin for http://foo/bar/sortix-gitorious-wiki.git
Traceback (most recent call last):
  File "./load-git-disk.py", line 20, in <module>
    main()
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "./load-git-disk.py", line 17, in main
    t.run(origin_url=origin_url, directory=repo, date='2016-05-03T15:16:32+00:00')
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 35, in run
    raise e from None
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 32, in run
    result = self.run_task(*args, **kwargs)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/tasks.py", line 39, in run_task
    return loader.load(origin_url, directory, dateutil.parser.parse(date))
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/base.py", line 422, in load
    self.fetch_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-git/swh/loader/git/loader.py", line 48, in fetch_data
    type_name = self.repo[oid].type_name
  File "/usr/lib/python3/dist-packages/dulwich/repo.py", line 470, in __getitem__
    return self.object_store[name]
  File "/usr/lib/python3/dist-packages/dulwich/object_store.py", line 118, in __getitem__
    type_num, uncomp = self.get_raw(sha)
  File "/usr/lib/python3/dist-packages/dulwich/object_store.py", line 372, in get_raw
    ret = self._get_loose_object(hexsha)
  File "/usr/lib/python3/dist-packages/dulwich/object_store.py", line 521, in _get_loose_object
    return ShaFile.from_path(path)
  File "/usr/lib/python3/dist-packages/dulwich/objects.py", line 370, in from_path
    return cls.from_file(f)
  File "/usr/lib/python3/dist-packages/dulwich/objects.py", line 376, in from_file
    obj = cls._parse_file(f)
  File "/usr/lib/python3/dist-packages/dulwich/objects.py", line 346, in _parse_file
    if cls._is_legacy_object(map):
  File "/usr/lib/python3/dist-packages/dulwich/objects.py", line 338, in _is_legacy_object
    b0 = ord(magic[0:1])
TypeError: ord() expected a character, but string of length 0 found
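
The last frame can be reproduced in isolation. The sketch below is not dulwich code; `first_byte` is a hypothetical helper that only mirrors the failing expression `ord(magic[0:1])` from the traceback, assuming `magic` holds the bytes read from the start of the loose object file. With an empty file, that slice is empty and `ord()` raises the same TypeError.

```python
def first_byte(magic: bytes) -> int:
    """Hypothetical helper mirroring the expression from the traceback."""
    # Slicing never raises, so an empty input yields b"" and ord() fails.
    return ord(magic[0:1])


# A non-empty header works: the first byte is returned as an integer.
ok_value = first_byte(b"x\x9c")

# An empty file gives an empty byte string, which triggers the TypeError.
try:
    first_byte(b"")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This suggests the loader (or dulwich) would need to treat a zero-length loose object as corrupted rather than attempting to inspect its header.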

Note: load-git-disk.py is a wrapper around the scenario described (cf. swh/meta$183)
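
To check whether a repository is affected before loading it, something like the following could be used. This is only a sketch under assumptions: `find_empty_loose_objects` is a hypothetical helper (it is not part of swh.loader.git or dulwich), and it assumes the standard loose object layout `objects/<2 hex digits>/<38 hex digits>`. For a bare repository such as sortix-gitorious-wiki.git, `git_dir` would be the repository directory itself.

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def find_empty_loose_objects(git_dir):
    """Yield (hexsha, path) for each zero-byte loose object under git_dir/objects.

    Hypothetical helper for diagnosis only; it reads file sizes and never
    opens or modifies any object.
    """
    objects_dir = os.path.join(git_dir, "objects")
    for fanout in sorted(os.listdir(objects_dir)):
        # Loose objects live in two-hex-digit directories; skip 'pack', 'info', etc.
        if len(fanout) != 2 or not set(fanout) <= HEX_DIGITS:
            continue
        fanout_path = os.path.join(objects_dir, fanout)
        if not os.path.isdir(fanout_path):
            continue
        for name in sorted(os.listdir(fanout_path)):
            path = os.path.join(fanout_path, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                yield fanout + name, path
```

Usage would look like `for hexsha, path in find_empty_loose_objects(repo): print(hexsha, path)`, which lists the objects that would make the load fail with the error above.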


Migrated from T816 (view on Phabricator)
