
Fix UnicodeDecodeError in revision metadata conversion

Browsing the URL https://archive.softwareheritage.org/browse/origin/https://www.mercurial-scm.org/repo/hg/ currently raises the following error:

Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/views/utils/snapshot_context.py", line 239, in browse_snapshot_directory
    browse_context='directory') # noqa
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/views/utils/snapshot_context.py", line 135, in _process_snapshot_request
    origin_url, timestamp, visit_id)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 938, in get_snapshot_context
    snapshot_id)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 468, in get_origin_visit_snapshot
    return get_snapshot_content(visit_info['snapshot'])
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 425, in get_snapshot_content
    branches, releases = process_snapshot_branches(snapshot)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 356, in process_snapshot_branches
    for revision in revisions:
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/service.py", line 453, in <genexpr>
    return (converters.from_revision(r) for r in revisions)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 281, in from_revision
    dates={'date', 'committer_date'})
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 149, in from_swh
    new_dict[key] = convert_fn(value)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 242, in convert_revision_metadata
    return json.loads(json.dumps(metadata, cls=SWHMetadataEncoder))
  File "/usr/local/lib/python3.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/local/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 230, in default
    return obj.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 4: invalid start byte

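The revision metadata contains bytes that are not valid UTF-8 (0x86 at position 4 here), and the JSON encoder used by convert_revision_metadata decodes them strictly as UTF-8. The conversion needs to be fixed so that undecodable bytes no longer raise.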
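
One possible direction, as a minimal sketch only and not the actual swh-web change: make the encoder's default() fall back to a lossy decode when strict UTF-8 decoding fails. The names TolerantMetadataEncoder and convert_metadata are hypothetical stand-ins for SWHMetadataEncoder and convert_revision_metadata from the traceback, and the choice of the 'backslashreplace' error handler is an assumption.

```python
import json


class TolerantMetadataEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize bytes values without raising."""

    def default(self, obj):
        if isinstance(obj, bytes):
            try:
                # Same behavior as the failing frame when the bytes are valid UTF-8.
                return obj.decode('utf-8')
            except UnicodeDecodeError:
                # Assumption: keep invalid bytes visible as \xNN escapes
                # rather than dropping or replacing them.
                return obj.decode('utf-8', 'backslashreplace')
        return super().default(obj)


def convert_metadata(metadata):
    """Round-trip metadata through JSON so bytes values become str."""
    return json.loads(json.dumps(metadata, cls=TolerantMetadataEncoder))


if __name__ == '__main__':
    # 0x86 at position 4, as in the reported error.
    print(convert_metadata({'extra_headers': [['key', b'abcd\x86ef']]}))
```

With this fallback, valid UTF-8 values are converted exactly as before, and only the undecodable bytes are rendered as escape sequences. Using errors='replace' instead would substitute U+FFFD for each invalid byte, at the cost of losing the original byte values.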


Migrated from T1727 (view on Phabricator)
