Fix UnicodeDecodeError in revision metadata conversion
Browsing the URL https://archive.softwareheritage.org/browse/origin/https://www.mercurial-scm.org/repo/hg/ currently raises the following error:
Traceback (most recent call last):
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/views/utils/snapshot_context.py", line 239, in browse_snapshot_directory
browse_context='directory') # noqa
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/views/utils/snapshot_context.py", line 135, in _process_snapshot_request
origin_url, timestamp, visit_id)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 938, in get_snapshot_context
snapshot_id)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 468, in get_origin_visit_snapshot
return get_snapshot_content(visit_info['snapshot'])
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 425, in get_snapshot_content
branches, releases = process_snapshot_branches(snapshot)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/browse/utils.py", line 356, in process_snapshot_branches
for revision in revisions:
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/service.py", line 453, in <genexpr>
return (converters.from_revision(r) for r in revisions)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 281, in from_revision
dates={'date', 'committer_date'})
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 149, in from_swh
new_dict[key] = convert_fn(value)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 242, in convert_revision_metadata
return json.loads(json.dumps(metadata, cls=SWHMetadataEncoder))
File "/usr/local/lib/python3.7/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/usr/local/lib/python3.7/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/lib/python3.7/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/web/common/converters.py", line 230, in default
return obj.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 4: invalid start byte
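The failure happens in the default method of SWHMetadataEncoder (converters.py, line 230), which decodes bytes values as strict UTF-8. The revision metadata for this origin contains a byte (0x86) that is not valid UTF-8, so the whole snapshot view fails. This needs to be fixed.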
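One possible direction, as a minimal sketch only: make the JSON encoder fall back to an escaped representation when a bytes value is not valid UTF-8, instead of raising. The names LenientBytesEncoder and convert_metadata below are hypothetical and are not the actual swh-web code; only the json.loads(json.dumps(...)) round trip and the decode('utf-8') call are taken from the traceback above.

```python
import json


class LenientBytesEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration, not the real SWHMetadataEncoder."""

    def default(self, obj):
        if isinstance(obj, bytes):
            try:
                # Same behaviour as the failing code path for valid UTF-8
                return obj.decode('utf-8')
            except UnicodeDecodeError:
                # Assumption: an escaped string is an acceptable fallback;
                # invalid bytes become literal \xNN sequences.
                return obj.decode('utf-8', 'backslashreplace')
        # Let the base class raise TypeError for other unsupported types
        return super().default(obj)


def convert_metadata(metadata):
    """Round-trip metadata through JSON, as the traceback shows (hypothetical name)."""
    return json.loads(json.dumps(metadata, cls=LenientBytesEncoder))


if __name__ == '__main__':
    # Byte 0x86 at position 4, mirroring the error message in the traceback
    print(convert_metadata({'extra': b'abcd\x86ef'}))
```

Whether an escaped string is the right representation for the web UI is a design choice. Other options include errors='replace' or exposing the raw bytes in another encoding; the sketch only shows that the conversion need not abort the whole snapshot view.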
Migrated from T1727 (view on Phabricator)