mimetype indexer: edge case makes the indexer fail miserably
For some particular raw_content values, the mimetype indexer fails with an unclear error message (a TypeError raised inside the magic bindings, see below). We need to understand which inputs trigger this case and handle it properly.
Stacktrace:
Nov 29 08:00:42 worker01.euwest.azure python3[88934]: [2017-11-29 08:00:42,355: ERROR/Worker-5723] Problem when reading contents metadata.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 364, in run
    res = self.index(sha1, raw_content)
  File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 90, in index
    properties = compute_mimetype_encoding(data)
  File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 25, in compute_mimetype_encoding
    r = magic.detect_from_content(raw_content)
  File "/usr/lib/python3/dist-packages/magic.py", line 277, in detect_from_content
    none_magic.buffer(byte_content))
  File "/usr/lib/python3/dist-packages/magic.py", line 155, in buffer
    return str(r, 'utf-8')
TypeError: coercing to str: need a bytes-like object, NoneType found
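
The last frame suggests that the value passed to str() in magic.py was None, i.e. the underlying detection returned no result for this content. Why that happens for this particular input is still unknown. As a possible stopgap, here is a minimal sketch of a guard around the detection call. The names safe_compute_mimetype_encoding and failing_detect are hypothetical and not part of swh.indexer; the detector is passed in as a parameter so the sketch runs without the magic bindings installed.

```python
import logging

logger = logging.getLogger(__name__)


def failing_detect(raw_content):
    # Hypothetical stand-in for magic.detect_from_content in the failing
    # case: the traceback ends in str(r, 'utf-8') with r being None,
    # which raises TypeError.
    return str(None, "utf-8")


def safe_compute_mimetype_encoding(raw_content, detect):
    """Run `detect` on raw_content, returning None instead of raising
    when detection fails with the TypeError seen in the traceback.

    `detect` is assumed to be a callable such as
    magic.detect_from_content.
    """
    try:
        return detect(raw_content)
    except TypeError:
        # Log enough context to find the offending content later,
        # without dumping the raw bytes into the log.
        logger.warning(
            "mimetype detection failed for content of %d bytes",
            len(raw_content),
        )
        return None
```

The caller would then have to decide what to do with a None result, for example skip the content or record it as unindexed. Catching TypeError this broadly could hide unrelated bugs, so this is only a way to get a clearer log message until the root cause is understood.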
Migrated from T861 (view on Phabricator)