mimetype indexer: edge case makes the indexer fail miserably

For certain raw_content values, the mimetype indexer fails with an unclear error message. We need to understand this case and handle it properly.

Stacktrace:

Nov 29 08:00:42 worker01.euwest.azure python3[88934]: [2017-11-29 08:00:42,355: ERROR/Worker-5723] Problem when reading contents metadata.
                                                      Traceback (most recent call last):
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 364, in run
                                                          res = self.index(sha1, raw_content)
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 90, in index
                                                          properties = compute_mimetype_encoding(data)
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 25, in compute_mimetype_encoding
                                                          r = magic.detect_from_content(raw_content)
                                                        File "/usr/lib/python3/dist-packages/magic.py", line 277, in detect_from_content
                                                          none_magic.buffer(byte_content))
                                                        File "/usr/lib/python3/dist-packages/magic.py", line 155, in buffer
                                                          return str(r, 'utf-8')
                                                      TypeError: coercing to str: need a bytes-like object, NoneType found
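
One possible way to handle this, as a minimal sketch rather than the actual fix: the trace shows `magic.py` calling `str(r, 'utf-8')` on a `None` value, which presumably means the underlying libmagic call returned no result for that content. The wrapper below catches that `TypeError`, logs it, and returns `None` so the caller can skip the content. The name `safe_compute_mimetype_encoding`, the `detect` parameter, the `None` fallback, and the `mime_type`/`encoding` attribute names on the result are all assumptions made for illustration.

```python
import logging
from collections import namedtuple

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the detector's result object; the attribute
# names are an assumption about what magic.detect_from_content returns.
DetectionResult = namedtuple("DetectionResult", ["mime_type", "encoding"])


def safe_compute_mimetype_encoding(raw_content, detect):
    """Run `detect` on raw_content and return a dict of properties.

    `detect` is any callable taking bytes (for instance
    magic.detect_from_content). Returns None when the detector fails
    with the TypeError seen in the stack trace above.
    """
    try:
        r = detect(raw_content)
    except TypeError:
        # The detector produced no result for this content; log enough
        # context to investigate later instead of crashing the worker.
        logger.warning(
            "mimetype detection failed for content of length %d",
            len(raw_content),
        )
        return None
    return {"mimetype": r.mime_type, "encoding": r.encoding}
```

In the indexer this might be called as `safe_compute_mimetype_encoding(data, magic.detect_from_content)`, with the caller deciding how to record contents for which `None` comes back. Passing the detector in as a parameter keeps the sketch testable without libmagic installed.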

Migrated from T861 (view on Phabricator)