Indexer mimetype - Fix parsing error

Once in a while, the mimetype indexer raises an error when it tries to parse the output of file --mime:

Oct 10 07:49:13 worker01.euwest.azure python3[48818]: Traceback (most recent call last):
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:   File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 220, in run
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:     res = self.index_content(sha1, raw_content)
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:   File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 97, in index_content
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:     properties = compute_mimetype_encoding(raw_content, log=self.log)
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:   File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 33, in compute_mimetype_encoding
Oct 10 07:49:13 worker01.euwest.azure python3[48818]:     encoding = res[1].split(b'=')[1]
Oct 10 07:49:13 worker01.euwest.azure python3[48818]: IndexError: list index out of range
Oct 10 07:49:13 worker01.euwest.azure python3[49118]: [2017-10-10 07:49:13,252: ERROR/Worker-2] Problem when reading contents metadata.
                                                      Traceback (most recent call last):
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 220, in run
                                                          res = self.index_content(sha1, raw_content)
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 97, in index_content
                                                          properties = compute_mimetype_encoding(raw_content)
                                                        File "/usr/lib/python3/dist-packages/swh/indexer/mimetype.py", line 33, in compute_mimetype_encoding
                                                          encoding = res[1].split(b'=')[1]
                                                      IndexError: list index out of range
Oct 10 07:49:13 worker01.euwest.azure python3[49118]: [2017-10-10 07:49:13,364: WARNING/Worker-2] Rescheduling batch
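The failing line indexes positionally: first res[1], then [1] after splitting on b'='. Any output that lacks the expected "; charset=..." part therefore raises IndexError instead of producing a result. Below is a minimal sketch of a more defensive parser. It assumes the raw output has the shape `<name>: <mimetype>; charset=<encoding>`, which is what `file --mime` prints for stdin without `--brief`. The function name `parse_file_mime_output` is hypothetical and is not part of swh.indexer. It returns None for any field it cannot find rather than raising.

```python
from typing import Optional, Tuple


def parse_file_mime_output(output: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Parse output such as b'/dev/stdin: text/plain; charset=us-ascii'.

    Hypothetical helper: returns (mimetype, encoding), using None for
    any part that is missing instead of raising IndexError.
    """
    line = output.strip()
    # Drop the "<name>: " prefix file prints without --brief; rpartition
    # keeps the whole line when no such prefix is present.
    line = line.rpartition(b': ')[2]
    parts = [part.strip() for part in line.split(b';')]
    mimetype = parts[0] or None
    encoding = None
    # Look for a "charset=<value>" parameter instead of assuming its position.
    for param in parts[1:]:
        key, sep, value = param.partition(b'=')
        if sep and key.strip() == b'charset' and value.strip():
            encoding = value.strip()
            break
    return mimetype, encoding
```

With this approach, the indexer can log and skip (or store a null encoding for) contents whose file --mime output is unexpected, rather than failing and rescheduling the whole batch.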

Migrated from T801