normalize encoding values across mimetype and language indexers

In the language indexer, we need to know the content's encoding so that we can decode it to text and detect the language from that text.

As we already read the content to detect the mimetype and the encoding in a prior step, we should reuse that encoding instead of detecting it again. But an implementation detail prevents this.

The encoding names reported by the 'file' cli, which the mimetype indexer uses, do not match the encoding names that our environment (python) accepts natively for decoding. We should normalize these values.
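
One possible way to normalize, as a minimal sketch only: pass the name reported by 'file' through python's codec registry and keep the canonical codec name it returns. The function name normalize_encoding is hypothetical and not part of the indexer code. The sketch assumes the mimetype indexer stores the encoding as a plain string, and that values with no python codec (for example 'binary') should be treated as "no usable encoding".

```python
import codecs
from typing import Optional


def normalize_encoding(name: Optional[str]) -> Optional[str]:
    """Map an encoding name to python's canonical codec name.

    Hypothetical helper: returns None when the name is empty or when
    python has no codec registered under that name.
    """
    if name is None:
        return None
    name = name.strip()
    if not name:
        return None
    try:
        # codecs.lookup resolves aliases and is case-insensitive;
        # the returned CodecInfo carries the canonical codec name.
        return codecs.lookup(name).name
    except LookupError:
        # No matching codec (e.g. 'binary'): nothing we can decode with.
        return None
```

With this approach both indexers would store and compare the same canonical value, and the language indexer could call content.decode(encoding) only when the result is not None.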

Migrated from T728 (view on Phabricator)