Metadata indexing: ES dynamic mapping creation fails for fields whose values have varying types
This happens on staging after enabling the indexing of the metadata generated by the indexers:
Dec 10 11:12:25 search0 swh[31455]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin/_bulk [status:200 request:0.096s]
Dec 10 11:12:25 search0 swh[31455]: Traceback (most recent call last):
Dec 10 11:12:25 search0 swh[31455]: File "/usr/bin/swh", line 11, in <module>
Dec 10 11:12:25 search0 swh[31455]: load_entry_point('swh.core==0.11.0', 'console_scripts', 'swh')()
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main
Dec 10 11:12:25 search0 swh[31455]: return swh(auto_envvar_prefix="SWH")
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
Dec 10 11:12:25 search0 swh[31455]: return self.main(*args, **kwargs)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
Dec 10 11:12:25 search0 swh[31455]: rv = self.invoke(ctx)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
Dec 10 11:12:25 search0 swh[31455]: return _process_result(sub_ctx.command.invoke(sub_ctx))
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
Dec 10 11:12:25 search0 swh[31455]: return _process_result(sub_ctx.command.invoke(sub_ctx))
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
Dec 10 11:12:25 search0 swh[31455]: return _process_result(sub_ctx.command.invoke(sub_ctx))
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
Dec 10 11:12:25 search0 swh[31455]: return ctx.invoke(self.callback, **ctx.params)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
Dec 10 11:12:25 search0 swh[31455]: return callback(*args, **kwargs)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
Dec 10 11:12:25 search0 swh[31455]: return f(get_current_context(), *args, **kwargs)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/search/cli.py", line 102, in journal_client_objects
Dec 10 11:12:25 search0 swh[31455]: nb_messages = client.process(worker_fn)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 265, in process
Dec 10 11:12:25 search0 swh[31455]: batch_processed, at_eof = self.handle_messages(messages, worker_fn)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 292, in handle_messages
Dec 10 11:12:25 search0 swh[31455]: worker_fn(dict(objects))
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/search/journal_client.py", line 31, in process_journal_objects
Dec 10 11:12:25 search0 swh[31455]: process_origin_intrinsic_metadata(messages["origin_intrinsic_metadata"], search)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/search/journal_client.py", line 77, in process_origin_intrinsic_metadata
Dec 10 11:12:25 search0 swh[31455]: search.origin_update(origin_metadata)
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/swh/search/elasticsearch.py", line 106, in origin_update
Dec 10 11:12:25 search0 swh[31455]: bulk(self._backend, actions, index="origin")
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/elasticsearch/helpers/actions.py", line 300, in bulk
Dec 10 11:12:25 search0 swh[31455]: for ok, item in streaming_bulk(client, actions, *args, **kwargs):
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk
Dec 10 11:12:25 search0 swh[31455]: **kwargs
Dec 10 11:12:25 search0 swh[31455]: File "/usr/lib/python3/dist-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk
Dec 10 11:12:25 search0 swh[31455]: raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
Dec 10 11:12:25 search0 swh[31455]: elasticsearch.helpers.errors.BulkIndexError: ('1 document(s) failed to index.', [{'update': {'_index': 'origin', '_type': '_doc', '_id': '81eff50ed4f9d136e93da51c503054df62c0a0f0', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'object mapping for [intrinsic_metadata.author.affiliation] tried to parse field [affiliation] as object, but found a concrete value'}, 'data': {'doc': {'url': 'https://pypi.org/project/brightway2/', 'intrinsic_metadata': {'@context': 'https://doi.org/10.5063/schema/codemeta-2.0', 'type': ('Code', 'SoftwareSourceCode'), 'author': ({'id': '0000-0002-7898-9862', 'type': 'Person', 'affiliation': 'Paul Scherrer Institut', 'email': 'cmutel@gmail.com', 'name': 'Chris Mutel'}, {'type': 'Person', 'email': 'cmutel@gmail.com', 'name': 'Chris Mutel'}), 'codeRepository': 'https://bitbucket.org/cmutel/brightway2', 'dateCreated': '2017-04-05', 'dateModified': '2017-04-05', 'datePublished': '2017-04-05', 'description': 'Framework for Life Cycle Assessment', 'identifier': './', 'keywords': 'LCA, Python', 'license': ('BSD 3-clause', 'Copyright (c) 2016, Chris Mutel and ETH Zurich.'), 'name': 'brightway2', 'url': 'https://bitbucket.org/cmutel/brightway2', 'version': ('2.0.2', '2.3')}, 'sha1': '81eff50ed4f9d136e93da51c503054df62c0a0f0'}, 'doc_as_upsert': True}}}])
Dec 10 11:12:25 search0 systemd[1]: swh-search-journal-client@indexed.service: Main process exited, code=exited, status=1/FAILURE
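The actual failure reason is buried at the end of a very long log line. As a minimal sketch, the error list carried by the exception can be reduced to the few fields that matter. The helper name `summarize_bulk_errors` is hypothetical (it is not part of swh.search), and the sample data is abridged from the log above:

```python
def summarize_bulk_errors(errors):
    """Reduce bulk error items to (action, document id, error type, reason).

    `errors` is assumed to have the shape shown in the log above: a list of
    single-key dicts mapping the action name (e.g. 'update') to its details.
    """
    summary = []
    for item in errors:
        for action, details in item.items():
            error = details.get("error", {})
            summary.append(
                (action, details.get("_id"), error.get("type"), error.get("reason"))
            )
    return summary


# Abridged from the log above (the 'data' payload is omitted).
errors = [
    {
        "update": {
            "_index": "origin",
            "_id": "81eff50ed4f9d136e93da51c503054df62c0a0f0",
            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "object mapping for [intrinsic_metadata.author.affiliation] "
                "tried to parse field [affiliation] as object, "
                "but found a concrete value",
            },
        }
    }
]

for row in summarize_bulk_errors(errors):
    print(row)
```

In an exception handler, the same list should be available as the second argument of the `BulkIndexError` (`exc.args[1]`), as the traceback above shows.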
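It seems no explicit mapping is declared, and the one ES created dynamically is not generic enough. `affiliation` was mapped as an object (with a `name` property), so a document where `affiliation` is a plain string is rejected: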
"author": {
  "properties": {
    "affiliation": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
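To make the conflict concrete, here is a rough sketch in plain Python (no Elasticsearch client involved) that walks a mapping and a document and reports every field the mapping treats as an object but the document provides as a concrete value. The function name `find_object_conflicts` is hypothetical, the mapping is the fragment above with its braces closed, and the document is abridged from the failing payload. It only approximates the ES check for this one class of error:

```python
def find_object_conflicts(mapping_properties, doc, prefix=""):
    """Return dotted paths where the mapping expects an object but doc has a scalar."""
    conflicts = []
    for field, field_mapping in mapping_properties.items():
        if field not in doc:
            continue
        path = f"{prefix}{field}"
        sub_properties = field_mapping.get("properties")
        if sub_properties is None:
            continue  # leaf field (text, keyword, ...): not checked in this sketch
        value = doc[field]
        # Arrays are transparent to the mapping: each element must fit it.
        values = value if isinstance(value, (list, tuple)) else [value]
        for element in values:
            if isinstance(element, dict):
                conflicts.extend(
                    find_object_conflicts(sub_properties, element, path + ".")
                )
            elif element is not None:
                conflicts.append(path)
    return conflicts


# Mapping fragment from above, closed so that it is valid.
mapping = {
    "author": {
        "properties": {
            "affiliation": {
                "properties": {
                    "name": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    }
                }
            }
        }
    }
}

# Abridged from the failing document: the first author has a string affiliation.
doc = {
    "author": (
        {"type": "Person", "affiliation": "Paul Scherrer Institut", "name": "Chris Mutel"},
        {"type": "Person", "name": "Chris Mutel"},
    )
}

print(find_object_conflicts(mapping, doc))  # ['author.affiliation']
```

A document whose `affiliation` is an object such as `{"name": "..."}` passes this check, which is consistent with the mapping having been created from such a document first.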
The complete mapping of the index is in new_mapping.json, generated with:
root@search-esnode0:/srv/elasticsearch# curl http://${ES_SERVER}/origin/_mapping\?pretty | jq '.origin.mappings' > /tmp/new_mapping.js
Migrated from T2876 (view on Phabricator)