[cassandra] Git loader performance are very bad
With all the replayers stopped and no particular load on the cluster, the loading of almost all the git repository are failing in timeout. Not tested with other types of origin
Reference in current production:
swhworker@worker01:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/slackhq/nebula
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git'
Enumerating objects: 1461, done.
Counting objects: 100% (307/307), done.
Compressing objects: 100% (186/186), done.
Total 1461 (delta 166), reused 193 (delta 109), pack-reused 1154
INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula
{'status': 'eventful'}
real 0m9.096s
user 0m1.551s
sys 0m0.064s
With the cassandra backend:
swh@6490bac3ba28:/$ time swh loader run git https://github.com/slackhq/nebula
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git'
Enumerating objects: 3317, done.
Counting objects: 100% (1128/1128), done.
Compressing objects: 100% (508/508), done.
Total 3317 (delta 696), reused 915 (delta 601), pack-reused 2189
INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load
self.store_data()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 467, in store_data
self.flush()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush
self.storage.flush()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 156, in flush
stats = add_fn(list(batch))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 59, in revision_add
missing_ids = self._filter_missing_ids("revision", (r.id for r in revisions))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 116, in _filter_missing_ids
return set(fn(missing_ids))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
return self.post(meth._endpoint_path, post_data)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
return self._decode_response(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response
self.raise_for_status(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status
super().raise_for_status(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 344, in raise_for_status
raise exception from None
swh.core.api.RemoteException: <RemoteException 500 ReadTimeout: ['Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation timed out - received only 0 responses." info={\'consistency\': \'LOCAL_ONE\', \'required_responses\': 1, \'received_responses\': 0}']>
Traceback (most recent call last):
File "/srv/softwareheritage/venv/bin/swh", line 8, in <module>
sys.exit(main())
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 185, in main
return swh(auto_envvar_prefix="SWH")
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/cli.py", line 105, in run
result = loader.load()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 382, in load
self.flush()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush
self.storage.flush()
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 156, in flush
stats = add_fn(list(batch))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 59, in revision_add
missing_ids = self._filter_missing_ids("revision", (r.id for r in revisions))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 116, in _filter_missing_ids
return set(fn(missing_ids))
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
return self.post(meth._endpoint_path, post_data)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
return self._decode_response(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response
self.raise_for_status(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status
super().raise_for_status(response)
File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 344, in raise_for_status
raise exception from None
swh.core.api.RemoteException: <RemoteException 500 ReadTimeout: ['Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation timed out - received only 0 responses." info={\'consistency\': \'LOCAL_ONE\', \'required_responses\': 1, \'received_responses\': 0}']>
real 8m59.338s <------------ zzzZZZZZ
user 0m5.499s
sys 0m0.179s
Storage logs:
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/add_multi
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit/add
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit/get_latest
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit_status/get_latest
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /snapshot/get_branches
swh-storage_1 | [2021-08-19 13:56:05 +0000] [32] [DEBUG] POST /content/missing
swh-storage_1 | [2021-08-19 13:56:56 +0000] [32] [DEBUG] POST /content/add
swh-storage_1 | [2021-08-19 13:57:40 +0000] [32] [DEBUG] POST /directory/missing
swh-storage_1 | [2021-08-19 13:57:41 +0000] [32] [DEBUG] POST /directory/add
swh-storage_1 | [2021-08-19 13:59:12 +0000] [32] [DEBUG] POST /revision/missing
swh-storage_1 | ERROR:root:Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0}
swh-storage_1 | Traceback (most recent call last):
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
swh-storage_1 | rv = self.dispatch_request()
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
swh-storage_1 | return self.view_functions[rule.endpoint](**req.view_args)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 153, in newf
swh-storage_1 | return f.negotiator(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 81, in __call__
swh-storage_1 | result = self.func(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 460, in _f
swh-storage_1 | return obj_meth(**kw)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/storage.py", line 593, in revision_missing
swh-storage_1 | return self._cql_runner.revision_missing(revisions)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 145, in newf
swh-storage_1 | self, *args, **kwargs, statement=self._prepared_statements[f.__name__]
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 542, in revision_missing
swh-storage_1 | return self._missing(statement, ids)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 305, in _missing
swh-storage_1 | rows = self._execute_with_retries(statement, [ids])
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f
swh-storage_1 | return self(f, *args, **kw)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__
swh-storage_1 | do = self.iter(retry_state=retry_state)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 349, in iter
swh-storage_1 | return fut.result()
swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
swh-storage_1 | return self.__get_result()
swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
swh-storage_1 | raise self._exception
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__
swh-storage_1 | result = fn(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 272, in _execute_with_retries
swh-storage_1 | return self._session.execute(statement, args, timeout=1000.0)
swh-storage_1 | File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute
swh-storage_1 | File "cassandra/cluster.py", line 4894, in cassandra.cluster.ResponseFuture.result
swh-storage_1 | cassandra.ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses':
1, 'received_responses': 0}
swh-storage_1 | [2021-08-19 13:59:17 +0000] [32] [DEBUG] POST /origin/visit_status/add
swh-storage_1 | [2021-08-19 13:59:17 +0000] [32] [DEBUG] POST /content/missing
swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /content/add
swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /directory/missing
swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /directory/add
swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /revision/missing
swh-storage_1 | ERROR:root:Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0}
swh-storage_1 | Traceback (most recent call last):
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
swh-storage_1 | rv = self.dispatch_request()
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
swh-storage_1 | return self.view_functions[rule.endpoint](**req.view_args)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 153, in newf
swh-storage_1 | return f.negotiator(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 81, in __call__
swh-storage_1 | result = self.func(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 460, in _f
swh-storage_1 | return obj_meth(**kw)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/storage.py", line 593, in revision_missing
swh-storage_1 | return self._cql_runner.revision_missing(revisions)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 145, in newf
swh-storage_1 | self, *args, **kwargs, statement=self._prepared_statements[f.__name__]
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 542, in revision_missing
swh-storage_1 | return self._missing(statement, ids)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 305, in _missing
swh-storage_1 | rows = self._execute_with_retries(statement, [ids])
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f
swh-storage_1 | return self(f, *args, **kw)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__
swh-storage_1 | do = self.iter(retry_state=retry_state)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 349, in iter
swh-storage_1 | return fut.result()
swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
swh-storage_1 | return self.__get_result()
swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
swh-storage_1 | raise self._exception
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__
swh-storage_1 | result = fn(*args, **kwargs)
swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 272, in _execute_with_retries
swh-storage_1 | return self._session.execute(statement, args, timeout=1000.0)
swh-storage_1 | File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute
swh-storage_1 | File "cassandra/cluster.py", line 4894, in cassandra.cluster.ResponseFuture.result
swh-storage_1 | cassandra.ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0}
swh-storage_1 | [2021-08-19 14:04:58 +0000] [32] [DEBUG] Closing connection.
Some queries are logged in the cassandra logs:
INFO [ScheduledTasks:1] 2021-08-19 16:00:16,534 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2021-08-19 16:00:16,535 MonitoringTask.java:175 - 1 operations were slow in the last 4999 msecs:
<SELECT ctime, length, status FROM swh.content WHERE token(sha1, sha1_git, sha256, blake2s256) >= 1987356847901115199 AND token(sha1, sha1_git, sha256, blake2s256) <= 1987356847901115199 LIMIT 5000>, time 1067 msec - slow timeout 1000 msec/cross-node
DEBUG [ScheduledTasks:1] 2021-08-19 16:02:11,537 MonitoringTask.java:175 - 1 operations were slow in the last 5000 msecs:
<SELECT ctime, length, status FROM swh.content WHERE token(sha1, sha1_git, sha256, blake2s256) >= 8339318752667056020 AND token(sha1, sha1_git, sha256, blake2s256) <= 8339318752667056020 LIMIT 5000>, time 1114 msec - slow timeout 1000 msec
DEBUG [ScheduledTasks:1] 2021-08-19 16:02:14,058 SSLFactory.java:354 - Checking whether certificates have been updated []
Migrated from T3493 (view on Phabricator)