cassandra: Add option to select (hopefully) more efficient batch insertion algos
This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values:
- 'one-per-one' is the default, and preserves the current naive behavior
- 'concurrent' and 'batch' are attempts at being more efficient
Addresses the current performance bottleneck, according to swh/infra/sysadm-environment#3493 (closed)
Migrated from D6139 (view on Phabricator)
Merge request reports
Activity
Build has FAILED
Patch application report for D6139 (id=22222)
Rebasing onto b110d1b6...
First, rewinding head to replay your work on top of it... Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Changes applied before test
commit cc8c2d10d4d1d36c0fdc0a3fa815162a6c74ed28 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1373/ See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1373/console
Build has FAILED
Patch application report for D6139 (id=22223)
Rebasing onto b110d1b6...
First, rewinding head to replay your work on top of it... Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos Applying: Fix in-mem
Changes applied before test
commit febe31d66e35063fe7330427bec648c2f9f8d070 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:45:22 2021 +0200 Fix in-mem commit 58070f87089745fbca3526e0b35dcc533c05caa9 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1374/ See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1374/console
Build is green
Patch application report for D6139 (id=22232)
Rebasing onto b110d1b6...
First, rewinding head to replay your work on top of it... Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos Applying: Fix in-mem Applying: Update test
Changes applied before test
commit a9f95df65ddfe9085ca89eef2fd0841b806fc685 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 12:20:26 2021 +0200 Update test commit 749b46ae5a80b870b5ef090a8652203383ea9533 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:45:22 2021 +0200 Fix in-mem commit cd0898b85b7e3a537c13a2b9495a688e834b3ff7 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1375/ for more details.
The patch was test in a loader and in the replayers. The difference was not really significant on the loader but I'm not really confident in the tests as the cluster had a pretty high load (running replayers + second datacenter synchronization). I will retry with a more quieter environment to be able to isolate the loader behavior.
It was different with the directory replayers. With the
concurrent
andbatch
option, the performance seems to be really better:The first plateau is with the unpatched version. The peak is with the
concurrent
parameter The problem is it seems there is a lock contention on the counter and all the replayers stopped with this error:Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: Traceback (most recent call last): Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/bin/swh", line 11, in <module> Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: load_entry_point('swh.core==0.14.4', 'console_scripts', 'swh')() Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return swh(auto_envvar_prefix="SWH") Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__ Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return self.main(*args, **kwargs) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: rv = self.invoke(ctx) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return _process_result(sub_ctx.command.invoke(sub_ctx)) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return _process_result(sub_ctx.command.invoke(sub_ctx)) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return ctx.invoke(self.callback, **ctx.params) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return callback(*args, **kwargs) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return f(get_current_context(), *args, **kwargs) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cli.py", line 194, in replay Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: client.process(worker_fn) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 265, in process Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: batch_processed, at_eof = self.handle_messages(messages, worker_fn) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 292, in handle_messages Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: worker_fn(dict(objects)) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/replay.py", line 62, in process_replay_objects Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: _insert_objects(object_type, objects, storage) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/replay.py", line 144, in _insert_objects Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: storage.origin_add(origins) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/storage.py", line 1021, in origin_add Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: OriginRow(sha1=hash_url(origin.url), url=origin.url, next_visit_id=1) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 162, in newf Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: self, *args, **kwargs, statement=self._prepared_statements[f.__name__] Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 913, in origin_add_one Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: self._add_one(statement, origin) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 315, in _add_one Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: self._increment_counter(obj.TABLE, 1) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 162, in newf Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: self, *args, **kwargs, statement=self._prepared_statements[f.__name__] Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 312, in _increment_counter Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: self._execute_with_retries(statement, [nb, object_type]) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 329, in wrapped_f Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return self.call(f, *args, **kw) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 409, in call Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: do = self.iter(retry_state=retry_state) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 356, in iter Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return fut.result() Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return self.__get_result() Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: raise self._exception Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 412, in call Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: result = fn(*args, **kwargs) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 293, in _execute_with_retries Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: return self._session.execute(statement, args, timeout=1000.0) Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "cassandra/cluster.py", line 2345, in cassandra.cluster.Session.execute Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: File "cassandra/cluster.py", line 4304, in cassandra.cluster.ResponseFuture.result Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'write_type': 'COUNTER'}
Tested with 40 and 10 directory replayers
This should be addressed by !708 (closed) + !709 (closed)
Build is green
Patch application report for D6139 (id=22276)
Could not rebase; Attempt merge onto 47a6919f...
Updating 47a6919f..d272e70b Fast-forward swh/storage/cassandra/cql.py | 105 ++++++++++++++++++++++++++++-------- swh/storage/cassandra/schema.py | 8 --- swh/storage/cassandra/storage.py | 32 +++++++++-- swh/storage/in_memory.py | 1 + swh/storage/tests/storage_tests.py | 67 ++++++++++++++++------- swh/storage/tests/test_cassandra.py | 31 ++++++++--- 6 files changed, 187 insertions(+), 57 deletions(-)
Changes applied before test
commit d272e70b9c2a2ceec484ffb7511eb6e742255319 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient commit c8a1ed7ac4adb7ca4f0bda9b12ace14b2fc521ce Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Aug 27 11:45:18 2021 +0200 cassandra: Remove stat_counters. They were inaccurate and a performance bottleneck. We can/should use swh-counters instead, now.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1387/ for more details.
Build is green
Patch application report for D6139 (id=22328)
Could not rebase; Attempt merge onto 3ad1bec1...
Updating 3ad1bec1..df045d45 Fast-forward swh/storage/cassandra/cql.py | 105 ++++++++++++++++++++++++++++-------- swh/storage/cassandra/schema.py | 8 --- swh/storage/cassandra/storage.py | 32 +++++++++-- swh/storage/in_memory.py | 1 + swh/storage/tests/storage_tests.py | 67 ++++++++++++++++------- swh/storage/tests/test_cassandra.py | 31 ++++++++--- 6 files changed, 187 insertions(+), 57 deletions(-)
Changes applied before test
commit df045d45cb90d9ba7f1fb87cdf20945ac42fa800 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient commit e8aad0fffc5341a78f98cb3b4c4845b6f4e95527 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Aug 27 11:45:18 2021 +0200 cassandra: Remove stat_counters. They were inaccurate and a performance bottleneck. We can/should use swh-counters instead, now.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1394/ for more details.
Test with 10 replayers with the 3 kind of algorithm:
- first interval: one-by-one
- second interval: concurremt
- third interval: batch:
The stats are based on the directory_add statsd metrics
concurrent
andbatch
seem to have almost the same performances. Both are faster than theone-by-one
algo butbatch
looks more stable.The
concurrent
algo have a higher impact on the cpu consumption.I will test with more replayers in parallel to confirm the tendency
Some feedback, I had to delay the benchmarks because the servers was almost full and the cluster needed to be expanded to 7 nodes. The cluster is in stabilization phase (rebuild/repair of the new node and cleanup of the old one) When it will be done, I will be able to finalize the tests Hopefully at the beginning of the next week
These are more results with different number of replayers. Each line represents a server with 20 directory replayers, the renages are for
one-by-one
,concurrent
,batch
For 4 nodes, and only for 4 nodes, the batch version collapse and is doing nothing visible and nothing is logged. There is some communication with cassandra but nothing is ingested.
After analyze, the batch mode uses huge prepared statements (PS). It seems with the concurrency of the replayers, they are evicted from the PS cache before they can be executed. The client silently reprepare the statement and retry fall in a prepared/evicted loop. Some interesting read: https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#preparing-on-multiple-nodes
finally, if the driver tries to execute a statement and finds out that the coordinator doesn’t know about it, it will re-prepare the statement on the fly (this is transparent for the client, but will cost two extra roundtrips):
From a tcpdump:
... Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache) BEGIN UNLOGGED BATCH Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache) BEGIN UNLOGGED BATCH Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache) BEGIN UNLOGGED BATCH Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache) BEGIN UNLOGGED BATCH Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache) ....
This example if for a big PS with 904
INSERT INTO directory_entry (directory_id, name, target, perms, type) VALUES (?, ?, ?, ?, ?);
In cassandra, there are a lot of logs like this one:
WARN [ScheduledTasks:1] 2021-09-07 07:45:41,884 QueryProcessor.java:101 - 576 prepared statements discarded in the last minute because cache limit reached (63 MB)
According to the documentation of the cassandra concurrent api[1], it seems the concurrency can by specified as an argument of the
execute_concurrent_with_args
method. The default is 100, but it could be interesting to check with higher or lower valuesThe concurrency parameter controls how many statements will be executed concurrently. When Cluster.protocol_version is set to 1 or 2, it is recommended that this be kept below 100 times the number of core connections per host times the number of connected hosts (see Cluster.set_core_connections_per_host()). If that amount is exceeded, the event loop thread may attempt to block on new connection creation, substantially impacting throughput. If protocol_version is 3 or higher, you can safely experiment with higher levels of concurrency.
I will probably dedicate a benchmark session for the concurrent algo with different concurrencies
! In !707 (closed), @vsellier wrote: After analyze, the batch mode uses huge prepared statements (PS). It seems with the concurrency of the replayers, they are evicted from the PS cache before they can be executed. The client silently reprepare the statement and retry fall in a prepared/evicted loop.
Ooooh, so that's what it is! Good to know.
Can you try with lower values for
BATCH_INSERT_MAX_SIZE
? (eg. 100)Build is green
Patch application report for D6139 (id=22479)
Could not rebase; Attempt merge onto 834a49d0...
Updating 834a49d0..da7e63ea Fast-forward swh/storage/cassandra/cql.py | 86 ++++++++++++++++++++++++++++--- swh/storage/cassandra/storage.py | 32 ++++++++++-- swh/storage/in_memory.py | 1 + swh/storage/migrate_extrinsic_metadata.py | 23 +++++++-- swh/storage/tests/storage_tests.py | 2 +- swh/storage/tests/test_cassandra.py | 55 +++++++++++++++++--- 6 files changed, 178 insertions(+), 21 deletions(-)
Changes applied before test
commit da7e63ea683c5dad0da64e83615a7df24156c6a2 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Wed Sep 8 11:56:49 2021 +0200 directory_entry_add_batch: Reduce churn of prepared statements By reusing the 'steady state' main statement (which is quite large) across calls. commit fc950deb20b4a2ea1f40a2617f51e3eb73f78612 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Aug 26 11:08:15 2021 +0200 cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient commit 7dc2863ef1fa0701e0cf632c0ce68db8e2965817 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Mon Sep 6 14:45:40 2021 +0200 migrate_extrinsic_metadata: Add an option to limit the number of revisions This will be used as a second pass on objects that failed with older versions of the script.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1397/ for more details.