Add counting storage proxy
It will be used in the Cassandra experiment.
Currently we use the built-in counters of the Cassandra backend, but in addition to being inaccurate, they seem to be a bottleneck.
This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.
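To illustrate the idea, here is a minimal sketch of a counting proxy. It is not the code from swh/storage/proxies/counter.py: the `InMemoryStorage` stand-in, the `CountingProxy` class, and the `counts` attribute are hypothetical names, and the assumption that insertion methods end in `_add` is made for the example only. It wraps a backend, forwards every call unchanged, and tallies the number of objects passed to each insertion method.

```python
from collections import Counter
from functools import partial


class InMemoryStorage:
    """Hypothetical stand-in backend, not the real swh.storage API."""

    def __init__(self):
        self.objects = {}

    def content_add(self, objs):
        for obj in objs:
            self.objects[("content", obj)] = obj
        return {"content:add": len(objs)}

    def content_get(self, key):
        return self.objects.get(("content", key))


class CountingProxy:
    """Forwards calls to a backend and counts objects passed to *_add methods."""

    def __init__(self, storage):
        self.storage = storage
        self.counts = Counter()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for backend methods.
        if name in ("storage", "counts"):
            # Avoid infinite recursion if __init__ has not run yet.
            raise AttributeError(name)
        attr = getattr(self.storage, name)
        if name.endswith("_add"):
            # Assumption: "<object_type>_add" is the insertion naming scheme.
            return partial(self._counted_add, name[: -len("_add")], attr)
        return attr

    def _counted_add(self, object_type, backend_add, objs):
        objs = list(objs)  # materialize so iterators are not consumed twice
        self.counts[object_type] += len(objs)
        return backend_add(objs)


proxy = CountingProxy(InMemoryStorage())
proxy.content_add(["a", "b"])
proxy.content_add(["c"])
print(dict(proxy.counts))
```

Note that this sketch counts insertion attempts rather than distinct objects, so re-inserting the same object increments the counter again. Deduplicating would require tracking object keys, which this example deliberately leaves out.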
Migrated from D6149 (view on Phabricator)
Activity
Build has FAILED
Patch application report for D6149 (id=22252)
Could not rebase; Attempt merge onto b110d1b6...
Merge made by the 'recursive' strategy.
 swh/storage/__init__.py             |  5 ++-
 swh/storage/cassandra/cql.py        | 88 ++++++++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py    | 24 ++++++++--
 swh/storage/in_memory.py            |  1 +
 swh/storage/proxies/counter.py      | 66 ++++++++++++++++++++++++++++
 swh/storage/tests/test_cassandra.py |  7 +--
 swh/storage/tests/test_counter.py   | 63 ++++++++++++++++++++++++++
 7 files changed, 238 insertions(+), 16 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit d14d3815aed40d765d6939d90396299c96a9a727
Merge: b110d1b6 1875046f
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 09:32:42 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 1875046f31eaa61e3f999e351f86dfba66b58680
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy

    It will be used in the Cassandra experiment.
    Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck.
    This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos

    This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values:
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1376/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1376/console
Build is green
Patch application report for D6149 (id=22253)
Could not rebase; Attempt merge onto b110d1b6...
Merge made by the 'recursive' strategy.
 requirements-swh.txt                |  1 +
 swh/storage/__init__.py             |  5 ++-
 swh/storage/cassandra/cql.py        | 88 ++++++++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py    | 24 ++++++++--
 swh/storage/in_memory.py            |  1 +
 swh/storage/proxies/counter.py      | 66 ++++++++++++++++++++++++++++
 swh/storage/tests/test_cassandra.py |  7 +--
 swh/storage/tests/test_counter.py   | 63 ++++++++++++++++++++++++++
 8 files changed, 239 insertions(+), 16 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 3f67bd62b7a45363aef6d80c608603b0a87c801b
Merge: b110d1b6 b10788d3
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 09:44:11 2021 +0000

    Merge branch 'diff-target' into HEAD

commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy

    It will be used in the Cassandra experiment.
    Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck.
    This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos

    This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values:
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1377/ for more details.
mentioned in merge request !709 (closed)
mentioned in merge request !707 (closed)
Build is green
Patch application report for D6149 (id=22265)
Rebasing onto b110d1b6...
First, rewinding head to replay your work on top of it...
Applying: Add counting storage proxy
Changes applied before test
commit 2bf29b23ecdfad28345476337eec695aabf26c85
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy

    It will be used in the Cassandra experiment.
    Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck.
    This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1383/ for more details.
Build is green
Patch application report for D6149 (id=22269)
Rebasing onto b110d1b6...
Current branch diff-target is up to date.
Changes applied before test
commit 47a6919fee499dd51fb0098099e895088a1a7c25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy

    It will be used in the Cassandra experiment.
    Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck.
    This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1385/ for more details.