Commits · objstorage-check-update · Antoine Lambert / swh-objstorage

Apr 02, 2024

objstorage: Factorize object check and add ObjCorruptedError exception · 4775f08d

Implement object check in base ObjStorage class instead of duplicating that code
in every object storage backend.

Add an ObjCorruptedError exception raised when an object corruption is detected.

Improve exception messages.

Normalize imports of exception classes.

Related to swh/devel/swh-scrubber#4694.

4775f08d
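
The factorization described in this commit can be pictured with a minimal sketch. Only the `ObjCorruptedError` name comes from the commit message; the class names, the `check()` signature and the use of SHA-1 as the object identifier are assumptions made for illustration, not the actual swh-objstorage code.

```python
import hashlib


class ObjCorruptedError(Exception):
    """Stand-in for the exception named in the commit message."""


class BaseObjStorage:
    """Hypothetical base class: backends only have to implement get()."""

    def get(self, obj_id: bytes) -> bytes:
        raise NotImplementedError

    def check(self, obj_id: bytes) -> None:
        # Shared check: recompute the digest of the stored content and
        # compare it with the identifier the object is stored under.
        content = self.get(obj_id)
        actual = hashlib.sha1(content).digest()
        if actual != obj_id:
            raise ObjCorruptedError(
                f"expected {obj_id.hex()}, computed {actual.hex()}"
            )


class InMemoryObjStorage(BaseObjStorage):
    """Toy backend: it inherits check() instead of duplicating it."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> bytes:
        obj_id = hashlib.sha1(content).digest()
        self.objects[obj_id] = content
        return obj_id

    def get(self, obj_id: bytes) -> bytes:
        return self.objects[obj_id]
```

With this layout a new backend gets the corruption check for free, and callers can catch a single exception type regardless of the backend in use.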

Mar 29, 2024
- Apply copier template v0.2.0 · 4f274cf5
  vlorentz authored 11 months ago
  
  4f274cf5
Mar 25, 2024

tests: Add a pytest plugin introducing the swh_objstorage fixture · a8e0b4eb

Antoine Lambert authored 1 year ago and

Antoine Lambert committed 1 year ago

Add a pytest plugin that creates a new object storage instance
for tests through the swh_objstorage fixture.

The type of object storage to create is determined by the configuration
returned by the swh_objstorage_config fixture.

Use these new fixtures to slightly simplify the implementation of some tests.

Related to swh/devel/swh-scrubber#4694.

a8e0b4eb
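
The relationship between the two fixtures can be sketched as follows. The fixture names come from the commit message; the `BACKENDS` registry, `MemoryObjStorage` and `make_objstorage` are hypothetical stand-ins so that the example is self-contained, and the `"cls"` key is an assumed configuration convention.

```python
import pytest


class MemoryObjStorage:
    """Hypothetical stand-in for an in-memory backend."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Hypothetical registry mapping a "cls" configuration value to a backend class.
BACKENDS = {"memory": MemoryObjStorage}


def make_objstorage(config):
    """Instantiate the backend named by config["cls"] with the remaining keys."""
    params = dict(config)
    backend_class = BACKENDS[params.pop("cls")]
    return backend_class(**params)


@pytest.fixture
def swh_objstorage_config():
    # A test module can override this fixture to select another backend.
    return {"cls": "memory"}


@pytest.fixture
def swh_objstorage(swh_objstorage_config):
    # The storage instance is built from whatever configuration is in scope.
    return make_objstorage(swh_objstorage_config)
```

A test then only declares `swh_objstorage` as an argument, and a test module that needs a different backend redefines `swh_objstorage_config` locally.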

Mar 21, 2024
- winery tests: introduce USE_CEPH environment variable · 1ef64997
  Nicolas Dandrimont authored 1 year ago
  
  This allows disabling ceph-based tests on systems where ceph is available, by setting USE_CEPH=no.
  1ef64997
- winery: try to avoid some race conditions in RO shard mapping · fabf4337
  Nicolas Dandrimont authored 1 year ago
  
  First, wait for the image to reappear read-only before marking it as mapped; second, raise the ShardNotMapped error if the Shard open operation fails with a FileNotFoundError.
  fabf4337
- winery benchmark: add an explicit Stats worker · 09b240a5
  Nicolas Dandrimont authored 1 year ago
  
  This allows printing internal winery stats at a given interval
  09b240a5
- winery benchmark: run a single rw shard cleaner · 7cbacda5
  Nicolas Dandrimont authored 1 year ago
  
  This matches the production behavior. While this was done, the improper mock of the rw_shard_cleaner return value was noticed and fixed.
  7cbacda5
- winery benchmark: don't wait for shards indefinitely in packer process · f580e8f2
  Nicolas Dandrimont authored 1 year ago
  
  f580e8f2
- winery benchmark: make object fetching less expensive · bc020ab5
  Nicolas Dandrimont authored 1 year ago
  
  This cuts the operation into smaller batches, and each batch calls a cheaper function.
  bc020ab5
- winery benchmark: introduce time_remaining argument for work · 04780c24
  Nicolas Dandrimont authored 1 year ago
  
  This allows the benchmark to shorten the workload of the workers, rather than wait for them to complete (potentially for a long time).
  04780c24
- winery: decrease verbosity of rbd commands · 6db5b85e
  Nicolas Dandrimont authored 1 year ago
  
  6db5b85e
- winery: increase verbosity of remapping a shard read-write · ec159df3
  Nicolas Dandrimont authored 1 year ago
  
  ec159df3
- winery: bump all copyright years to 2024 · 4022a8e5
  Nicolas Dandrimont authored 1 year ago
  
  4022a8e5
- winery: manage the lifecycle of ConnectionPools better · 3101c145
  Nicolas Dandrimont authored 1 year ago
  
  Use a singleton PoolManager with proper fork tracking and reference counting to avoid leaking ConnectionPools.
  3101c145
- winery: Only run database administration operations once per process · 2986827f
  Nicolas Dandrimont authored 1 year ago
  
  By keeping track of created databases, created tables, and connection pools, we cut down drastically on the admin operations performed while the code is running.
  2986827f
- winery: Add postgresql connection pooling · d3341426
  Nicolas Dandrimont authored 1 year ago
  
  d3341426
- winery: remove confusing cursor passing logic in object deletion · 139d4668
  Nicolas Dandrimont authored 1 year ago
  
  The transaction state of the PostgreSQL connection is attached to the db-api connection object, not to the cursor (which is just a lightweight object representing the back-and-forth between a query and its results). To share the transaction state across functions when using connection pooling, we need to share the connection object. In this specific case, the db attribute of the SharedBase object is already reused between calls and holds the transaction state, so the cursor passing is unnecessary.
  139d4668
- winery: add support for the application_name PostgreSQL setting · c530713c
  Nicolas Dandrimont authored 1 year ago
  
  This helped track down the various connection leaks that this code was subject to when the benchmarks would run.
  c530713c
- winery: More explicit uninit operations for shards and throttler · be63e21d
  Nicolas Dandrimont authored 1 year ago
  
  Try to avoid leaking database connections and open files.
  be63e21d
- winery: add index for per-shard statistics · 812456d9
  Nicolas Dandrimont authored 1 year ago
  
  812456d9
- winery testing: rename default erasure code profile · f5f33f2f
  Nicolas Dandrimont authored 1 year ago
  
  This avoids interfering with the production settings if any.
  f5f33f2f
- winery testing: only warn on EC profile mismatch · 6f29fcd4
  Nicolas Dandrimont authored 1 year ago
  
  Forcing a change of the EC profile has catastrophic data availability consequences, so avoid doing that at all.
  6f29fcd4
- winery: use COPY to do bulk data reads · e4dee843
  Nicolas Dandrimont authored 1 year ago
  
  e4dee843
- winery: migrate to psycopg3 · c9585e1c
  Nicolas Dandrimont authored 1 year ago
  
  c9585e1c
- winery stats: use time.time instead of monotonic for line headers · 78c890ad
  Nicolas Dandrimont authored 1 year ago
  
  The monotonic clock is not tied to wall-clock time, so its values can't be matched to statistics.
  78c890ad
- winery stats: ensure stats output csv can be reused if the pid is reused · 7520fe82
  Nicolas Dandrimont authored 1 year ago
  
  We run the benchmark workers in a process pool with fixed pids, so the files would get clobbered on every worker restart. Instead, append to the file by outputting a line of zeros to mark the reset.
  7520fe82
- winery: Disable autovacuum on shard object tables · 0cadba22
  Nicolas Dandrimont authored 1 year ago
  
  These tables are append-only and short-lived, so there's not much point in using resources to autovacuum them.
  0cadba22
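
The remark in commit 139d4668 above, that transaction state belongs to the connection and not to the cursor, can be illustrated with the standard-library sqlite3 module. This is a sketch using sqlite3 as a stand-in for PostgreSQL; the table name and the `demo` function are made up for the example.

```python
import os
import sqlite3
import tempfile


def demo():
    """Return the row counts seen by (same connection, other connection, after rollback)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        other = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE objects (id INTEGER)")
            writer.commit()

            cur_a = writer.cursor()
            cur_b = writer.cursor()
            # The INSERT implicitly opens a transaction on the `writer` connection.
            cur_a.execute("INSERT INTO objects VALUES (1)")

            count = "SELECT count(*) FROM objects"
            # A second cursor on the same connection sees the uncommitted row.
            same_conn = cur_b.execute(count).fetchall()[0][0]
            # A separate connection does not.
            other_conn = other.execute(count).fetchall()[0][0]

            # Rolling back on the connection undoes what cur_a did.
            writer.rollback()
            after_rollback = cur_b.execute(count).fetchall()[0][0]
            return same_conn, other_conn, after_rollback
        finally:
            writer.close()
            other.close()


print(demo())  # (1, 0, 0)
```

Both cursors share whatever transaction is open on their connection, which is why passing a cursor around adds nothing once the connection object itself is shared.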
Mar 15, 2024

tests: Migrate away from unittest to full pytest · 89361d53

Antoine Lambert authored 1 year ago

Remove the use of unittest.TestCase class.

Replace use of unittest assertions by pytest ones.

Use pytest autouse fixtures instead of setup/teardown methods.

Use tmpdir fixture to manage temporary directories.

89361d53

Mar 13, 2024

Remove empty __init__.py files · 6bd96892

David Douard authored 1 year ago

Especially the main one in swh/objstorage needs to be removed, since the
latter is a namespace, in addition to being a package, into which
swh-objstorage-replayer installs itself. So for the namespace importing
mechanism to work properly, it must be an actual namespace (handled by
the NamespaceLoader, not a package backed by the __init__ file).

6bd96892
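
The namespace behaviour this commit relies on can be reproduced with two throwaway directories. This is a sketch: `nsdemo`, `dist_a`, `dist_b`, `core` and `replayer` are made-up names standing in for two separately installed distributions that share one top-level namespace.

```python
import importlib
import os
import sys
import tempfile


def demo_namespace():
    """Import two subpackages of `nsdemo` that live in two separate directories."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for dist, sub in (("dist_a", "core"), ("dist_b", "replayer")):
            pkg_dir = os.path.join(tmp, dist, "nsdemo", sub)
            os.makedirs(pkg_dir)
            # Only the leaf packages get an __init__.py; `nsdemo` itself has none.
            with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
                f.write(f"NAME = {sub!r}\n")
            roots.append(os.path.join(tmp, dist))
        sys.path[:0] = roots
        try:
            importlib.invalidate_caches()
            core = importlib.import_module("nsdemo.core")
            replayer = importlib.import_module("nsdemo.replayer")
            # The namespace package's __path__ spans both directories.
            portions = len(sys.modules["nsdemo"].__path__)
            return core.NAME, replayer.NAME, portions
        finally:
            # Undo the changes to the interpreter state.
            for root in roots:
                sys.path.remove(root)
            stale = [n for n in sys.modules if n == "nsdemo" or n.startswith("nsdemo.")]
            for name in stale:
                del sys.modules[name]


print(demo_namespace())  # ('core', 'replayer', 2)
```

If `dist_a/nsdemo/__init__.py` existed, `nsdemo` would resolve to that regular package alone and the import of `nsdemo.replayer` would fail.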

Feb 06, 2024

azure: add support for using the secondary endpoint for the download_url · 8b8c3f1e

David Douard authored 1 year ago

The idea is to allow using Azure's secondary endpoint
(BlobSecondaryEndpoint) to craft the URL that the storage exposes as its
public download URL.

This is needed for the integration test where the endpoint used to add
content in an azure-backend objstorage is different from the public URL
that can be used to download the blob directly from azure (azurite in the
context of integration tests).

8b8c3f1e
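
A minimal sketch of the idea, assuming the connection string carries both a `BlobEndpoint` and a `BlobSecondaryEndpoint` entry. The helper names and the `use_secondary` parameter are hypothetical, and a real download URL would typically also carry an access token, which is left out here.

```python
def parse_connection_string(connection_string):
    """Split a `Key=Value;Key=Value` connection string into a dict."""
    parts = {}
    for item in connection_string.split(";"):
        if item:
            # Values such as account keys may contain "=", so split only once.
            key, _, value = item.partition("=")
            parts[key] = value
    return parts


def download_url(connection_string, container, blob_name, use_secondary=False):
    """Build a blob URL from the primary or the secondary endpoint."""
    parts = parse_connection_string(connection_string)
    key = "BlobSecondaryEndpoint" if use_secondary else "BlobEndpoint"
    endpoint = parts[key].rstrip("/")
    return f"{endpoint}/{container}/{blob_name}"


# Example values in the style of an Azurite test setup (made up for this sketch).
CONNECTION_STRING = (
    "BlobEndpoint=http://azurite:10000/devstoreaccount1;"
    "BlobSecondaryEndpoint=http://localhost:10000/devstoreaccount1;"
    "AccountName=devstoreaccount1;AccountKey=abc=="
)
print(download_url(CONNECTION_STRING, "contents", "0123abcd", use_secondary=True))
```

In this setup the service writes through the primary endpoint while the URL handed out for downloads points at the host that clients can actually reach.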

Feb 05, 2024
- tox: Bump mypy to 1.8.0 · 974fc19e
  Antoine Lambert authored 1 year ago
  
  Related to swh/meta#5075.
  974fc19e
Feb 02, 2024

Implement delete() in WineryObjStorage · ddc1b777

Jérémy Bobbio (Lunar) authored 1 year ago and

Nicolas Dandrimont committed 1 year ago

Objects can now be deleted from Winery. The object is first marked
as deleted in the shared base. This makes it inaccessible from further
`get()` calls. If the object is still in a RWShard, the relevant line in
the `objects` table is deleted as well.

To take care of removing the data in ROShards, a dedicated host able to
map images in read-write mode can then call:

    swh objstorage winery clean-deleted-objects

This will zero out any deleted object present in the images and remove
their key from the indices. Object keys will then be removed from the
shared database.

Thanks to olasd for proposing several improvements over the initial
submission.

Addresses swh-alter#4

ddc1b777
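
The two-phase deletion described in this commit can be modelled in memory. This is a toy sketch, not the Winery implementation: `ToyWinery`, its attributes and the `State` enum are hypothetical, and a single bytearray stands in for a read-only shard image.

```python
from enum import Enum


class State(Enum):
    PRESENT = "present"
    DELETED = "deleted"


class ToyWinery:
    def __init__(self):
        self.index = {}           # shared base: obj_id -> State
        self.rw_objects = {}      # rows of the RW shard `objects` table
        self.image = bytearray()  # packed read-only image
        self.ro_index = {}        # obj_id -> (offset, length) in the image

    def add(self, obj_id, content, packed=False):
        self.index[obj_id] = State.PRESENT
        if packed:
            self.ro_index[obj_id] = (len(self.image), len(content))
            self.image += content
        else:
            self.rw_objects[obj_id] = content

    def get(self, obj_id):
        # Objects marked as deleted are hidden even if their bytes still exist.
        if self.index.get(obj_id) is not State.PRESENT:
            raise KeyError(obj_id)
        if obj_id in self.rw_objects:
            return self.rw_objects[obj_id]
        offset, length = self.ro_index[obj_id]
        return bytes(self.image[offset:offset + length])

    def delete(self, obj_id):
        if obj_id not in self.index:
            raise KeyError(obj_id)
        self.index[obj_id] = State.DELETED  # phase 1: mark in the shared base
        self.rw_objects.pop(obj_id, None)   # drop the RW shard row, if any

    def clean_deleted_objects(self):
        # Phase 2, run separately: zero out the data, then forget the keys.
        deleted = [k for k, s in self.index.items() if s is State.DELETED]
        for obj_id in deleted:
            offset, length = self.ro_index.pop(obj_id, (0, 0))
            self.image[offset:offset + length] = bytes(length)
            del self.index[obj_id]
```

Splitting the work this way keeps `delete()` cheap for callers, while the expensive rewrite of packed images is deferred to a host that can map them read-write.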

Switch to a state column for inflight status · 9d4d3b93

Jérémy Bobbio (Lunar) authored 1 year ago and

Nicolas Dandrimont committed 1 year ago

This paves the way to implementing object deletion as another status
(so we can later clean up the shards in a batch process).

Thanks to olasd for proposing several improvements over the initial
submission.

9d4d3b93

Implement a file-backed image pool for Winery tests · ed2a33eb

Jérémy Bobbio (Lunar) authored 1 year ago and

Nicolas Dandrimont committed 1 year ago

As an alternative to running the tests with a Ceph backend, we
implement an image pool based on simple files.
We represent unmapped images by setting their permissions to 0o000.

The tests are modified to run with both the file-backed image pool
and the Ceph pool. The latter are skipped if Ceph is not
available.

ed2a33eb
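
The permission trick can be sketched as follows on a POSIX system. Only the 0o000 convention for unmapped images comes from the commit message; the `FileImagePool` class, its method names and the 0o400/0o600 modes for read-only and read-write mappings are assumptions made for this example.

```python
import os
import stat
import tempfile


class FileImagePool:
    """Hypothetical file-backed pool: one regular file per image."""

    def __init__(self, base_directory):
        self.base_directory = base_directory

    def _path(self, name):
        return os.path.join(self.base_directory, name)

    def create(self, name, size):
        # Allocate a zero-filled file of the requested size, initially unmapped.
        with open(self._path(name), "wb") as f:
            f.truncate(size)
        self.unmap(name)

    def map(self, name, mode):
        # mode is "ro" or "rw" (assumed convention for this sketch).
        os.chmod(self._path(name), 0o400 if mode == "ro" else 0o600)

    def unmap(self, name):
        # An unmapped image is represented by a file without any permission bit.
        os.chmod(self._path(name), 0o000)

    def state(self, name):
        perms = stat.S_IMODE(os.stat(self._path(name)).st_mode)
        return {0o000: "unmapped", 0o400: "ro", 0o600: "rw"}.get(perms, "unknown")
```

The mapping state then lives entirely in the filesystem metadata, so separate test processes can observe it with a plain `os.stat` call and no Ceph cluster is needed.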

Pin moto to < 5 · cf4c9c31

Nicolas Dandrimont authored 1 year ago

moto v5 rejigged the fixtures in a way that's incompatible with
swh.objstorage; pin it down for now.

cf4c9c31

Bump swh requirements for compatibility with pytest-postgresql 5.0 · 02b45d8c

Nicolas Dandrimont authored 1 year ago

02b45d8c

Jan 22, 2024

Skip test_list_content* on cloud implementations · 2794dc8d

Jérémy Bobbio (Lunar) authored 1 year ago

Running `test_list_content*` on cloud implementations is pretty useless
as we won’t be listing 17 billion contents from Azure or S3 in a
single-threaded operation. As each test takes a minute per cloud, we skip
these tests to improve the runtime of the test suite.

2794dc8d

Skip testing most compression methods by default · 0e68c84f

Jérémy Bobbio (Lunar) authored 1 year ago

The test suite is now reaching the C.I. timeout. In order to reduce the
overall runtime, we now skip testing most compression methods (bzip2,
zlib, lzma) in cloud objstorages by default. Testing all compression
methods can still be done by specifying `--all-compression-methods` on
the pytest command line.

diff --git a/swh/objstorage/tests/conftest.py b/swh/objstorage/tests/conftest.py
index 377778e..664aade 100644
--- a/swh/objstorage/tests/conftest.py
+++ b/swh/objstorage/tests/conftest.py
@@ -1,5 +1,7 @@
 import sys

+import pytest
+

 def pytest_configure(config):
     config.addinivalue_line("markers", "shard_max_size: winery backend")
@@ -14,6 +16,13 @@ def pytest_configure(config):
     config.addinivalue_line(
         "markers", "use_benchmark_flags: use the --winery-bench-* CLI flags"
     )
+    config.addinivalue_line(
+        "markers",
+        (
+            "all_compression_methods: "
+            "test all compression methods instead of only the most common ones"
+        ),
+    )

 def pytest_addoption(parser):
@@ -106,3 +115,15 @@ def pytest_addoption(parser):
         help="Maximum number of bytes per second write",
         default=100 * 1024 * 1024,
     )
+    parser.addoption(
+        "--all-compression-methods",
+        action="store_true",
+        default=False,
+        help="Test all compression methods",
+    )
+
+
+def pytest_runtest_setup(item):
+    if item.get_closest_marker("all_compression_methods"):
+        if not item.config.getoption("--all-compression-methods"):
+            pytest.skip("`--all-compression-methods` has not been specified")
diff --git a/swh/objstorage/tests/test_objstorage_azure.py b/swh/objstorage/tests/test_objstorage_azure.py
index b78e3d4..3679258 100644
--- a/swh/objstorage/tests/test_objstorage_azure.py
+++ b/swh/objstorage/tests/test_objstorage_azure.py
@@ -269,14 +269,17 @@ class TestMockedAzureCloudObjStorageGzip(TestMockedAzureCloudObjStorage):
     compression = "gzip"

+@pytest.mark.all_compression_methods
 class TestMockedAzureCloudObjStorageZlib(TestMockedAzureCloudObjStorage):
     compression = "zlib"

+@pytest.mark.all_compression_methods
 class TestMockedAzureCloudObjStorageLzma(TestMockedAzureCloudObjStorage):
     compression = "lzma"

+@pytest.mark.all_compression_methods
 class TestMockedAzureCloudObjStorageBz2(TestMockedAzureCloudObjStorage):
     compression = "bz2"

diff --git a/swh/objstorage/tests/test_objstorage_cloud.py b/swh/objstorage/tests/test_objstorage_cloud.py
index 49cbd62..1c50850 100644
--- a/swh/objstorage/tests/test_objstorage_cloud.py
+++ b/swh/objstorage/tests/test_objstorage_cloud.py
@@ -148,6 +148,7 @@ class TestCloudObjStorage(ObjStorageTestFixture, unittest.TestCase):
         pass

+@pytest.mark.all_compression_methods
 class TestCloudObjStorageBz2(TestCloudObjStorage):
     compression = "bz2"

@@ -156,10 +157,12 @@ class TestCloudObjStorageGzip(TestCloudObjStorage):
     compression = "gzip"

+@pytest.mark.all_compression_methods
 class TestCloudObjStorageLzma(TestCloudObjStorage):
     compression = "lzma"

+@pytest.mark.all_compression_methods
 class TestCloudObjStorageZlib(TestCloudObjStorage):
     compression = "zlib"

diff --git a/swh/objstorage/tests/test_objstorage_pathslicing.py b/swh/objstorage/tests/test_objstorage_pathslicing.py
index d1f5568..72c3305 100644
--- a/swh/objstorage/tests/test_objstorage_pathslicing.py
+++ b/swh/objstorage/tests/test_objstorage_pathslicing.py
@@ -8,6 +8,8 @@ import tempfile
 import unittest
 from unittest.mock import DEFAULT, patch

+import pytest
+
 from swh.model import hashutil
 from swh.objstorage import exc
 from swh.objstorage.constants import ID_DIGEST_LENGTH
@@ -144,13 +146,16 @@ class TestPathSlicingObjStorageGzip(TestPathSlicingObjStorage):
     compression = "gzip"

+@pytest.mark.all_compression_methods
 class TestPathSlicingObjStorageZlib(TestPathSlicingObjStorage):
     compression = "zlib"

+@pytest.mark.all_compression_methods
 class TestPathSlicingObjStorageBz2(TestPathSlicingObjStorage):
     compression = "bz2"

+@pytest.mark.all_compression_methods
 class TestPathSlicingObjStorageLzma(TestPathSlicingObjStorage):
     compression = "lzma"

0e68c84f

Jan 15, 2024

pytest: Fix tests execution in development virtual environment · 857a6d25

Antoine Lambert authored 1 year ago

Since migration to PEP 517, executing "make test" or "pytest" in the root
directory of the objstorage package triggers the following error:

ImportError: cannot import name 'add_filters' from 'swh.objstorage.multiplexer.filter'

Explicitly setting the testpaths option in pytest.ini and the TEST_DIRS
variable in Makefile.local to swh/objstorage/tests fixes the issue.

857a6d25

Jan 05, 2024
- winery: hook up the `rbd_create_images` setting to the packer cli · da7d6b89
  Nicolas Dandrimont authored 1 year ago
  
  da7d6b89