- Oct 04, 2023
Jérémy Bobbio (Lunar) authored
To handle takedown notices, we need to be able to remove objects from storage. Depends on !1077. Related to swh-alter#5.
-
Jérémy Bobbio (Lunar) authored
This is a pretty direct adaptation of what was done in https://gitlab.softwareheritage.org/swh/devel/snippets/-/blob/0d8b6877/takedowns/gen_removal_sql.py
Closes: #4687
Depends on !1077
-
Jérémy Bobbio (Lunar) authored
swh-alter needs an interface to remove objects from the storage (so that we can handle takedown notices). The chosen interface is optimal from swh-alter's point of view: it identifies a whole range of objects of different types that can be removed alongside a given set of origins. Passing all the objects to remove at once might also help with consistency constraints inside the various storage facilities. Only objects held by this facility will be removed; the same method should be called on the other storage, objstorage, or journal instances from which the specified objects need to be removed.
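The following minimal sketch shows a facility-local removal method. It assumes that objects are tracked by SWHID strings. The class `InMemoryObjectStore` and the method `remove_objects()` are hypothetical names for illustration, not the actual swh-storage API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class InMemoryObjectStore:
    """Hypothetical storage facility holding object SWHIDs such as 'swh:1:cnt:<hex>'."""

    objects: Set[str] = field(default_factory=set)

    def remove_objects(self, swhids: Iterable[str]) -> Dict[str, int]:
        """Remove the given objects that *this* facility holds.

        Identifiers unknown to this facility are ignored, so the same list can
        be passed to every facility. Returns removal counts per object type.
        """
        removed: Dict[str, int] = {}
        for swhid in set(swhids):  # deduplicate the request first
            if swhid in self.objects:
                self.objects.remove(swhid)
                object_type = swhid.split(":")[2]  # e.g. "cnt", "dir", "ori"
                removed[object_type] = removed.get(object_type, 0) + 1
        return removed
```

Unknown identifiers are simply skipped, so the same list can be handed to every storage, objstorage, or journal instance in turn.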
-
- Sep 26, 2023
David Douard authored
It was using a generator as if it were a list, effectively inserting only one content object per batch. The test had to be modified to actually show the misbehavior: it needs to insert a single object first, before inserting a list of objects, otherwise the test is green even without the fix (the reason is left as an exercise for the curious reader). Also fix a small type inconsistency in content_add() and use the same test pattern in test_content_add().
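This log does not show the buggy code itself. The sketch below illustrates the pitfall by batching any iterable through repeated `itertools.islice` calls on one shared iterator. Because of that, it behaves the same whether it receives a list or a single-pass generator. The helper name `grouper` is hypothetical here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def grouper(objects: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items from any iterable.

    Only one iterator is ever consumed (no len(), no slicing), so generators
    and lists produce the same batches.
    """
    iterator = iter(objects)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Code that calls `len()` on its argument or slices it silently assumes a list. Pulling every item from a single iterator avoids that assumption.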
-
David Douard authored
-
Jérémy Bobbio (Lunar) authored
When browsing methods in StorageInterface, it is very easy to overlook helper functions that happen to live in the `swh.storage.algos` package. To make them slightly easier to find, add “see also” sections to the methods for listing directory entries, branches, origin visits and origin visit statuses.
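The sketch below shows what a “see also” section in a Sphinx docstring can look like. The class is a hypothetical stand-in. The referenced helper path `swh.storage.algos.snapshot.snapshot_get_all_branches` is an assumption, so check the actual `swh.storage.algos` modules for the exact names.

```python
class SnapshotStorageSketch:
    """Hypothetical excerpt of a storage interface, for illustration only."""

    def snapshot_get_branches(self, snapshot_id: bytes, branches_count: int = 1000):
        """Return at most ``branches_count`` branches of the given snapshot.

        .. seealso::

           :func:`swh.storage.algos.snapshot.snapshot_get_all_branches`
              (assumed helper) to retrieve every branch without handling
              pagination yourself.
        """
        raise NotImplementedError
```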
-
- Sep 25, 2023
Antoine R. Dumont authored
This fails on Debian bullseye. According to the storage interface, the `directory_missing` method gives no ordering guarantee on its output. Refs. swh/infra/sysadm-environment#5047
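A test that must not depend on output order can compare multisets instead of sequences. The following is a minimal sketch with a hypothetical helper `assert_same_items`. In a real test, it would be called as `assert_same_items(storage.directory_missing(ids), expected)`.

```python
from collections import Counter
from typing import Hashable, Iterable


def assert_same_items(actual: Iterable[Hashable], expected: Iterable[Hashable]) -> None:
    """Assert both collections hold the same items, ignoring order but not duplicates."""
    actual_list = list(actual)  # materialize once, in case a generator is passed
    expected_list = list(expected)
    assert Counter(actual_list) == Counter(expected_list), f"{actual_list!r} != {expected_list!r}"
```

A `set` comparison would also ignore order, but `Counter` additionally catches an item returned twice.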
-
- Sep 22, 2023
Antoine R. Dumont authored
This fails on Debian bullseye. According to the storage interface, that method gives no ordering guarantee on its output. Refs. swh/infra/sysadm-environment#5047
-
- Sep 20, 2023
JAVA_HOME needs to point to the installation directory, not the `java` binary itself.
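The sketch below illustrates the difference. It derives a candidate JAVA_HOME from the `java` executable by resolving symlinks and stepping up from `<install dir>/bin/java`. It assumes that conventional layout. Some older JDK 8 packages place the binary under `jre/bin/java`, in which case the result points at the bundled JRE rather than the JDK. Both function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def java_home_from_binary(java_binary: str) -> Path:
    """Map '<install dir>/bin/java' to '<install dir>'.

    Symlinks such as /usr/bin/java -> /etc/alternatives/java are resolved first,
    so the result is the real installation directory, not /usr.
    """
    return Path(java_binary).resolve().parent.parent


def guess_java_home() -> Optional[Path]:
    """Return a JAVA_HOME candidate for the `java` found on PATH, if any."""
    java = shutil.which("java")
    return java_home_from_binary(java) if java else None
```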
-
- Sep 19, 2023
Antoine R. Dumont authored
This fails on Debian bullseye for some reason. According to the storage interface, that method gives no ordering guarantee on its output either. Refs. swh/infra/sysadm-environment#5047
-
- Sep 12, 2023
Raphaël Gomès authored
The initial implementation was incorrectly put in `swh-loader-core`. This simply moves it (along with its sister change in the other module) to the correct place.
-
- Sep 11, 2023
Antoine Lambert authored
This fixes the Debian package build on unstable.
-
- Sep 06, 2023
David Douard authored
This accepts a file listing the SWHIDs of objects that are known to be invalid (hash mismatch) but should be replayed anyway (typically because they exist as-is in the original storage). The file is expected to have rows like: `swh:1:xxx:<invalid_hex_hash>,<expected_hex_hash>` [...] Note that the CLI only accepts SWHIDs in the exception file, while the backend (ModelObjectDeserializer) supports all HashableObject types. However, we currently do not need this feature in the CLI tool for other object types, and doing it this way is simpler in terms of type annotations.
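The sketch below shows one way such an exception file could be parsed. It assumes both hashes are 40-character lowercase hexadecimal sha1_git values and that blank lines are allowed. `parse_invalid_hashes` is a hypothetical name, not the function used by the replayer CLI.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed row layout: swh:1:<type>:<invalid_hex_hash>,<expected_hex_hash>
ROW_RE = re.compile(
    r"^swh:1:(?P<type>[a-z]{3}):(?P<invalid>[0-9a-f]{40}),(?P<expected>[0-9a-f]{40})$"
)


def parse_invalid_hashes(lines: Iterable[str]) -> Dict[Tuple[str, bytes], bytes]:
    """Map (object type, invalid hash) to the hash the object is expected to have."""
    result: Dict[Tuple[str, bytes], bytes] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = ROW_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: malformed row {line!r}")
        key = (match["type"], bytes.fromhex(match["invalid"]))
        result[key] = bytes.fromhex(match["expected"])
    return result
```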
-
David Douard authored
The idea is that two workers may insert similar directories concurrently, and thus attempt to create identical DirectoryEntry objects in concurrent transactions, making one of the two transactions fail at commit time with a UniqueViolation error. But rows in a `directory_entry_xxx` table consist only of the triplet `(target, name, perms)`, and we run the db in the read committed isolation level. So by the time the next query in the `swh_directory_entry_add()` SQL function (the one filling the `tmp_directory` table) is executed, the conflicting rows inserted by other transactions have been committed and are visible in this transaction, and these conflicts can simply be ignored. Upgrade the db version to 190.
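The log does not say exactly how `swh_directory_entry_add()` now ignores the conflicts. One common way to skip rows that violate a unique constraint is `INSERT ... ON CONFLICT DO NOTHING`, which PostgreSQL supports. The sketch below uses SQLite (3.24 or later) from the Python standard library as a stand-in, with a table modeled on `directory_entry_xxx`. It shows the idempotent insert only, not the concurrency scenario.

```python
import sqlite3
from typing import Iterable, Tuple

Entry = Tuple[bytes, bytes, int]  # (target, name, perms)


def add_directory_entries(conn: sqlite3.Connection, entries: Iterable[Entry]) -> None:
    """Insert entry triplets, silently skipping rows that already exist.

    A duplicate row is identical to the stored one, so ignoring the
    conflict loses no information.
    """
    conn.executemany(
        "INSERT INTO directory_entry_file (target, name, perms) VALUES (?, ?, ?) "
        "ON CONFLICT DO NOTHING",
        entries,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE directory_entry_file ("
    " id INTEGER PRIMARY KEY, target BLOB, name BLOB, perms INTEGER,"
    " UNIQUE (target, name, perms))"
)
```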
-
- Sep 05, 2023
David Douard authored
Hypothesis is not happy that a few Hypothesis `@given` tests are declared in storage_tests but actually used in several derived test cases (test_postgresql, test_cassandra, etc.), and raises an error pointing to https://hypothesis.readthedocs.io/en/latest/settings.html#hypothesis.HealthCheck.differing_executors
This is a dirty workaround that hides the effects of the original issue rather than properly fixing the root cause. Move the definition of disabled health checks into each test file rather than `conftest.py`, since this new health check does not need to be disabled in algos/test_snapshot.py.
-
- Sep 04, 2023
Antoine Lambert authored
During the teardown phase of the function-scoped `swh_storage_cassandra_backend_config` fixture, truncate a keyspace table only if it is not empty. This brings a twofold speedup when running all Cassandra-related tests.
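The following is a minimal sketch of that teardown logic. It assumes a session object whose `execute()` result exposes `one()`, as cassandra-driver result sets do. `truncate_non_empty_tables` and `FakeSession` are hypothetical. The fake is an in-memory stand-in so the example runs without a Cassandra cluster.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def truncate_non_empty_tables(session, keyspace: str, tables: Iterable[str]) -> List[str]:
    """Truncate only the tables that contain at least one row.

    TRUNCATE is comparatively slow in Cassandra, so a cheap single-row probe
    lets the teardown skip tables the test never wrote to.
    """
    truncated = []
    for table in tables:
        probe = session.execute(f"SELECT * FROM {keyspace}.{table} LIMIT 1")
        if probe.one() is not None:
            session.execute(f"TRUNCATE {keyspace}.{table}")
            truncated.append(table)
    return truncated


class _FakeResult:
    def __init__(self, rows: List[Tuple]) -> None:
        self._rows = rows

    def one(self) -> Optional[Tuple]:
        return self._rows[0] if self._rows else None


class FakeSession:
    """Hypothetical in-memory stand-in for a Cassandra session (testing only)."""

    def __init__(self, tables: Dict[str, List[Tuple]]) -> None:
        self.tables = tables  # "keyspace.table" -> rows
        self.queries: List[str] = []

    def execute(self, query: str) -> _FakeResult:
        self.queries.append(query)
        if query.startswith("TRUNCATE "):
            self.tables[query[len("TRUNCATE "):]] = []
            return _FakeResult([])
        # Only understands the "SELECT * FROM <ks.table> LIMIT 1" probe above.
        table = query.split(" FROM ")[1].split(" ")[0]
        return _FakeResult(self.tables[table])
```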
-
Antoine Lambert authored
Bump it from 2 to 30 seconds in order to fix flaky tests on Jenkins.
-
- Sep 01, 2023
Jayesh authored
- Add a simple query builder for dynamic SQL queries
- Add a method to make the pagination clause and logic consistent
- Refactor `origin_visit_get_range` using the builder
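The sketch below is a minimal builder in the spirit of this change. Conditions and parameters are accumulated separately. A single `paginate()` method emits both the keyset condition and the matching `ORDER BY`, which keeps pagination consistent across queries. The class and its methods are hypothetical, not the builder added in swh-storage. The `%s` placeholders follow the psycopg2 convention.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class QueryBuilder:
    """Hypothetical minimal builder for parameterized SELECT queries."""

    base: str
    conditions: List[str] = field(default_factory=list)
    params: List[Any] = field(default_factory=list)
    order_by: Optional[str] = None
    limit: Optional[int] = None

    def where(self, condition: str, *values: Any) -> "QueryBuilder":
        self.conditions.append(condition)
        self.params.extend(values)
        return self

    def paginate(self, key: str, after: Optional[Any], limit: int, order: str = "ASC") -> "QueryBuilder":
        """Keyset pagination: the filter and the ordering always agree."""
        if order not in ("ASC", "DESC"):
            raise ValueError(f"invalid order: {order!r}")
        if after is not None:
            op = ">" if order == "ASC" else "<"
            self.where(f"{key} {op} %s", after)  # continue after the last key seen
        self.order_by = f"{key} {order}"
        self.limit = limit
        return self

    def build(self) -> Tuple[str, List[Any]]:
        sql = self.base
        params = list(self.params)
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        if self.order_by:
            sql += f" ORDER BY {self.order_by}"
        if self.limit is not None:
            sql += " LIMIT %s"
            params.append(self.limit)
        return sql, params
```

Only column names written in code go into the SQL string. User-supplied values always travel in the parameter list.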
-
- Aug 31, 2023
Antoine Lambert authored
-
Antoine Lambert authored
That test is flaky because of Hypothesis deadlines, so turn them off.
-
- Aug 30, 2023
David Douard authored
-
David Douard authored
These may pollute the logs quite a bit in a replayer setup, and the full traceback is not useful.
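The following sketch logs a replay failure without its full traceback. `logger.warning()` with a formatted message records a single line, whereas `logger.exception()` would attach the traceback. The function `replay_one` and the choice of `ValueError` are hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("replayer-sketch")


def replay_one(obj: Any, insert: Callable[[Any], None]) -> bool:
    """Try to insert one object; log a concise warning on failure."""
    try:
        insert(obj)
    except ValueError as exc:
        # No exc_info here: one short line per failure instead of a full traceback.
        logger.warning("Failed to replay %r: %s", obj, exc)
        return False
    return True
```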
-
- Aug 29, 2023
- Aug 21, 2023
Jérémy Bobbio (Lunar) authored
To create recovery bundles, `swh-alter` needs to be able to retrieve full SkippedContent objects from the storage. The new method `skipped_content_find()` allows retrieving all SkippedContent objects matching a given set of hashes. Usually there should be only one, but multiple objects might be returned in case of hash collisions. While implementing this, #4693 was identified, which prevents testing the implementation with the PostgreSQL storage when a SkippedContent references a known origin. Thanks to olasd and vlorentz for the reviews and for suggesting small improvements.
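The sketch below shows the lookup semantics in memory. An object matches when every provided hash is equal to the corresponding field, and several objects can match when they share a colliding hash. `SkippedContentSketch` and `find_skipped_contents` are hypothetical stand-ins, not the swh.model class or the storage method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

HASH_KEYS = ("sha1", "sha1_git", "sha256", "blake2s256")


@dataclass(frozen=True)
class SkippedContentSketch:
    """Hypothetical stand-in for a SkippedContent; any hash may be missing."""

    sha1: Optional[bytes] = None
    sha1_git: Optional[bytes] = None
    sha256: Optional[bytes] = None
    blake2s256: Optional[bytes] = None
    reason: str = ""


def find_skipped_contents(
    objects: Iterable[SkippedContentSketch], hashes: Dict[str, bytes]
) -> List[SkippedContentSketch]:
    """Return every object whose hashes match all the given ones."""
    if not hashes:
        raise ValueError("at least one hash is required")
    unknown = set(hashes) - set(HASH_KEYS)
    if unknown:
        raise ValueError(f"unknown hash algorithms: {sorted(unknown)}")
    return [
        obj for obj in objects
        if all(getattr(obj, algo) == value for algo, value in hashes.items())
    ]
```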
-
- Aug 10, 2023
Antoine R. Dumont authored
This seems to be the root cause of the issue in the Debian build. Refs. swh/infra/sysadm-environment#4525
-
Antoine R. Dumont authored
-
- Aug 09, 2023
Antoine R. Dumont authored
Only keep what is not tracked by the DVCS. If anything, declaring extra files that are never used is confusing.
-
Antoine R. Dumont authored
Currently, the latest setuptools release makes `find_packages` warn when building the distribution archive. The warning mentions that some included folders (sql, sql/upgrade, etc.) are importable but not explicitly declared as packages [1] [2]. As we only want the swh.* modules, mention them explicitly. According to the setuptools documentation [3], this now builds the archive the way we want it (including the swh.storage.{sql,tests,...} folders, as before).
[1]
```
Python recognizes 'swh.storage.sql' as an importable package[^1],
but it is absent from setuptools' `packages` configuration.
...
```
[2] https://jenkins.softwareheritage.org/view/swh-debian%20(draft)/job/debian/job/packages/job/DSTO/job/gbp-buildpackage/468/console
[3] https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#finding-namespace-packages
Refs. swh/infra/sysadm-environment#4997
-
- Aug 08, 2023
- Jul 07, 2023
David Douard authored
When unset, defaults to a NoopObjstorage. This can be useful for testing, and it allows using `swh db init storage` without a configuration file. Warn the user if the storage instance actually uses this noop objstorage.
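The following sketch shows the fallback. When no objstorage configuration is given, a no-op object is returned and a warning is emitted. `NoopObjStorageSketch` and `objstorage_from_config` are hypothetical stand-ins, not the swh-objstorage classes. The `factory` argument represents whatever builds a real objstorage from its configuration.

```python
import warnings
from typing import Any, Callable, Dict, Optional


class NoopObjStorageSketch:
    """Hypothetical objstorage that stores nothing and finds nothing."""

    def add(self, content: bytes, obj_id: bytes) -> None:
        pass  # contents are silently dropped

    def __contains__(self, obj_id: bytes) -> bool:
        return False


def objstorage_from_config(
    config: Optional[Dict[str, Any]], factory: Callable[[Dict[str, Any]], Any]
) -> Any:
    """Build an objstorage, falling back to a no-op one (with a warning) when unset."""
    if not config:
        warnings.warn("No objstorage configured: contents will NOT be stored", UserWarning, stacklevel=2)
        return NoopObjStorageSketch()
    return factory(config)
```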
-
David Douard authored
It now needs types-click, which is indeed a dependency of swh.core[testing].
-
- Jul 06, 2023
vlorentz authored
-
- Jun 21, 2023
Nicolas Dandrimont authored
This allows creating the object references partitions even on storages that are wrapped in any number of proxies, which is the case for both production and staging (they're using the reference recording proxy).
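One way to reach the backend through an arbitrary stack of proxies is to follow each proxy's wrapped object until a predicate matches. This sketch assumes that each proxy exposes the wrapped instance as a `storage` attribute. `unwrap_storage`, `Backend`, and `Proxy` are hypothetical names.

```python
from typing import Any, Callable


def unwrap_storage(storage: Any, is_backend: Callable[[Any], bool], max_depth: int = 100) -> Any:
    """Walk down `.storage` attributes until `is_backend` accepts a layer."""
    current = storage
    for _ in range(max_depth):  # guard against accidental cycles
        if is_backend(current):
            return current
        inner = getattr(current, "storage", None)  # assumed proxy convention
        if inner is None:
            break
        current = inner
    raise ValueError("no wrapped storage matches the predicate")


class Backend:
    """Hypothetical backend able to create partitions."""

    def create_partition(self) -> str:
        return "created"


class Proxy:
    """Hypothetical proxy wrapping another storage."""

    def __init__(self, storage: Any) -> None:
        self.storage = storage
```

For example, `unwrap_storage(Proxy(Proxy(backend)), lambda s: isinstance(s, Backend))` returns `backend` regardless of how many proxies wrap it.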
-
- Jun 19, 2023
Vincent Sellier authored
Related to swh/infra/sysadm-environment#4811
-
- Jun 12, 2023
Nicolas Dandrimont authored
The previous implementation would prevent the insertion of object_references on Sundays.
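The log does not explain the cause of the Sunday bug. The following sketch illustrates a Sunday-safe weekly partitioning scheme. It computes ISO week bounds from `date.isoweekday()`, where Monday is 1 and Sunday is 7. The function names and the partition naming format are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def week_partition_bounds(day: date) -> Tuple[date, date]:
    """Return [monday, next monday) for the ISO week containing `day`.

    isoweekday() is 1 for Monday and 7 for Sunday, so a Sunday stays in the
    same week as the Monday six days before it.
    """
    monday = day - timedelta(days=day.isoweekday() - 1)
    return monday, monday + timedelta(days=7)


def week_partition_name(day: date) -> str:
    """Hypothetical partition name derived from the ISO year and week number."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"object_references_{iso_year:04d}w{iso_week:02d}"
```

With this sketch, 2023-06-11 (a Sunday) lands in the 2023-06-05 to 2023-06-12 partition (ISO week 23). By contrast, `strftime("%U")`, which starts weeks on Sunday, reports week 24 for that day. Mixing the two conventions is one way a Sunday-only failure can arise.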
-
- Jun 06, 2023
Nicolas Dandrimont authored
Some circumstances generate duplicate entries that we should catch.
-
Nicolas Dandrimont authored
Our tests were incomplete and didn't catch that the PostgreSQL backend wasn't able to store references from origins to snapshots (as there is no `origin` object type in the PostgreSQL schema).
-
- Jun 05, 2023
Nicolas Dandrimont authored
Click 7.0 (which we use on buster) doesn't autogenerate subcommand names, at least under some circumstances, so be explicit about them.
-