- Mar 13, 2024
-
-
Antoine Lambert authored
Remove use of the --import-mode=importlib pytest option and use the new consider_namespace_packages option to fix test execution with the latest pytest release.
-
- Feb 06, 2024
-
-
David Douard authored
The next partition to check, as returned by the checked_partition_iter_next() iterator, should never be a partition number exceeding the max number of partitions in the config, nor should the iterator add such a partition to the database.
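The bound described above can be sketched as follows. This is only an illustration under stated assumptions: the dict stands in for the database table, and the names `next_partition` and `iter_partitions` are hypothetical, not the scrubber's actual code.

```python
from typing import Dict, Iterator, Optional


def next_partition(checked: Dict[int, str], nb_partitions: int) -> Optional[int]:
    """Claim and return the lowest unclaimed partition id, or None when done.

    `checked` is a hypothetical in-memory stand-in for the database table.
    """
    for partition_id in range(nb_partitions):
        if partition_id not in checked:
            # Only ids strictly below nb_partitions are ever recorded.
            checked[partition_id] = "running"
            return partition_id
    return None


def iter_partitions(checked: Dict[int, str], nb_partitions: int) -> Iterator[int]:
    """Yield partition ids until the configured maximum is exhausted."""
    while (partition_id := next_partition(checked, nb_partitions)) is not None:
        yield partition_id
```

Once every partition has been claimed, the iterator simply stops; no extra row is written for an out-of-range id.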
-
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Feb 02, 2024
-
-
Nicolas Dandrimont authored
-
- Dec 05, 2023
-
-
David Douard authored
-
David Douard authored
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Nov 30, 2023
-
-
David Douard authored
- Nov 24, 2023
-
-
Antoine Lambert authored
It now requires a swh-graph server running or connection errors appear. Use swh-graph NaiveClient to avoid spawning a real graph server during the tests.
-
- Oct 17, 2023
-
-
David Douard authored
As well as a command to list partitions being checked. For example:
```
$ swh scrubber check stats snapshot_16 -j
{
  "config": {
    "name": "snapshot_16",
    "datastore": {
      "package": "storage",
      "cls": "postgresql",
      "instance": "postgresql:///?service=swh-storage"
    },
    "object_type": "snapshot",
    "nb_partitions": 65536,
    "check_hashes": true,
    "check_references": true
  },
  "min_duration": 0.002196,
  "max_duration": 0.107398,
  "avg_duration": 0.005969,
  "checked_partition": 65536,
  "running_partition": 0,
  "missing_object": 0,
  "missing_object_reference": 0,
  "corrupt_object": 0
}
$ swh scrubber check running cfg1
Running partitions for cfg1 [id=1, type=snapshot]:
0: running since today (20 minutes)
```
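The JSON output above lends itself to scripting. A minimal sketch, assuming only the field names visible in that output; the `summarize` helper is hypothetical and not part of the scrubber:

```python
import json

# Abridged copy of the JSON printed by the stats command above.
stats_json = """
{
  "config": {"name": "snapshot_16", "object_type": "snapshot", "nb_partitions": 65536},
  "checked_partition": 65536,
  "running_partition": 0,
  "missing_object": 0,
  "missing_object_reference": 0,
  "corrupt_object": 0
}
"""


def summarize(stats: dict) -> dict:
    """Compute overall progress and the total number of reported issues."""
    total = stats["config"]["nb_partitions"]
    issues = (
        stats["missing_object"]
        + stats["missing_object_reference"]
        + stats["corrupt_object"]
    )
    return {"progress": stats["checked_partition"] / total, "issues": issues}


summary = summarize(json.loads(stats_json))
```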
-
- Oct 16, 2023
-
-
David Douard authored
init` command
-
- Oct 12, 2023
-
-
David Douard authored
-
David Douard authored
These flags make it possible to configure a checking session that includes only one of the two possible checks (hash computation and reference validation).
-
David Douard authored
Which allows removing the dependency on types-pyyaml in the [testing] extra.
-
David Douard authored
These tables used to reference the datastore the invalid/missing object was found in, but did not keep the config entry, i.e. the checking session during which the invalid/missing object was found. This can be an issue when more than one checking session is executed on a given datastore.

This replaces the `datastore` field of the `corrupt_object`, `missing_object` and `missing_object_reference` tables with `config_id`. Adapt all the code accordingly.

Note that it slightly changes the cli usage: the kafka checker now needs a config entry, thus a kafka checking session can only target a given object type (i.e. one kafka topic).

The migration script will fill the config_id column for corrupt_object using the check_config entry that matches the object_type (of corrupt_object) and datastore. For missing_object and missing_object_reference, it will use the latter table to identify the check_config entry corresponding to the object type of the reference_id and to the datastore, since it is a checking session on this object type that will generate a missing object entry (which is generally not of the same type). For the missing_object table, the config_id will be the one extracted from missing_object_reference (joining on the missing_id column).

Note that the migration script will fail if there are rows in one of these tables for which there exists more than one possible config entry (i.e. with the same object_type and datastore).
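The matching rule used by the migration can be illustrated in plain Python. This is a sketch of the rule only, not the actual SQL migration: the `resolve_config_id` helper and the dict-based rows are hypothetical, and raising on zero matches is an assumption (the message above only states that ambiguous matches fail).

```python
from typing import List


def resolve_config_id(check_configs: List[dict], object_type: str, datastore: int) -> int:
    """Return the id of the single config entry matching (object_type, datastore)."""
    matches = [
        cfg["id"]
        for cfg in check_configs
        if cfg["object_type"] == object_type and cfg["datastore"] == datastore
    ]
    if len(matches) > 1:
        # Ambiguous: several sessions share the same object_type and datastore.
        raise ValueError(f"ambiguous config for {object_type!r}: {matches}")
    if not matches:
        # Assumption: the sketch also refuses rows with no matching config.
        raise ValueError(f"no config for {object_type!r} on datastore {datastore}")
    return matches[0]


# Hypothetical check_config rows.
configs = [
    {"id": 1, "object_type": "snapshot", "datastore": 1},
    {"id": 2, "object_type": "release", "datastore": 1},
    {"id": 3, "object_type": "release", "datastore": 1},
]
```

Here a corrupt snapshot found in datastore 1 resolves to config 1, while a corrupt release cannot be attributed to a single session and makes the lookup fail.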
-
- Sep 21, 2023
-
-
David Douard authored
was missing the flake8-bugbear dependency, effectively disabling the line-too-long check.
-
- Aug 24, 2023
-
-
Antoine R. Dumont authored
Previously, in production, this would retrieve the configuration of the other backend as those configurations are named the same. Refs. #4696
-
Antoine R. Dumont authored
To avoid returning only the first one when multiple configurations with the same name exist for different backends to scrub. Refs. #4696
-
- Jul 26, 2023
-
-
Antoine R. Dumont authored
It's popping up after having run tests.
-
Antoine R. Dumont authored
This was found while deploying the new version.
-
Antoine R. Dumont authored
With older click versions (e.g. 7.0-1), the text wrapping can be different, resulting in some docstring text being included in this command list, so check that we find the expected commands instead [1] [2].

Refs. swh/infra/sysadm-environment#4992

[1] 'defined ...' is part of the first line of the docstring for the "init" subcommand.
```
10:21:42 E AssertionError: assert ['init', 'defined...', 'journal', 'list', 'stalled', 'storage'] == ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E At index 1 diff: 'defined...' != 'journal'
10:21:42 E Left contains one more item: 'storage'
10:21:42 E Full diff:
10:21:42 E - ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E + ['init', 'defined...', 'journal', 'list', 'stalled', 'storage']
10:21:42 E ? ++++++++++++++
```
[2] https://jenkins.softwareheritage.org/view/swh-debian%20(draft)/job/debian/job/packages/job/DSCRUB/job/gbp-buildpackage/31/console
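A wrapping-tolerant assertion of this kind might look like the sketch below. The help text and the `commands_in_help` helper are hypothetical (the descriptions are made up for illustration); only the command names and the stray 'defined...' token come from the failure above.

```python
from typing import List

# Hypothetical help output where old click wrapping leaks a docstring word
# ("defined...") into the first column of the command list.
HELP = """\
  init     Initialise a check configuration
  defined... in the configuration file
  journal  Check objects from the journal
  list     List configurations
  stalled  List stalled partitions
  storage  Check objects from the storage
"""

EXPECTED = ["init", "journal", "list", "stalled", "storage"]


def commands_in_help(help_text: str, expected: List[str]) -> List[str]:
    """Return the expected commands that start a line of the help text."""
    first_words = {line.split()[0] for line in help_text.splitlines() if line.strip()}
    return [cmd for cmd in expected if cmd in first_words]


# Passes regardless of how the docstring text was wrapped.
assert commands_in_help(HELP, EXPECTED) == EXPECTED
```

Comparing the full list of first words against `EXPECTED` would fail here, which is the situation shown in the build log.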
-
- Jul 12, 2023
-
-
David Douard authored
-
- Jul 10, 2023
-
-
David Douard authored
This is needed to make it compatible with swh.core's db upgrade tooling: the name of the configuration section is expected to be the swh module.
-
David Douard authored
Need to drop the index of the old checked_partition before recreating the new one (with the same name); the simplest way of doing this is cascade dropping the old checked_partition table before recreating the new index.
-
David Douard authored
This specifically tests that the `--help` argument works when running `swh scrubber check --help` without any configuration file set.
-
David Douard authored
This flag resets the partitions identified as stalled by setting start_date and end_date to NULL. These reset partitions should then be selected for checking by a scrubber worker.
-
David Douard authored
For a given configuration (hence storage, object_type and partition scheme), list partitions that have a start_date but no end_date for a long enough time. By default, it will compute the delay for a partition to be considered as stalled based on the last 10 partitions checked for the given configuration.
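One way such a detection could work is sketched below. The exact formula is not given in the message above, so the rule used here (a multiple of the longest duration among the last 10 finished partitions) is an assumption, as are the `stalled_partitions` name and the `factor` parameter.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (partition_id, start_date, end_date), mirroring the columns named above.
Partition = Tuple[int, Optional[datetime], Optional[datetime]]


def stalled_partitions(
    partitions: List[Partition],
    now: datetime,
    factor: float = 10.0,  # assumption: safety margin over observed durations
    sample: int = 10,      # the "last 10 partitions checked"
) -> List[int]:
    """Return ids of partitions started long ago that never finished."""
    finished = [p for p in partitions if p[1] is not None and p[2] is not None]
    finished.sort(key=lambda p: p[2])
    recent = finished[-sample:]
    if not recent:
        return []  # no reference durations available yet
    delay = factor * max(end - start for _, start, end in recent)
    return [
        pid
        for pid, start, end in partitions
        if start is not None and end is None and now - start > delay
    ]
```

With a single finished partition that took 5 seconds, the threshold is 50 seconds, so a partition started an hour ago without an end_date is reported while one started 10 seconds ago is not.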
-
David Douard authored
A checker configuration must now be created before being able to start a checker session. This configuration is stored in the database and consists of a triplet (datastore, object_type, nb_partitions).

Once done, any number of checkers can be started for this specific checker configuration; each checker process will check partitions one by one, using the status stored in the database to get the next partition number to check on the next iteration. This makes it possible to dynamically adapt the number of checker processes.

For example, checking the snapshots by splitting the hash space into 4096 partitions using 4 parallel workers could look like:

    $ export SWH_CONFIG_FILENAME=config.yml
    $ swh scrubber check init --object-type snapshot --nb-partitions 4096 --name cfg-snp
    Created configuration cfg-snp [3] for checking snapshot in postgresql storage
    $ for i in {1..4}; do (swh scrubber check storage cfg-snp &); done
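To give an idea of what splitting the hash space into partitions means, here is a minimal sketch. It assumes 20-byte object ids and a power-of-two number of partitions; the `partition_bounds` function is hypothetical and may differ from how the storage actually computes partition bounds.

```python
from typing import Optional, Tuple


def partition_bounds(
    partition_id: int, nb_partitions: int, id_bytes: int = 20
) -> Tuple[bytes, Optional[bytes]]:
    """Return (inclusive start, exclusive end) ids of a partition.

    The end of the last partition is None, meaning "up to the end of the space".
    """
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("this sketch assumes nb_partitions is a power of two")
    if not 0 <= partition_id < nb_partitions:
        raise ValueError("partition_id must be in [0, nb_partitions)")
    width = (1 << (8 * id_bytes)) // nb_partitions
    start = (partition_id * width).to_bytes(id_bytes, "big")
    if partition_id == nb_partitions - 1:
        return start, None
    return start, ((partition_id + 1) * width).to_bytes(id_bytes, "big")
```

With 4096 partitions, each partition covers the ids sharing the same first three hexadecimal digits, so any worker can process any partition independently of the others.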
-
- Jul 07, 2023
-
-
David Douard authored
This new table stores the "configuration" for a scrubber. A configuration consists of a triplet: (datastore, object_type, nb_partitions).

This comes with a migration script. WARNING: this script needs to be checked before deployment on a production-sized database. Any activity on the database should be stopped before execution.

This is the first step of a series to make the scrubber easier to deploy on elastic infrastructure.
-
David Douard authored
It now needs types-click which is indeed a dependency of swh.core[testing].
-
- Jun 21, 2023
-
-
Nicolas Dandrimont authored
This allows overriding the JAVA_HOME to run cassandra with a different java version (which also happens to be needed in CI, as we force usage of an old java for cassandra through that envvar).
-
Nicolas Dandrimont authored
This avoids reinstalling tox all the time
-
- Apr 05, 2023
-
- Mar 28, 2023
-
-
Nicolas Dandrimont authored
-
- Mar 22, 2023
-
- Mar 16, 2023
-
-
vlorentz authored
It makes more sense to query a range of partition ids with a fixed nb_partitions than a range of nb_partitions with a fixed partition id. No migration because the next release will need to scrap the whole table anyway.
-