- Apr 11, 2024
-
-
Antoine Lambert authored
-
Antoine Lambert authored
-
- Apr 04, 2024
-
-
David Douard authored
Deprecate the former ones.
-
- Apr 02, 2024
-
-
By default, Click rewraps text to fit the width of the terminal. As a consequence, the sample YAML config for the scrubber displayed in the command help is quite unreadable because its indentation is lost. So use \b markers in the docstring to ensure Click displays the YAML config properly.
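A minimal sketch of the \b marker, assuming the `click` package is installed. The `check` command and the YAML keys below are hypothetical and only illustrate the mechanism; they are not the scrubber's actual command or config schema.

```python
import click
from click.testing import CliRunner


@click.command()
def check():
    """Run a checker (hypothetical command, for illustration only).

    Sample configuration (illustrative keys, not the real schema):

    \b
    scrubber:
      cls: postgresql
      db: "service=swh-scrubber"
    """


# Render the help text as a terminal user would see it.
help_text = CliRunner().invoke(check, ["--help"]).output
print(help_text)
```

The line holding only \b tells Click to leave the following paragraph untouched, so the nested YAML keys keep their indentation in the `--help` output.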
-
- Mar 29, 2024
-
-
David Douard authored
-
- Mar 25, 2024
-
-
Antoine Lambert authored
Add class ObjectStorageChecker to detect missing and corrupted contents in an object storage. It iterates over the content objects referenced in a storage instance, checks that they are available in a given object storage instance, then retrieves their bytes from it in order to recompute checksums and detect corruption. Related to #4694.
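The checking loop described above can be sketched as follows. This is not the ObjectStorageChecker implementation: `check_contents` is a hypothetical helper, the object storage is stood in for by a plain dict keyed by SHA-1, and only one checksum is recomputed.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple


def check_contents(
    referenced_sha1s: Iterable[str],
    objstorage: Dict[str, bytes],  # stand-in for a real object storage
) -> Iterator[Tuple[str, str]]:
    """Yield (sha1, problem) pairs for missing or corrupted contents."""
    for sha1 in referenced_sha1s:
        data = objstorage.get(sha1)
        if data is None:
            # Referenced by the storage but absent from the object storage.
            yield sha1, "missing"
        elif hashlib.sha1(data).hexdigest() != sha1:
            # Bytes are present but no longer match their checksum.
            yield sha1, "corrupt"


good = b"hello"
good_id = hashlib.sha1(good).hexdigest()
bad_id = hashlib.sha1(b"original bytes").hexdigest()
lost_id = hashlib.sha1(b"never stored").hexdigest()

objstorage = {good_id: good, bad_id: b"bit-rotted bytes"}
problems = list(check_contents([good_id, bad_id, lost_id], objstorage))
```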
-
Antoine Lambert authored
Promote use of on_eof parameter instead.
-
- Mar 22, 2024
-
-
Antoine Lambert authored
To simplify adding an objstorage checker later, extract the code and features common to the current checkers into abstract base classes. Related to #4694.
-
- Mar 13, 2024
-
-
Antoine Lambert authored
Remove use of --import-mode=importlib pytest option and use new option consider_namespace_packages to fix tests execution with latest pytest release.
-
- Feb 06, 2024
-
-
David Douard authored
The checked_partition_iter_next() iterator, which returns the next partition to check, should never return a partition number exceeding the max number of partitions in the config, nor should it add such a partition to the database.
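A sketch of the intended invariant, under stated assumptions: `next_partitions` is a hypothetical stand-in for checked_partition_iter_next(), and a dict replaces the database table. The bound is tested before anything is recorded, so an out-of-range partition is neither returned nor stored.

```python
from typing import Dict, Iterator


def next_partitions(checked: Dict[int, str], nb_partitions: int) -> Iterator[int]:
    """Yield unchecked partition ids, recording each one as started.

    `checked` stands in for the database table, keyed by partition id.
    """
    while True:
        next_id = max(checked, default=-1) + 1
        if next_id >= nb_partitions:
            # Stop before recording or returning an out-of-range partition.
            return
        checked[next_id] = "started"
        yield next_id


table: Dict[int, str] = {}
ids = list(next_partitions(table, nb_partitions=4))
```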
-
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Feb 02, 2024
-
-
Nicolas Dandrimont authored
-
- Dec 05, 2023
-
-
David Douard authored
-
David Douard authored
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Nov 30, 2023
-
-
David Douard authored
- Nov 24, 2023
-
-
Antoine Lambert authored
It now requires a swh-graph server running or connection errors appear. Use swh-graph NaiveClient to avoid spawning a real graph server during the tests.
-
- Oct 17, 2023
-
-
David Douard authored
As well as a command to list partitions being checked. For example:
```
$ swh scrubber check stats snapshot_16 -j
{
  "config": {
    "name": "snapshot_16",
    "datastore": {
      "package": "storage",
      "cls": "postgresql",
      "instance": "postgresql:///?service=swh-storage"
    },
    "object_type": "snapshot",
    "nb_partitions": 65536,
    "check_hashes": true,
    "check_references": true
  },
  "min_duration": 0.002196,
  "max_duration": 0.107398,
  "avg_duration": 0.005969,
  "checked_partition": 65536,
  "running_partition": 0,
  "missing_object": 0,
  "missing_object_reference": 0,
  "corrupt_object": 0
}
$ swh scrubber check running cfg1
Running partitions for cfg1 [id=1, type=snapshot]:
0: running since today (20 minutes)
```
-
- Oct 16, 2023
-
-
David Douard authored
init` command
-
- Oct 12, 2023
-
-
David Douard authored
-
David Douard authored
These flags allow configuring a checking session that includes only one of the two possible checks (hash computation and reference validation).
-
David Douard authored
Which allows removing the dependency on types-pyyaml in the [testing] extra.
-
David Douard authored
These tables used to reference the datastore in which the invalid/missing object was found, but did not keep the config entry, i.e. the checking session during which the invalid/missing object was found. This can be an issue when more than one checking session is executed on a given datastore.

This replaces the `datastore` field of the `corrupt_object`, `missing_object` and `missing_object_reference` tables with `config_id`. Adapt all the code accordingly.

Note that this slightly changes the CLI usage: the kafka checker now needs a config entry, thus a kafka checking session can only target a given object type (i.e. one kafka topic).

The migration script will fill the config_id column for corrupt_object using the check_config entry that matches the object_type (of corrupt_object) and datastore. For missing_object and missing_object_reference, it will use the latter table to identify the check_config entry corresponding to the object type of the reference_id and the datastore, since it is a checking session on this object type that will generate a missing object entry (which is generally not of the same type). For the missing_object table, the config_id will be the one extracted from missing_object_reference (joining on the missing_id column).

Note that the migration script will fail if there are rows in one of these tables for which more than one possible config entry exists (i.e. with the same object_type and datastore).
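The matching rule used for corrupt_object can be sketched in Python. This is an illustration, not the migration script itself (which operates on the database): `resolve_config_id` and the row layout are hypothetical, and raising when no entry matches is an assumption, since the message only describes the ambiguous case.

```python
from typing import Dict, List


def resolve_config_id(
    check_configs: List[Dict], object_type: str, datastore: int
) -> int:
    """Return the id of the single check_config row matching both keys."""
    matches = [
        cfg["id"]
        for cfg in check_configs
        if cfg["object_type"] == object_type and cfg["datastore"] == datastore
    ]
    if len(matches) > 1:
        # Ambiguous: the migration described above fails in this case.
        raise ValueError(f"ambiguous config for {object_type}: {matches}")
    if not matches:
        # Assumption: the source does not say what happens without a match.
        raise LookupError(f"no config for {object_type}")
    return matches[0]


check_configs = [
    {"id": 1, "object_type": "snapshot", "datastore": 1},
    {"id": 2, "object_type": "revision", "datastore": 1},
    {"id": 3, "object_type": "revision", "datastore": 1},
]

snapshot_config = resolve_config_id(check_configs, "snapshot", 1)

try:
    resolve_config_id(check_configs, "revision", 1)
    ambiguous = False
except ValueError:
    ambiguous = True
```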
-
- Sep 21, 2023
-
-
David Douard authored
was missing the flake8-bugbear dependency, effectively disabling the line-too-long check.
-
- Aug 24, 2023
-
-
Antoine R. Dumont authored
Previously, in production, this would retrieve the configuration of the other backend as those configurations are named the same. Refs. #4696
-
Antoine R. Dumont authored
To avoid returning only the first one when multiple configurations with the same name exist for different backends to scrub. Refs. #4696
-
- Jul 26, 2023
-
-
Antoine R. Dumont authored
It's popping up after having run tests.
-
Antoine R. Dumont authored
This was found while deploying the new version.
-
Antoine R. Dumont authored
With older click version (e.g. 7.0-1), the text wrapping can be different, resulting in some docstring text included in this command list, so check we find the expected commands instead [1] [2]

Refs. swh/infra/sysadm-environment#4992

[1] 'defined ...' is part of the first line of the docstring for the "init" subcommand.
```
10:21:42 E AssertionError: assert ['init', 'defined...', 'journal', 'list', 'stalled', 'storage'] == ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E At index 1 diff: 'defined...' != 'journal'
10:21:42 E Left contains one more item: 'storage'
10:21:42 E Full diff:
10:21:42 E - ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E + ['init', 'defined...', 'journal', 'list', 'stalled', 'storage']
10:21:42 E ? ++++++++++++++
```
[2] https://jenkins.softwareheritage.org/view/swh-debian%20(draft)/job/debian/job/packages/job/DSCRUB/job/gbp-buildpackage/31/console
-
- Jul 12, 2023
-
-
David Douard authored
-
- Jul 10, 2023
-
-
David Douard authored
This is needed to make it compatible with swh.core's db upgrade tooling: the name of the configuration section is expected to be the swh module.
-
David Douard authored
Need to drop the index of the old checked_partition table before recreating the new one (with the same name); the simplest way of doing this is to cascade-drop the old checked_partition table before recreating the new index.
-
David Douard authored
In particular, this tests that the `--help` argument works when running `swh scrubber check --help` without any configuration file set.
-
David Douard authored
This flag resets the partitions identified as stalled by setting start_date and end_date to NULL. This should make the reset partitions eligible to be selected for checking by a scrubber worker.
-
David Douard authored
For a given configuration (hence storage, object_type and partition scheme), list partitions that have had a start_date but no end_date for a long enough time. By default, the delay after which a partition is considered stalled is computed from the last 10 partitions checked for the given configuration.
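One way such a detection could work is sketched below. The actual formula is not given here, so the rule used (a partition is stalled once it has been running longer than `factor` times the longest of the last 10 completed checks) is an assumption, as are the `stalled_partitions` name and the tuple layout.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (partition_id, start_date, end_date); end_date is None while a check runs.
Partition = Tuple[int, datetime, Optional[datetime]]


def stalled_partitions(
    partitions: List[Partition],
    now: datetime,
    last_n: int = 10,
    factor: float = 10.0,  # hypothetical safety margin, not from the source
) -> List[int]:
    """Return ids of partitions started long ago that never finished."""
    finished = [p for p in partitions if p[2] is not None]
    finished.sort(key=lambda p: p[2])
    recent = finished[-last_n:]
    if not recent:
        return []  # no reference durations yet, nothing can be judged
    delay = max(end - start for _, start, end in recent) * factor
    return [
        pid
        for pid, start, end in partitions
        if end is None and now - start > delay
    ]


t0 = datetime(2023, 7, 10, 12, 0, 0)
now = t0 + timedelta(hours=1)
partitions = [
    (0, t0, t0 + timedelta(seconds=5)),
    (1, t0, t0 + timedelta(seconds=10)),
    (2, t0, None),  # running for an hour
    (3, now - timedelta(seconds=30), None),  # started recently
]
stalled = stalled_partitions(partitions, now)
```

Here the longest recent check took 10 seconds, so the assumed threshold is 100 seconds: partition 2 is reported as stalled while partition 3 is not.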
-
David Douard authored
A checker configuration must now be created before a checker session can be started. This configuration is stored in the database and consists of a triplet (datastore, object_type, nb_partitions).

Once done, any number of checkers can be started for this specific checker configuration; each checker process will check partitions one by one, using the status stored in the database to get the next partition number to check on the next iteration. This allows dynamically adapting the number of checker processes.

For example, checking the snapshots by splitting the hash space into 4096 partitions using 4 parallel workers could look like:

    $ export SWH_CONFIG_FILENAME=config.yml
    $ swh scrubber check init --object-type snapshot --nb-partitions 4096 --name cfg-snp
    Created configuration cfg-snp [3] for checking snapshot in postgresql storage
    $ for i in {1..4}; do (swh scrubber check storage cfg-snp &); done
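To illustrate what splitting the hash space into partitions means, here is a sketch that maps an object id to a partition number. It assumes 160-bit (SHA-1 sized) ids divided into equal contiguous ranges; `partition_of` is a hypothetical helper and the real partitioning scheme may differ.

```python
HASH_BITS = 160  # assumption: ids are SHA-1 sized


def partition_of(object_id_hex: str, nb_partitions: int) -> int:
    """Map a hex object id to one of nb_partitions contiguous hash ranges."""
    return int(object_id_hex, 16) * nb_partitions // 2**HASH_BITS


first = partition_of("00" * 20, 4096)
middle = partition_of("8" + "0" * 39, 4096)
last = partition_of("ff" * 20, 4096)
```

With 4096 partitions, each partition covers the ids sharing the same leading 12 bits, so the workers can process disjoint ranges independently.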
-
- Jul 07, 2023
-
-
David Douard authored
This new table stores the "configuration" for a scrubber. A configuration consists of a triplet: (datastore, object_type, nb_partitions). This comes with a migration script. WARNING: this script needs to be checked before deployment on a large, production-sized database. Any activity on the database should be stopped before execution. This is the first step of a series to make the scrubber easier to deploy on elastic infrastructure.
-
David Douard authored
It now needs types-click which is indeed a dependency of swh.core[testing].
-