Commits on Source (97)
-
Antoine R. Dumont authored
Related to T4228
-
Antoine R. Dumont authored
Related to T4228
-
Antoine R. Dumont authored
We migrated away from "local" a while back. Related to T4228
-
Antoine R. Dumont authored
so migration tool can be used Related to T4284
-
Antoine R. Dumont authored
Related to T4228
-
Antoine R. Dumont authored
This alleviates current locking issues where nothing gets written in production. Related to T4228
-
vlorentz authored d1cbd624
-
David Douard authored
instead of (soon-to-be-deprecated) swh-core's postgresql_fact one.
3d999c51 -
vlorentz authored 708b2b2c
-
Nicolas Dandrimont authored c6f711ec
-
vlorentz authored
This will allow adding more tags easily in future commits
0cc86b0e -
vlorentz authored
It will probably be useful to know what part of the check takes the most time.
aaae867e -
vlorentz authored 53830b24
-
vlorentz authored 36d16bcd
-
Antoine Lambert authored 630001ce
-
vlorentz authored
It will be used by the storage_checker to 'remember' what ranges it already checked recently across runs (and crashes), and to monitor progress.
fef8a513 -
vlorentz authored
For now, this does not use this information to deduplicate work.
84fa17c0 -
vlorentz authored
For now this is a naive implementation, which never rechecks.
28224287 -
David Douard authored
- pre-commit from 4.1.0 to 4.3.0, - codespell from 2.2.1 to 2.2.2, - black from 22.3.0 to 22.10.0 and - flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
afacc488 -
Antoine Lambert authored 17cf8b66
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in toc when building full swh documentation, prefer to include module indices only when building standalone package documentation. Related to T4496
2347dd72 -
vlorentz authored
This should avoid pointless reports to Sentry and Icinga on AdminShutdown exceptions.
a89cd673 -
vlorentz authored
-
Antoine Lambert authored
This fixes python 3.7 support due to poetry, a dependency of isort, that removed support for that Python version in a recent release.
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
Antoine Lambert authored
Related to swh/meta#4960
-
Jérémy Bobbio (Lunar) authored
GitLab will display the content of the README file when browsing the repository. But in case the file is a symlink, it will display the path pointed to by the symlink. There is a 6-year-old issue about this: https://gitlab.com/gitlab-org/gitlab/-/issues/15093 We can work around the issue by having the content at the root of the repository and a symlink to this file in the `docs/` directory. Tested in swh/devel/swh-py-template!27
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
It makes more sense to query a range of partition ids with a fixed nb_partition than a range of nb_partitions with a fixed partition id. No migration because the next release will need to scrap the whole table anyway.
-
vlorentz authored
-
vlorentz authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
This avoids reinstalling tox all the time
4e5334d7 -
Nicolas Dandrimont authored
This allows overriding the JAVA_HOME to run cassandra with a different java version (which also happens to be needed in CI, as we force usage of an old java for cassandra through that envvar).
-
David Douard authored
It now needs types-click which is indeed a dependency of swh.core[testing].
-
David Douard authored
This new table stores the "configuration" for a scrubber. A configuration consists of a set of: (datastore, object_type, nb_partitions). This comes with a migration script. WARNING: this script needs to be checked before deployment on a production-sized database. Any activity on the database should be stopped before execution. This is the first step of a series to make the scrubber easier to deploy on elastic infrastructure.
369341bc -
David Douard authored
A checker configuration must now be created before a checker session can be started. This configuration is stored in the database and consists of a triplet (datastore, object_type, nb_partitions). Once done, any number of checkers can be started for this specific checker configuration; each checker process will check partitions one by one, using the status stored in the database to get the next partition number to check on the next iteration. This makes it possible to dynamically adapt the number of checker processes. For example, checking the snapshots by splitting the hash space into 4096 partitions using 4 parallel workers could look like:
  $ export SWH_CONFIG_FILENAME=config.yml
  $ swh scrubber check init --object-type snapshot --nb-partitions 4096 --name cfg-snp
  Created configuration cfg-snp [3] for checking snapshot in postgresql storage
  $ for i in {1..4}; do (swh scrubber check storage cfg-snp &); done
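The way several checker processes can share one configuration and pick partitions one by one can be sketched in Python. This is a minimal in-memory illustration, not the scrubber's implementation: `PartitionTracker`, `claim_next`, `run_worker` and `check_all` are hypothetical names, threads stand in for separate processes, and a lock stands in for the database that holds the shared status.

```python
import threading
from typing import List, Optional


class PartitionTracker:
    """Hypothetical stand-in for the partition status kept in the database."""

    def __init__(self, nb_partitions: int) -> None:
        self.nb_partitions = nb_partitions
        self._next = 0
        self._lock = threading.Lock()

    def claim_next(self) -> Optional[int]:
        """Return the next unclaimed partition id, or None when none is left."""
        with self._lock:
            if self._next >= self.nb_partitions:
                return None
            partition_id = self._next
            self._next += 1
            return partition_id


def run_worker(tracker: PartitionTracker, checked: List[int]) -> None:
    # Each worker keeps asking for work until every partition is claimed.
    while (partition_id := tracker.claim_next()) is not None:
        # A real checker would scrub the objects of this partition here.
        checked.append(partition_id)


def check_all(nb_partitions: int, nb_workers: int) -> List[int]:
    """Run nb_workers workers against one shared tracker."""
    tracker = PartitionTracker(nb_partitions)
    checked: List[int] = []
    workers = [
        threading.Thread(target=run_worker, args=(tracker, checked))
        for _ in range(nb_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(checked)
```

Because the workers only coordinate through the shared tracker, the number of workers can change without touching the partition scheme.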
9cd7414a -
David Douard authored
For a given configuration (hence storage, object_type and partition scheme), list partitions that have had a start_date but no end_date for a long enough time. By default, the delay after which a partition is considered stalled is computed from the last 10 partitions checked for the given configuration.
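A rough Python sketch of this kind of detection follows. The commit message does not say how the delay is derived from the last 10 checked partitions, so the rule used here (the longest duration among them) is an assumption, and `Partition` and `stalled_partitions` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Partition(NamedTuple):
    partition_id: int
    start_date: Optional[datetime]
    end_date: Optional[datetime]


def stalled_partitions(
    partitions: List[Partition], now: datetime, sample_size: int = 10
) -> List[int]:
    """Return ids of partitions started long ago that never finished."""
    done = [p for p in partitions if p.start_date and p.end_date]
    done.sort(key=lambda p: p.end_date)
    recent = done[-sample_size:]
    if not recent:
        # No reference duration available, so nothing can be called stalled.
        return []
    # Assumption: the delay is the longest duration among the recent checks.
    delay = max(p.end_date - p.start_date for p in recent)
    return [
        p.partition_id
        for p in partitions
        if p.start_date and p.end_date is None and now - p.start_date > delay
    ]
```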
67a743d0 -
David Douard authored
This flag resets the partitions identified as stalled by setting start_date and end_date to NULL. These reset partitions should then be selected for checking by a scrubber worker.
87412380 -
David Douard authored
This especially tests that the `--help` argument works when running `swh scrubber check --help` without any configuration file set.
-
David Douard authored
Need to drop the index of the old checked_partition before recreating the new one (with the same name); the simplest way of doing this is to cascade-drop the old checked_partition table before recreating the new index.
d9f89378 -
David Douard authored
This is needed to make it compatible with swh.core's db upgrade tooling: the name of the configuration section is expected to be the swh module.
-
David Douard authored
-
Antoine R. Dumont authored
With older click versions (e.g. 7.0-1), the text wrapping can be different, resulting in some docstring text being included in this command list, so check that we find the expected commands instead [1] [2]. Refs. swh/infra/sysadm-environment#4992
[1] 'defined ...' is part of the first line of the docstring for the "init" subcommand.
```
10:21:42 E AssertionError: assert ['init', 'defined...', 'journal', 'list', 'stalled', 'storage'] == ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E At index 1 diff: 'defined...' != 'journal'
10:21:42 E Left contains one more item: 'storage'
10:21:42 E Full diff:
10:21:42 E - ['init', 'journal', 'list', 'stalled', 'storage']
10:21:42 E + ['init', 'defined...', 'journal', 'list', 'stalled', 'storage']
10:21:42 E ? ++++++++++++++
```
[2] https://jenkins.softwareheritage.org/view/swh-debian%20(draft)/job/debian/job/packages/job/DSCRUB/job/gbp-buildpackage/31/console
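The difference between the two assertion styles can be sketched in Python. This is only an illustration of the idea, not the test from the repository; `missing_commands` is a hypothetical helper, and the lists are taken from the failure log quoted in this commit message.

```python
from typing import Iterable, List


def missing_commands(listed: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected command names that are absent from the listing."""
    listed_set = set(listed)
    return [name for name in expected if name not in listed_set]


# Command list as parsed with an older click: it carries a stray docstring word.
listed = ["init", "defined...", "journal", "list", "stalled", "storage"]
expected = ["init", "journal", "list", "stalled", "storage"]

# Strict equality breaks on the stray word, the membership check does not.
strict_check_passes = listed == expected
membership_check_passes = missing_commands(listed, expected) == []
```

The membership check tolerates extra tokens produced by text wrapping while still failing if a real command disappears.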
-
Antoine R. Dumont authored
This was found while deploying the new version.
-
Antoine R. Dumont authored
It's popping up after having run tests.
-
Antoine R. Dumont authored
To avoid returning only the first one when multiple configurations with the same name exist for different backends to scrub. Refs. #4696
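The ambiguity can be illustrated with a small Python sketch. The `CheckConfig` tuple and the `find_configs` helper are hypothetical names used only for illustration, and the datastore is reduced to a plain string.

```python
from typing import List, NamedTuple, Optional


class CheckConfig(NamedTuple):
    config_id: int
    name: str
    datastore: str  # simplified: a real datastore has more than one field


def find_configs(
    configs: List[CheckConfig], name: str, datastore: Optional[str] = None
) -> List[CheckConfig]:
    """Return every config with this name, optionally for one datastore only."""
    return [
        config
        for config in configs
        if config.name == name
        and (datastore is None or config.datastore == datastore)
    ]
```

Returning all matches, or filtering on the datastore as well, avoids silently picking the configuration of the wrong backend when names collide.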
-
Antoine R. Dumont authored
Previously, in production, this would retrieve the configuration of the other backend as those configurations are named the same. Refs. #4696
-
David Douard authored
was missing the flake8-bugbear dependency, which effectively disabled the line-too-long check.
-
David Douard authored
These tables used to reference the datastore the invalid/missing object was found in, but did not keep the config entry, i.e. the checking session during which the invalid/missing object was found, which can be an issue when more than one checking session is executed on a given datastore. This replaces the `datastore` field of the `corrupt_object`, `missing_object` and `missing_object_reference` tables with `config_id`. Adapt all the code accordingly. Note that it changes the cli usage a bit: the kafka checker now needs a config entry, thus a kafka checking session can only target a given object type (i.e. one kafka topic). The migration script will fill the config_id column for corrupt_object using the check_config entry that matches the object_type (of corrupt_object) and datastore. For missing_object and missing_object_reference, it will use this latter table to identify the check_config entry corresponding to the object type of the reference_id and to the datastore, since it is a checking session on this object type that will generate a missing object entry (which is generally not of the same type). For the missing_object table, the config_id will be the one extracted from missing_object_reference (joining on the missing_id column). Note that the migration script will fail if there are rows in one of these tables for which there exists more than one possible config_entry (i.e. with the same object_type and datastore).
-
David Douard authored
Which allows removing the dependency on types-pyyaml in the [testing] extra.
-
David Douard authored
These flags allow configuring a checking session that includes only one of the 2 possible checks (hash computation and reference validation).
566db2ac -
David Douard authored
-
David Douard authored
init` command
-
David Douard authored
As well as a command to list partitions being checked. For example:
```
$ swh scrubber check stats snapshot_16 -j
{
  "config": {
    "name": "snapshot_16",
    "datastore": {
      "package": "storage",
      "cls": "postgresql",
      "instance": "postgresql:///?service=swh-storage"
    },
    "object_type": "snapshot",
    "nb_partitions": 65536,
    "check_hashes": true,
    "check_references": true
  },
  "min_duration": 0.002196,
  "max_duration": 0.107398,
  "avg_duration": 0.005969,
  "checked_partition": 65536,
  "running_partition": 0,
  "missing_object": 0,
  "missing_object_reference": 0,
  "corrupt_object": 0
}
$ swh scrubber check running cfg1
Running partitions for cfg1 [id=1, type=snapshot]:
0: running since today (20 minutes)
```
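As a hedged illustration of how the JSON output could be consumed, the Python sketch below derives a progress ratio from it. The `progress` helper is hypothetical; only the `checked_partition` and `config.nb_partitions` keys shown in the sample output are assumed to exist.

```python
import json


def progress(stats_json: str) -> float:
    """Return the fraction of partitions already checked, between 0 and 1."""
    stats = json.loads(stats_json)
    return stats["checked_partition"] / stats["config"]["nb_partitions"]


# Reduced copy of the sample output, keeping only the keys used above.
sample = json.dumps(
    {"config": {"nb_partitions": 65536}, "checked_partition": 65536}
)
ratio = progress(sample)
```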
-
Antoine Lambert authored
It now requires a swh-graph server running or connection errors appear. Use swh-graph NaiveClient to avoid spawning a real graph server during the tests.
-
David Douard authored
-
David Douard authored
-
David Douard authored 986cef51
-
David Douard authored
-
David Douard authored
-
Nicolas Dandrimont authored
-
Antoine Lambert authored
Related to swh/meta#5075.
-
David Douard authored
The checked_partition_iter_next() iterator, which returns the next partition to check, should never return a partition number exceeding the maximum number of partitions in the config, nor should it add such a partition to the database.
-
Antoine Lambert authored
Remove use of --import-mode=importlib pytest option and use new option consider_namespace_packages to fix tests execution with latest pytest release.
-
Antoine Lambert authored
To simplify the future adding of an objstorage checker, extract common code and features of current checkers in abstract base classes. Related to #4694.
0a47e100 -
Antoine Lambert authored
Promote use of on_eof parameter instead.
4e834396 -
Antoine Lambert authored
Add class ObjectStorageChecker to detect missing and corrupted contents in an object storage. It iterates on content objects referenced in a storage instance, checks they are available in a given object storage instance, then retrieves their bytes from it in order to recompute checksums and detect corruptions. Related to #4694.
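The presence and integrity check described here can be sketched in Python. This is a simplified illustration, not the ObjectStorageChecker code: `scrub_contents` is a hypothetical function, a dict stands in for the object storage, and contents are identified by a single SHA-1 hex digest.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def scrub_contents(
    expected_sha1s: Iterable[str], objstorage: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Return (missing, corrupt) content ids for the given object storage."""
    missing: List[str] = []
    corrupt: List[str] = []
    for sha1 in expected_sha1s:
        data = objstorage.get(sha1)
        if data is None:
            # Referenced by the storage but absent from the object storage.
            missing.append(sha1)
        elif hashlib.sha1(data).hexdigest() != sha1:
            # Present, but the recomputed checksum does not match its id.
            corrupt.append(sha1)
    return missing, corrupt
```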
-
David Douard authored
-
The default behavior of Click is to rewrap text based on the width of the terminal but as a consequence it makes the sample YAML config for the scrubber displayed in command help quite unreadable as indentation is lost. So use \b markers in docstring to ensure proper display of the YAML config by Click.
-
David Douard authored
Deprecate the former ones.
-
Antoine Lambert authored 64bbe6bb
-
Antoine Lambert authored d8c04aaa
-
Antoine Lambert authored 5573b1b9
-
Antoine Lambert authored
Add ObjectStorageCheckerFromJournal class to consume content ids from a kafka topic in order to check their presence in a given object storage but also to check their integrity by fetching their bytes and recomputing checksums. Related to #4694.
a4045264 -
Antoine Lambert authored
Instead of reinventing the wheel, prefer to use the check method from the object storage interface for verifying content presence and integrity. Related to #4694.
68d754ae -
Antoine Lambert authored
Make it possible to configure and trigger the scrubbing of an object storage with the swh-scrubber CLI, either using partitions of contents provided by a storage or by consuming the content kafka topic from a SWH journal (in that case, the --use-journal flag must be provided to the "swh check run" command). Related to #4694.
cce4e144 -
Antoine Lambert authored
Promote use of `swh scrubber check run` instead of deprecated commands. Add sections about object storage checker and journal checker.
-
Pierre-Yves David authored 1dcb2e88
-
Pierre-Yves David authored
-
David Douard authored
-
Antoine Lambert authored 177d2a5b
-
Antoine Lambert authored
-
Antoine Lambert authored
Ensure only partitions whose scrubbing is running are returned. Fixes #4704.
-
David Douard authored
Show the actual traceback that occurred in the cli command, if any.
0bb57970 -
David Douard authored
-
David Douard authored
Update the scrubber db for better `swh db` compatibility and update test_migration for swh.core 3.6 (actually 3.6.1 is needed).
a6e90620 -
David Douard authored
It has been deprecated for ages now.
-
Antoine Lambert authored
-
Antoine Lambert authored
-
Antoine Lambert authored
-
Nicolas Dandrimont authored
The dict form of add_batch argument is being removed as it relies on a single-hash object identifier.
Showing
- .copier-answers.yml 11 additions, 0 deletions
- .git-blame-ignore-revs 2 additions, 1 deletion
- .gitignore 7 additions, 5 deletions
- .pre-commit-config.yaml 51 additions, 42 deletions
- CODE_OF_CONDUCT.md 1 addition, 1 deletion
- MANIFEST.in 0 additions, 5 deletions
- README.rst 220 additions, 1 deletion
- README.rst 220 additions, 1 deletion
- conftest.py 5 additions, 1 deletion
- docs/Makefile 1 addition, 1 deletion
- docs/README.rst 1 addition, 39 deletions
- docs/README.rst 1 addition, 39 deletions
- docs/cli.rst 9 additions, 0 deletions
- docs/index.rst 9 additions, 5 deletions
- mypy.ini 0 additions, 22 deletions
- pyproject.toml 85 additions, 1 deletion
- pytest.ini 0 additions, 4 deletions
- requirements-swh.txt 5 additions, 4 deletions
- requirements-test.txt 4 additions, 2 deletions
- requirements.txt 4 additions, 0 deletions
.copier-answers.yml
0 → 100644
MANIFEST.in
deleted
100644 → 0
README.rst
deleted
120000 → 0
README.rst
0 → 100644
docs/README.rst
deleted
100644 → 0
docs/README.rst
0 → 120000
docs/cli.rst
0 → 100644
mypy.ini
deleted
100644 → 0
pytest.ini
deleted
100644 → 0