Metadata-Version: 2.1
Name: swh.scrubber
Version: 0.0.4
Summary: Software Heritage Datastore Scrubber
Home-page: https://forge.softwareheritage.org/diffusion/swh-scrubber
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-scrubber
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-scrubber/
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Provides-Extra: testing
License-File: LICENSE
License-File: AUTHORS
Software Heritage - Datastore Scrubber
======================================
Tools to periodically check data integrity in swh-storage and swh-objstorage,
report errors, and (try to) fix them.
This is a work in progress; some of the components described below do not
exist yet (Cassandra storage checker, objstorage checker, recovery, and reinjection).
The Scrubber package is made of the following parts:
Checking
--------
Highly parallel processes continuously read objects from a data store,
recompute their checksums, and record any mismatch in a database, along with
the data of the corrupt object.
There is one "checker" for each datastore package: storage (PostgreSQL and Cassandra),
journal (Kafka), and objstorage.
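The checking loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the scrubber's implementation: the store is a plain dict, the "database" of failures is a list, ``CorruptObject``, ``git_blob_id`` and ``check_store`` are hypothetical names, and the checksum is the Git blob SHA-1 (one of several identifiers a real checker would have to recompute).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CorruptObject:
    """Hypothetical failure record: the claimed ID plus the stored bytes."""

    claimed_id: str
    data: bytes


def git_blob_id(data: bytes) -> str:
    """Return the Git blob SHA-1: sha1(b"blob <length>\\0" + data)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def check_store(store: Dict[str, bytes]) -> List[CorruptObject]:
    """Recompute every object's ID and collect the ones that do not match.

    `store` maps a claimed ID to the object's bytes (a stand-in for a real
    data store); the returned list stands in for the failure database.
    """
    return [
        CorruptObject(claimed_id, data)
        for claimed_id, data in store.items()
        if git_blob_id(data) != claimed_id
    ]
```

Keeping the corrupt bytes next to the claimed ID is what makes a later recovery attempt possible without going back to the damaged store.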
Recovery
--------
Periodically, jobs go through the list of known corrupt objects
and try to recover the original objects through various means:
* Brute-forcing variations until they match their checksum
* Recovering from another data store
* As a last resort, recovering from known origins, if any
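As an illustration of the first strategy, the sketch below tries every single-bit flip of the corrupt bytes until one hashes to the expected ID. The README does not say which variations are tried, so single-bit flips are an assumption made here for the example, and ``sha1_hex``, ``single_bit_flips`` and ``brute_force_recover`` are hypothetical names.

```python
import hashlib
from typing import Callable, Iterator, Optional


def sha1_hex(data: bytes) -> str:
    """Plain SHA-1 of the data, as a hex string (assumed checksum)."""
    return hashlib.sha1(data).hexdigest()


def single_bit_flips(data: bytes) -> Iterator[bytes]:
    """Yield every variation of `data` that differs by exactly one bit."""
    for index in range(len(data)):
        for bit in range(8):
            candidate = bytearray(data)
            candidate[index] ^= 1 << bit
            yield bytes(candidate)


def brute_force_recover(
    corrupt: bytes,
    expected_id: str,
    hash_fn: Callable[[bytes], str] = sha1_hex,
) -> Optional[bytes]:
    """Return the first variation whose hash equals `expected_id`, else None."""
    for candidate in single_bit_flips(corrupt):
        if hash_fn(candidate) == expected_id:
            return candidate
    return None
```

The search costs one hash per candidate, so eight hashes per byte of the object; anything beyond small corruptions would have to fall back on the other two strategies.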
Reinjection
-----------
Finally, when an original object is recovered, it is reinjected into the original
data store, replacing the corrupt one.
docs/README.rst
\ No newline at end of file
swh-scrubber (0.0.4-1~swh1) unstable-swh; urgency=medium

  * New upstream release 0.0.4 - (tagged by Antoine R. Dumont
    (@ardumont) <antoine.romain.dumont@gmail.com> on 2022-05-30 17:35:51
    +0200)
  * Upstream changes:
    - v0.0.4
    - Recursive include the swh/scrubber/sql folder

 -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 30 May 2022 15:40:57 +0000

swh-scrubber (0.0.3-1~swh1) unstable-swh; urgency=medium

  * New upstream release 0.0.3 - (tagged by Antoine R. Dumont
    (@ardumont) <antoine.romain.dumont@gmail.com> on 2022-05-30 15:45:51
    +0200)
  * Upstream changes:
    - v0.0.3
    - Unify factory to use keyword 'postgresql' over deprecated 'local'
    - db: Bump to version 2
    - requirements: Add missing dependency

 -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 30 May 2022 13:49:46 +0000

swh-scrubber (0.0.2-1~swh2) unstable-swh; urgency=medium

  * Update dependencies and bump new release

 -- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>  Mon, 30 May 2022 14:25:52 +0200

swh-scrubber (0.0.2-1~swh1) unstable-swh; urgency=medium

  * New upstream release 0.0.2 - (tagged by Antoine R. Dumont
    (@ardumont) <antoine.romain.dumont@gmail.com> on 2022-05-30 10:15:15
    +0200)
  * Upstream changes:
    - v0.0.2
    - Add Fixer class, which re-loads corrupt objects from origins
    - Fix crash when using datastore_get_or_add for an existing datastore
    - Internals:
    - requirements-test: Remove pytest pinning to < 7
    - add strict asyncio_mode in pytest.ini
    - Bump mypy to v0.942
    - Add .git-blame-ignore-revs file with automatic reformatting commits
    - python: Reformat code with black 22.3.0
    - pre-commit, tox: Bump black from 19.10b0 to 22.3.0

 -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 30 May 2022 08:19:10 +0000

swh-scrubber (0.0.1-1~swh1) unstable-swh; urgency=medium

  * Initial release

 -- Nicolas Dandrimont <nicolas@dandrimont.eu>  Thu, 31 Mar 2022 19:29:54 +0200
Source: swh-scrubber
Maintainer: Software Heritage developers <swh-devel@inria.fr>
Section: python
Priority: optional
Build-Depends:
 debhelper-compat (= 13),
 dh-python (>= 3),
 python3-all,
 python3-dulwich,
 python3-pytest,
 python3-pytest-mock,
 python3-pytest-postgresql,
 python3-setuptools,
 python3-setuptools-scm,
 python3-swh.core (>= 0.3),
 python3-swh.core.db.pytestplugin,
 python3-swh.graph.client,
 python3-swh.journal (>= 0.9.0),
 python3-swh.loader.git (>= 1.4.0),
 python3-swh.model (>= 5.0.0),
 python3-swh.storage (>= 1.1.0),
 python3-yaml,
 git,
Rules-Requires-Root: no
Standards-Version: 4.6.0
Homepage: https://forge.softwareheritage.org/source/swh-scrubber

Package: python3-swh.scrubber
Architecture: all
Depends: ${misc:Depends}, ${python3:Depends},
Description: Software Heritage Datastore Scrubber
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: *
Copyright: 2015-2022 The Software Heritage developers
License: GPL-3+

License: GPL-3+
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 3 of the License, or
 (at your option) any later version.
 .
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program. If not, see <http://www.gnu.org/licenses/>.
 .
 On Debian systems, the complete text of the GNU General Public
 License version 3 can be found in `/usr/share/common-licenses/GPL-3'.
[DEFAULT]
upstream-branch=debian/upstream
upstream-tag=debian/upstream/%(version)s
upstream-vcs-tag=v%(version)s
debian-branch=debian/unstable-swh
pristine-tar=True
#!/usr/bin/make -f

export PYBUILD_NAME=swh.scrubber
export PYBUILD_TEST_ARGS=-vv

%:
	dh $@ --with python3 --buildsystem=pybuild

override_dh_install:
	dh_install
	rm -v $(CURDIR)/debian/python3-*/usr/lib/python*/dist-packages/swh/__init__.py
3.0 (quilt)
[flake8]
# E203: whitespaces before ':' <https://github.com/psf/black/issues/315>
# E231: missing whitespace after ','
# E501: line too long, use B950 warning from flake8-bugbear instead
# W503: line break before binary operator <https://github.com/psf/black/issues/52>
select = C,E,F,W,B950
ignore = E203,E231,E501,W503
max-line-length = 88
[egg_info]
tag_build =
tag_date = 0
.git-blame-ignore-revs
.gitignore
.pre-commit-config.yaml
AUTHORS
CODE_OF_CONDUCT.md
CONTRIBUTORS
LICENSE
MANIFEST.in
Makefile
README.rst
conftest.py
mypy.ini
pyproject.toml
pytest.ini
requirements-swh.txt
requirements-test.txt
requirements.txt
setup.cfg
setup.py
tox.ini
docs/.gitignore
docs/Makefile
docs/README.rst
docs/conf.py
docs/index.rst
docs/_static/.placeholder
docs/_templates/.placeholder
swh/__init__.py
swh.scrubber.egg-info/PKG-INFO
swh.scrubber.egg-info/SOURCES.txt
swh.scrubber.egg-info/dependency_links.txt
swh.scrubber.egg-info/entry_points.txt
swh.scrubber.egg-info/requires.txt
swh.scrubber.egg-info/top_level.txt
swh/scrubber/__init__.py
swh/scrubber/cli.py
swh/scrubber/db.py
swh/scrubber/fixer.py
swh/scrubber/journal_checker.py
swh/scrubber/origin_locator.py
swh/scrubber/py.typed
swh/scrubber/storage_checker.py
swh/scrubber/utils.py
swh/scrubber/sql/20-enums.sql
swh/scrubber/sql/30-schema.sql
swh/scrubber/sql/60-indexes.sql
swh/scrubber/tests/__init__.py
swh/scrubber/tests/conftest.py
swh/scrubber/tests/test_cli.py
swh/scrubber/tests/test_fixer.py
swh/scrubber/tests/test_init.py
swh/scrubber/tests/test_journal_kafka.py
swh/scrubber/tests/test_origin_locator.py
swh/scrubber/tests/test_storage_postgresql.py
\ No newline at end of file
[swh.cli.subcommands]
scrubber = swh.scrubber.cli
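The two lines above are the package's entry_points.txt, which uses INI syntax: a group name in brackets, then ``name = module`` pairs. Presumably the ``swh`` command-line tool looks up this group to find its subcommands; that lookup is not shown in this diff. The sketch below only reads the file format with the standard library, and ``parse_entry_points`` is a hypothetical helper, not part of swh.

```python
import configparser
from typing import Dict

# Contents of entry_points.txt as shown in this diff.
ENTRY_POINTS_TXT = """\
[swh.cli.subcommands]
scrubber = swh.scrubber.cli
"""


def parse_entry_points(text: str) -> Dict[str, Dict[str, str]]:
    """Map each entry-point group to its {name: target} pairs."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry-point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


subcommands = parse_entry_points(ENTRY_POINTS_TXT)["swh.cli.subcommands"]
```

Here ``subcommands`` maps the subcommand name ``scrubber`` to the module ``swh.scrubber.cli``, which is listed in SOURCES.txt as swh/scrubber/cli.py.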
dulwich
swh.core[http]>=0.3
swh.loader.git>=1.4.0
swh.model>=5.0.0
swh.storage>=1.1.0
swh.journal>=0.9.0
[testing]
pytest
pytest-mock
pyyaml
swh.graph
types-pyyaml
swh