Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • vlorentz/swh-scrubber
  • lunar/swh-scrubber
  • anlambert/swh-scrubber
  • swh/devel/swh-scrubber
  • olasd/swh-scrubber
  • douardda/swh-scrubber
  • ardumont/swh-scrubber
  • marmoute/swh-scrubber
8 results
Show changes
Commits on Source (4)
Metadata-Version: 2.1
Name: swh.scrubber
Version: 0.0.4
Summary: Software Heritage Datastore Scrubber
Home-page: https://forge.softwareheritage.org/diffusion/swh-scrubber
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-scrubber
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-scrubber/
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Provides-Extra: testing
License-File: LICENSE
License-File: AUTHORS
Software Heritage - Datastore Scrubber
======================================
Tools to periodically checks data integrity in swh-storage and swh-objstorage,
reports errors, and (try to) fix them.
This is a work in progress; some of the components described below do not
exist yet (cassandra storage checker, objstorage checker, recovery, and reinjection)
The Scrubber package is made of the following parts:
Checking
--------
Highly parallel processes continuously read objects from a data store,
compute checksums, and write any failure in a database, along with the data of
the corrupt object.
There is one "checker" for each datastore package: storage (postgresql and cassandra),
journal (kafka), and objstorage.
Recovery
--------
Then, from time to time, jobs go through the list of known corrupt objects,
and try to recover the original objects, through various means:
* Brute-forcing variations until they match their checksum
* Recovering from another data store
* As a last resort, recovering from known origins, if any
Reinjection
-----------
Finally, when an original object is recovered, it is reinjected in the original
data store, replacing the corrupt one.
docs/README.rst
\ No newline at end of file
Software Heritage - Datastore Scrubber
======================================
Tools to periodically checks data integrity in swh-storage and swh-objstorage,
reports errors, and (try to) fix them.
This is a work in progress; some of the components described below do not
exist yet (cassandra storage checker, objstorage checker, recovery, and reinjection)
The Scrubber package is made of the following parts:
Checking
--------
Highly parallel processes continuously read objects from a data store,
compute checksums, and write any failure in a database, along with the data of
the corrupt object.
There is one "checker" for each datastore package: storage (postgresql and cassandra),
journal (kafka), and objstorage.
Recovery
--------
Then, from time to time, jobs go through the list of known corrupt objects,
and try to recover the original objects, through various means:
* Brute-forcing variations until they match their checksum
* Recovering from another data store
* As a last resort, recovering from known origins, if any
Reinjection
-----------
Finally, when an original object is recovered, it is reinjected in the original
data store, replacing the corrupt one.
[flake8]
# E203: whitespaces before ':' <https://github.com/psf/black/issues/315>
# E231: missing whitespace after ','
# E501: line too long, use B950 warning from flake8-bugbear instead
# W503: line break before binary operator <https://github.com/psf/black/issues/52>
select = C,E,F,W,B950
ignore = E203,E231,E501,W503
max-line-length = 88
[egg_info]
tag_build =
tag_date = 0
Metadata-Version: 2.1
Name: swh.scrubber
Version: 0.0.4
Summary: Software Heritage Datastore Scrubber
Home-page: https://forge.softwareheritage.org/diffusion/swh-scrubber
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-scrubber
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-scrubber/
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Provides-Extra: testing
License-File: LICENSE
License-File: AUTHORS
Software Heritage - Datastore Scrubber
======================================
Tools to periodically checks data integrity in swh-storage and swh-objstorage,
reports errors, and (try to) fix them.
This is a work in progress; some of the components described below do not
exist yet (cassandra storage checker, objstorage checker, recovery, and reinjection)
The Scrubber package is made of the following parts:
Checking
--------
Highly parallel processes continuously read objects from a data store,
compute checksums, and write any failure in a database, along with the data of
the corrupt object.
There is one "checker" for each datastore package: storage (postgresql and cassandra),
journal (kafka), and objstorage.
Recovery
--------
Then, from time to time, jobs go through the list of known corrupt objects,
and try to recover the original objects, through various means:
* Brute-forcing variations until they match their checksum
* Recovering from another data store
* As a last resort, recovering from known origins, if any
Reinjection
-----------
Finally, when an original object is recovered, it is reinjected in the original
data store, replacing the corrupt one.
.git-blame-ignore-revs
.gitignore
.pre-commit-config.yaml
AUTHORS
CODE_OF_CONDUCT.md
CONTRIBUTORS
LICENSE
MANIFEST.in
Makefile
README.rst
conftest.py
mypy.ini
pyproject.toml
pytest.ini
requirements-swh.txt
requirements-test.txt
requirements.txt
setup.cfg
setup.py
tox.ini
docs/.gitignore
docs/Makefile
docs/README.rst
docs/conf.py
docs/index.rst
docs/_static/.placeholder
docs/_templates/.placeholder
swh/__init__.py
swh.scrubber.egg-info/PKG-INFO
swh.scrubber.egg-info/SOURCES.txt
swh.scrubber.egg-info/dependency_links.txt
swh.scrubber.egg-info/entry_points.txt
swh.scrubber.egg-info/requires.txt
swh.scrubber.egg-info/top_level.txt
swh/scrubber/__init__.py
swh/scrubber/cli.py
swh/scrubber/db.py
swh/scrubber/fixer.py
swh/scrubber/journal_checker.py
swh/scrubber/origin_locator.py
swh/scrubber/py.typed
swh/scrubber/storage_checker.py
swh/scrubber/utils.py
swh/scrubber/sql/20-enums.sql
swh/scrubber/sql/30-schema.sql
swh/scrubber/sql/60-indexes.sql
swh/scrubber/tests/__init__.py
swh/scrubber/tests/conftest.py
swh/scrubber/tests/test_cli.py
swh/scrubber/tests/test_fixer.py
swh/scrubber/tests/test_init.py
swh/scrubber/tests/test_journal_kafka.py
swh/scrubber/tests/test_origin_locator.py
swh/scrubber/tests/test_storage_postgresql.py
\ No newline at end of file
[swh.cli.subcommands]
scrubber = swh.scrubber.cli
dulwich
swh.core[http]>=0.3
swh.loader.git>=1.4.0
swh.model>=5.0.0
swh.storage>=1.1.0
swh.journal>=0.9.0
[testing]
pytest
pytest-mock
pyyaml
swh.graph
types-pyyaml
swh