  1. Apr 11, 2024
  2. Apr 04, 2024
  3. Apr 02, 2024
    • cli: Prevent YAML wrapping in help displayed by Click · e8b15c82
      Antoine Lambert authored and committed
      By default, Click rewraps help text to fit the terminal width. As a
      consequence, the sample YAML config for the scrubber shown in the
      command help loses its indentation and becomes hard to read.
      
      So use \b markers in the docstring to make Click display the YAML
      config verbatim.
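      A minimal sketch of the `\b` marker, assuming Click 8.x. The command
      name and the YAML content are illustrative, not the scrubber's actual
      help text. Click keeps a paragraph verbatim when its first line holds
      only a backspace character, so the docstring must be a regular (not raw)
      string for `\b` to become that character.

```python
import click
from click.testing import CliRunner


@click.command()
def check():
    """Check objects using the configuration below.

    Without a marker, Click rewraps every paragraph of this help text
    to the terminal width.

    \b
    scrubber:
      cls: postgresql
      db: service=swh-scrubber
    """


# Render the help text; the YAML block keeps its line breaks and indentation.
result = CliRunner().invoke(check, ["--help"])
print(result.output)
```

      Removing the `\b` line makes Click join the three YAML lines into a
      single wrapped line.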
  4. Mar 29, 2024
  5. Mar 25, 2024
  6. Mar 22, 2024
  7. Mar 13, 2024
  8. Feb 06, 2024
  9. Feb 05, 2024
  10. Feb 02, 2024
  11. Dec 05, 2023
  12. Dec 03, 2023
  13. Nov 30, 2023
  14. Nov 24, 2023
  15. Oct 17, 2023
    • Add a cli command to get statistics for a given config entry · c0a4d44b
      David Douard authored
      Also add a command to list the partitions currently being checked.
      
      For example:
      
      ```
      $ swh scrubber check stats snapshot_16 -j
      {
        "config": {
          "name": "snapshot_16",
          "datastore": {
            "package": "storage",
            "cls": "postgresql",
            "instance": "postgresql:///?service=swh-storage"
          },
          "object_type": "snapshot",
          "nb_partitions": 65536,
          "check_hashes": true,
          "check_references": true
        },
        "min_duration": 0.002196,
        "max_duration": 0.107398,
        "avg_duration": 0.005969,
        "checked_partition": 65536,
        "running_partition": 0,
        "missing_object": 0,
        "missing_object_reference": 0,
        "corrupt_object": 0
      }
      
      $ swh scrubber check running cfg1
      
      Running partitions for cfg1 [id=1, type=snapshot]:
      0:	running since today (20 minutes)
      
      ```
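      The `-j` output is plain JSON, so it can be post-processed by a script.
      A minimal sketch, assuming `checked_partition` counts partitions whose
      check has completed; `summarize` and `PROBLEM_KEYS` are hypothetical
      names, and the embedded document is an abridged copy of the output above
      (the "datastore" sub-object is omitted).

```python
import json

# Abridged copy of the `swh scrubber check stats snapshot_16 -j` output above.
STATS_JSON = """
{
  "config": {"name": "snapshot_16", "object_type": "snapshot",
             "nb_partitions": 65536,
             "check_hashes": true, "check_references": true},
  "min_duration": 0.002196, "max_duration": 0.107398, "avg_duration": 0.005969,
  "checked_partition": 65536, "running_partition": 0,
  "missing_object": 0, "missing_object_reference": 0, "corrupt_object": 0
}
"""

# Counters that signal a problem in the checked datastore when non-zero.
PROBLEM_KEYS = ("corrupt_object", "missing_object", "missing_object_reference")


def summarize(stats: dict) -> dict:
    """Condense a stats document into progress and problem counters."""
    nb_partitions = stats["config"]["nb_partitions"]
    problems = {key: stats[key] for key in PROBLEM_KEYS if stats[key]}
    return {
        "config": stats["config"]["name"],
        "progress": stats["checked_partition"] / nb_partitions,
        "running": stats["running_partition"],
        "problems": problems,
    }


summary = summarize(json.loads(STATS_JSON))
print(summary)
```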
  16. Oct 16, 2023
  17. Oct 12, 2023
    • Add support for 2 check config flags in the check_config table · 566db2ac
      David Douard authored
      These flags make it possible to configure a checking session that
      performs only one of the two possible checks (hash computation and
      reference validation).
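      A minimal sketch of how the two flags can select the checks for a
      session. `CheckConfig` and `checks_to_run` are hypothetical names, the
      defaults of `True` are an assumption, and rejecting a config with both
      flags disabled is also an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckConfig:
    """Hypothetical in-memory mirror of a check_config row."""

    name: str
    object_type: str
    nb_partitions: int
    check_hashes: bool = True
    check_references: bool = True


def checks_to_run(config: CheckConfig) -> list[str]:
    """List the checks a session on this config should perform."""
    checks = []
    if config.check_hashes:
        checks.append("hashes")  # recompute object hashes
    if config.check_references:
        checks.append("references")  # verify referenced objects exist
    if not checks:
        # Assumption: a session with both checks disabled is a config error.
        raise ValueError(f"{config.name}: no check enabled")
    return checks
```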
    • run mypy with --install-types by default · c08c10b1
      David Douard authored
      This allows removing the dependency on types-pyyaml from the [testing]
      extra.
    • Refactor data model to use config_id instead of datastore in xxx_object tables · bd8e324c
      David Douard authored
      These tables used to reference the datastore in which the invalid/missing
      object was found, but not the config entry, i.e. the checking session
      during which it was found. This is an issue when more than one checking
      session is executed on a given datastore.
      
      This replaces the `datastore` field of the `corrupt_object`,
      `missing_object` and `missing_object_reference` tables with `config_id`.
      
      Adapt all the code accordingly.
      
      Note that this slightly changes the CLI usage: the kafka checker now
      needs a config entry, so a kafka checking session can only target a
      single object type (i.e. one kafka topic).
      
      The migration script fills the config_id column of corrupt_object using
      the check_config entry that matches the object_type (of the
      corrupt_object) and the datastore. For missing_object and
      missing_object_reference, it uses the latter table to identify the
      check_config entry matching the object type of reference_id and the
      datastore, since it is a checking session on this object type that
      generates a missing object entry (which is generally not of the same
      type). For the missing_object table, the config_id is taken from the
      matching missing_object_reference row (joining on the missing_id
      column).
      
      Note that the migration script will fail if one of these tables contains
      rows for which more than one config entry matches (i.e. with the same
      object_type and datastore).
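      The corrupt_object part of the migration can be sketched as a
      fail-on-ambiguity UPDATE. This is an illustration using SQLite instead
      of PostgreSQL, on a hypothetical, simplified schema (the real tables have
      more columns); `migrate_corrupt_object` is a hypothetical name, not the
      actual migration script.

```python
import sqlite3

# Simplified, hypothetical schema: the real tables have more columns.
SCHEMA = """
CREATE TABLE check_config (
    id INTEGER PRIMARY KEY,
    datastore INTEGER NOT NULL,
    object_type TEXT NOT NULL
);
CREATE TABLE corrupt_object (
    id TEXT PRIMARY KEY,
    object_type TEXT NOT NULL,
    datastore INTEGER NOT NULL,
    config_id INTEGER  -- new column filled by the migration
);
"""


def migrate_corrupt_object(conn: sqlite3.Connection) -> None:
    """Fill corrupt_object.config_id from the matching check_config row.

    Fails, without updating anything, if a row matches several
    check_config entries (same object_type and datastore).
    """
    ambiguous = conn.execute(
        """
        SELECT co.id
        FROM corrupt_object AS co
        JOIN check_config AS cc
          ON cc.object_type = co.object_type AND cc.datastore = co.datastore
        GROUP BY co.id
        HAVING COUNT(*) > 1
        """
    ).fetchall()
    if ambiguous:
        ids = [row[0] for row in ambiguous]
        raise RuntimeError(f"several check_config entries match {ids}")
    conn.execute(
        """
        UPDATE corrupt_object
        SET config_id = (
            SELECT cc.id
            FROM check_config AS cc
            WHERE cc.object_type = corrupt_object.object_type
              AND cc.datastore = corrupt_object.datastore
        )
        """
    )
```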
  18. Sep 21, 2023
  19. Aug 24, 2023
  20. Jul 26, 2023
  21. Jul 12, 2023
  22. Jul 10, 2023
    • Rename the 'scrubber_db' config section as 'scrubber' · e879bd14
      David Douard authored
      This is needed to make it compatible with swh.core's db upgrade tooling:
      the name of the configuration section is expected to be the name of the swh module.
    • fix the 5->6 upgrade sql script · d9f89378
      David Douard authored
      The index of the old checked_partition table must be dropped before
      recreating the new one (with the same name); the simplest way to do this
      is to drop the old checked_partition table with CASCADE before
      recreating the new index.
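      The name clash can be reproduced with a small sketch. It uses SQLite,
      which has no CASCADE option, but where dropping a table also drops its
      indexes, which is enough to show the problem. The table rename step, the
      column names and the index name `checked_partition_idx` are hypothetical,
      and the data copy between tables is omitted.

```python
import sqlite3

# Hypothetical pre-upgrade schema with a named index.
OLD_SCHEMA = """
CREATE TABLE checked_partition (datastore INTEGER, partition_id INTEGER);
CREATE UNIQUE INDEX checked_partition_idx
    ON checked_partition (datastore, partition_id);
"""


def upgrade(drop_old_table_first: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(OLD_SCHEMA)
    # Keep the old table under another name; its index keeps its own name.
    conn.execute("ALTER TABLE checked_partition RENAME TO checked_partition_old")
    conn.execute(
        "CREATE TABLE checked_partition (config_id INTEGER, partition_id INTEGER)"
    )
    if drop_old_table_first:
        # Dropping the old table also drops its indexes, freeing the name.
        conn.execute("DROP TABLE checked_partition_old")
    # Without the drop above, this raises an "already exists" error.
    conn.execute(
        "CREATE UNIQUE INDEX checked_partition_idx "
        "ON checked_partition (config_id, partition_id)"
    )
    return conn
```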
    • Add a couple of tests in test_cli · 140a935e
      David Douard authored
      In particular, this tests that the `--help` argument works when running
      `swh scrubber check --help` without any configuration file set.
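      A minimal sketch of why this can work with Click, assuming the
      configuration is only loaded in the `check` group callback. The groups,
      the error message and the echo below are hypothetical, not the actual
      swh-scrubber CLI. Click handles `--help` while parsing the subgroup's
      arguments, before that subgroup's callback runs, so no configuration is
      needed to print help.

```python
import os

import click
from click.testing import CliRunner


@click.group()
def scrubber():
    """Hypothetical top-level group; loads nothing."""


@scrubber.group()
@click.pass_context
def check(ctx):
    """Run checks (requires SWH_CONFIG_FILENAME)."""
    path = os.environ.get("SWH_CONFIG_FILENAME")
    if not path:
        raise click.UsageError("SWH_CONFIG_FILENAME must be set")
    ctx.obj = {"config_path": path}


@check.command()
@click.pass_context
def stalled(ctx):
    """Hypothetical subcommand that needs the configuration."""
    click.echo(f"using {ctx.obj['config_path']}")


runner = CliRunner()
# env value None removes the variable for the duration of the call.
help_result = runner.invoke(scrubber, ["check", "--help"], env={"SWH_CONFIG_FILENAME": None})
print(help_result.exit_code)
```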
    • Add a `--reset` flag to the `swh scrubber check stalled` command · 87412380
      David Douard authored
      This flag resets the partitions identified as stalled by setting their
      start_date and end_date to NULL.
      
      This should make a scrubber worker select these reset partitions for
      checking again.
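      A minimal sketch of the reset, using SQLite and a hypothetical,
      simplified checked_partition table; `reset_partitions` and
      `unstarted_partitions` are hypothetical helpers. After the reset, the
      partition has no start_date again, which is what lets a worker pick it
      up.

```python
import sqlite3

# Hypothetical, simplified checked_partition table.
SCHEMA = """
CREATE TABLE checked_partition (
    config_id INTEGER NOT NULL,
    partition_id INTEGER NOT NULL,
    start_date TEXT,
    end_date TEXT,
    PRIMARY KEY (config_id, partition_id)
);
"""


def reset_partitions(conn, config_id, partition_ids):
    """Clear start_date and end_date of the given partitions; return the row count."""
    if not partition_ids:
        return 0
    placeholders = ", ".join("?" for _ in partition_ids)
    with conn:  # commit the UPDATE
        cur = conn.execute(
            "UPDATE checked_partition SET start_date = NULL, end_date = NULL "
            f"WHERE config_id = ? AND partition_id IN ({placeholders})",
            (config_id, *partition_ids),
        )
    return cur.rowcount


def unstarted_partitions(conn, config_id):
    """Partitions a worker could pick up (no start_date recorded)."""
    rows = conn.execute(
        "SELECT partition_id FROM checked_partition "
        "WHERE config_id = ? AND start_date IS NULL ORDER BY partition_id",
        (config_id,),
    )
    return [row[0] for row in rows]
```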
    • Add a `swh scrubber check stalled` command listing stalled partitions · 67a743d0
      David Douard authored
      For a given configuration (hence storage, object_type and partition
      scheme), list partitions that have had a start_date but no end_date for
      a long enough time.
      
      By default, the delay after which a partition is considered stalled is
      computed from the last 10 partitions checked for the given
      configuration.
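      The commit does not say how the delay is derived from those 10
      partitions. The sketch below assumes a hypothetical rule: twice the
      longest duration among the 10 most recently finished partitions.
      `stalled_partitions`, `last_n` and `factor` are illustrative names.

```python
from datetime import datetime, timedelta


def stalled_partitions(partitions, now, last_n=10, factor=2.0):
    """Return ids of partitions started but unfinished for too long.

    `partitions` holds (partition_id, start_date, end_date) tuples. The delay
    is `factor` times the longest duration among the `last_n` most recently
    finished partitions (a hypothetical rule).
    """
    partitions = list(partitions)
    finished = sorted(
        (p for p in partitions if p[1] is not None and p[2] is not None),
        key=lambda p: p[2],  # most recent end_date first
        reverse=True,
    )[:last_n]
    if not finished:
        return []  # nothing to base a delay on
    delay = factor * max(end - start for _, start, end in finished)
    return [
        pid
        for pid, start, end in partitions
        if start is not None and end is None and now - start > delay
    ]
```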
    • Refactor the checker stack · 9cd7414a
      David Douard authored
      A checker configuration must now be created before a checker session
      can be started. This configuration is stored in the database and
      consists of a triplet
      
        (datastore, object_type, nb_partitions)
      
      Once done, any number of checkers can be started for this specific
      checker configuration; each checker process checks partitions one by
      one, using the status stored in the database to get the next partition
      number to check on the next iteration.
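      The claim-next-partition loop can be sketched with SQLite and a
      hypothetical, simplified checked_partition table. The selection rule
      (reset partitions first, then the id after the highest one recorded) and
      the names `claim_next_partition` and `run_worker` are assumptions, not
      the scrubber's actual queries. `BEGIN IMMEDIATE` takes SQLite's write
      lock so that concurrent claims are serialized; the connection must be
      opened with `isolation_level=None` so Python does not open transactions
      on its own.

```python
import sqlite3

# Hypothetical, simplified checked_partition table.
SCHEMA = """
CREATE TABLE checked_partition (
    config_id INTEGER NOT NULL,
    partition_id INTEGER NOT NULL,
    start_date TEXT,
    end_date TEXT,
    PRIMARY KEY (config_id, partition_id)
);
"""


def claim_next_partition(conn, config_id, nb_partitions):
    """Mark the next partition to check as started and return its id.

    Reset partitions (no start_date) are picked first; otherwise the id after
    the highest one recorded. Returns None once every partition is claimed.
    """
    conn.execute("BEGIN IMMEDIATE")  # write lock: concurrent claims serialize
    try:
        row = conn.execute(
            "SELECT partition_id FROM checked_partition "
            "WHERE config_id = ? AND start_date IS NULL "
            "ORDER BY partition_id LIMIT 1",
            (config_id,),
        ).fetchone()
        if row is not None:
            partition_id = row[0]
            conn.execute(
                "UPDATE checked_partition SET start_date = datetime('now') "
                "WHERE config_id = ? AND partition_id = ?",
                (config_id, partition_id),
            )
        else:
            (max_id,) = conn.execute(
                "SELECT MAX(partition_id) FROM checked_partition WHERE config_id = ?",
                (config_id,),
            ).fetchone()
            partition_id = 0 if max_id is None else max_id + 1
            if partition_id >= nb_partitions:
                partition_id = None  # every partition has been claimed
            else:
                conn.execute(
                    "INSERT INTO checked_partition (config_id, partition_id, start_date) "
                    "VALUES (?, ?, datetime('now'))",
                    (config_id, partition_id),
                )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return partition_id


def run_worker(conn, config_id, nb_partitions, check):
    """Check partitions one by one until none is left to claim."""
    while (partition_id := claim_next_partition(conn, config_id, nb_partitions)) is not None:
        check(partition_id)
        conn.execute(
            "UPDATE checked_partition SET end_date = datetime('now') "
            "WHERE config_id = ? AND partition_id = ?",
            (config_id, partition_id),
        )
```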
      
      This makes it possible to adjust the number of checker processes dynamically.
      
      For example, checking the snapshots by splitting the hash space into
      4096 partitions with 4 parallel workers could look like:
      
        $ export SWH_CONFIG_FILENAME=config.yml
        $ swh scrubber check init --object-type snapshot --nb-partitions 4096 --name cfg-snp
        Created configuration cfg-snp [3] for checking snapshot in postgresql storage
      
        $ for i in {1..4}; do (swh scrubber check storage cfg-snp &); done
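      One way a partition number can map to a contiguous slice of the hash
      space is sketched below, assuming 20-byte object ids and a power-of-two
      nb_partitions (as with 4096 here): partition i then holds the ids whose
      top log2(nb_partitions) bits equal i. `partition_bounds`,
      `partition_of` and `ID_LENGTH` are hypothetical names, and the
      scrubber's actual partitioning scheme may differ.

```python
from typing import Optional, Tuple

ID_LENGTH = 20  # assumption: 20-byte (SHA1-sized) object ids


def _shift(nb_partitions: int) -> int:
    """Number of low-order bits shared by all ids of one partition."""
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of two")
    return ID_LENGTH * 8 - (nb_partitions.bit_length() - 1)


def partition_bounds(partition_id: int, nb_partitions: int) -> Tuple[bytes, Optional[bytes]]:
    """Return the [start, end) ids of a partition; end is None for the last one."""
    shift = _shift(nb_partitions)
    if not 0 <= partition_id < nb_partitions:
        raise ValueError("partition_id out of range")
    start = (partition_id << shift).to_bytes(ID_LENGTH, "big")
    if partition_id + 1 == nb_partitions:
        return start, None
    return start, ((partition_id + 1) << shift).to_bytes(ID_LENGTH, "big")


def partition_of(object_id: bytes, nb_partitions: int) -> int:
    """Partition holding an ID_LENGTH-byte id: its top log2(nb_partitions) bits."""
    return int.from_bytes(object_id, "big") >> _shift(nb_partitions)
```

      With 4 workers and 4096 partitions, each worker would process roughly
      1024 of these slices, depending on how fast each one goes.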
  23. Jul 07, 2023