Improve 'swh db' commands for easier usage in test environments and better consistency

David Douard requested to merge douardda/swh-core:cli-db-init into master

Note: the revisions needed to fit the description below should now all be pushed.

The main idea is to make it easy to manage the whole lifecycle of swh.core.db-based backends, taking their configuration from the config file.

Using this configuration file:

$ cat conf.yml
storage:
  cls: pipeline
  steps:
    - cls: masking
      masking_db: postgresql:///?service=swh-masking-proxy
    - cls: buffer
    - cls: postgresql
      db: postgresql://user:passwd@pghost:5433/swh-storage
      objstorage:
        cls: memory

scheduler:
  cls: postgresql
  db: postgresql:///?service=swh-scheduler

For each swh db command, the main argument (previously only the backend 'module') can be:

  • a (swh) package name without the --all option: this is the backward-compatibility mode, in which the config entry is looked up in the config file using the current logic (in particular, the last entry of a pipeline is used, if any),
  • a (swh) package name with the --all option: run the command for all the backends found in the config file under the package section,
  • the actual backend (using the syntax <package>:<cls>, e.g. storage:masking),
  • a "path" selecting which config entry (from the config file) should be used, like storage.steps.0 (see the config example above),
  • when the --dbname option is used, the config file is not used at all: the value is an explicit db connection (libpq) string.
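The lookup rules above can be illustrated with a minimal Python sketch. This is not the code from this MR: the helper names (resolve_path, iter_backends, resolve_backend) are hypothetical, and the config is written as a plain dict mirroring conf.yml so the example stays self-contained.

```python
from typing import Any, Dict, Iterator

# Same structure as conf.yml above, written as a plain dict.
CONFIG: Dict[str, Any] = {
    "storage": {
        "cls": "pipeline",
        "steps": [
            {"cls": "masking",
             "masking_db": "postgresql:///?service=swh-masking-proxy"},
            {"cls": "buffer"},
            {"cls": "postgresql",
             "db": "postgresql://user:passwd@pghost:5433/swh-storage",
             "objstorage": {"cls": "memory"}},
        ],
    },
    "scheduler": {"cls": "postgresql",
                  "db": "postgresql:///?service=swh-scheduler"},
}


def resolve_path(config: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'storage.steps.0' (hypothetical helper)."""
    node: Any = config
    for key in path.split("."):
        # Numeric components index into lists, others into mappings.
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def iter_backends(entry: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the entries of a package section, flattening pipelines."""
    if entry.get("cls") == "pipeline":
        for step in entry["steps"]:
            yield from iter_backends(step)
    else:
        yield entry


def resolve_backend(config: Dict[str, Any], spec: str) -> Dict[str, Any]:
    """Find the entry matching a '<package>:<cls>' spec (hypothetical helper)."""
    package, cls = spec.split(":", 1)
    for entry in iter_backends(config[package]):
        if entry["cls"] == cls:
            return entry
    raise KeyError(spec)


masking_by_path = resolve_path(CONFIG, "storage.steps.0")
masking_by_spec = resolve_backend(CONFIG, "storage:masking")
```

In this sketch, storage.steps.0 and storage:masking both select the masking proxy entry, which is the equivalence the two addressing forms are meant to provide.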

Examples:

$ export SWH_CONFIG_FILENAME=conf.yml
$ swh db init -p storage.steps.0
$ swh db init --all storage
$ swh db init --dbname postgresql:///?service=swh-scrubber scrubber  # this has no entry in the config file
$ swh db init --dbname postgresql:///?service=masking swh:masking # this does not look into the config file

Or

$ swh db version -a storage

module: storage:masking
current code version: 194
version: 194

module: storage:postgresql
flavor: default
current code version: 193
version: 193

This should help deployment in integration testing environments (like docker), especially in cases like the example above, where the storage consists of several postgresql-backed layers.

It de facto deprecates the usage of the --module-config-key option.

This MR also changes the way the dbmodule table is managed. We used to store either only the package name (e.g. 'storage' for the postgresql backend of swh.storage), relying on some implicit business logic, or the module name without the swh. prefix (e.g. 'storage.proxies.masking'). We now store the backend as <package>:<cls>, where <package> is the swh.<package> in which the backend is defined and <cls> is the value of the cls config entry for said backend. This generalizes the idea of declaring these cls in swh.<package>.classes entry points.

The swh db upgrade command should take care of updating the dbmodule table accordingly.
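To make the naming change concrete, here is a small Python sketch of how the two legacy forms from the examples above map onto <package>:<cls>. It is only an illustration, not what swh db upgrade actually does: the function name new_dbmodule_name is hypothetical, and it assumes that the last component of a legacy module name matches the cls value, which holds for 'storage.proxies.masking' but may not hold in general.

```python
def new_dbmodule_name(old_name: str, default_cls: str = "postgresql") -> str:
    """Map a legacy dbmodule value to '<package>:<cls>' (hypothetical helper)."""
    if ":" in old_name:
        # Already in the new '<package>:<cls>' form.
        return old_name
    parts = old_name.split(".")
    if len(parts) == 1:
        # Bare package name: previously implied the postgresql backend.
        return f"{parts[0]}:{default_cls}"
    # Legacy module name: ASSUMES the last component equals the cls value.
    return f"{parts[0]}:{parts[-1]}"


converted = [new_dbmodule_name(name)
             for name in ("storage", "storage.proxies.masking")]
```

The results match the module names shown in the swh db version output above, storage:postgresql and storage:masking.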

Edited by David Douard
