Skip to content
Snippets Groups Projects

Improve 'swh db' commands for easier usage in test environments and better consistency

Merged David Douard requested to merge douardda/swh-core:cli-db-init into master

Warning: I still have revisions to push in this MR to fit the description below. should be ok now

The main idea is to make it easy to manage the whole lifecycle of swh.core.db based backends getting configuration from the config file.

Using this configuration file:

$ cat conf.yml
storage:
  cls: pipeline
  steps:
    - cls: masking
      masking_db: postgresql:///?service=swh-masking-proxy
    - cls: buffer
    - cls: postgresql
      db: postgresql://user:passwd@pghost:5433/swh-storage
      objstorage:
        cls: memory

scheduler:
  cls: postgresql
  db: postgresql:///?service=swh-scheduler

For each swh db command, the main argument (previously only the backend 'module') can be:

  • a (swh) package name without the --all option: this is the bw compat mode, for which the config entry is looked for in the config file using the current logic (esp. use the last entry of a pipeline, if any)
  • a (swh) package name with the --all option: run the command for all the backends found in the config file under the package section,
  • the actual backend (using the syntax ':', e.g. storage:masking),
  • a "path" to target which config entry (from the config file) should be used, like storage.steps.0 (see the config example above),
  • when the --dbname is used, the config file is not used at all: it's an explicit db connection (libpq) string.

Examples:

$ export SWH_CONFIG_FILENAME=conf.yml
$ swh db init -p storage.steps.0
$ swh db init --all storage 
$ swh db init --dbname postgresql:///?service=swh-scrubber scrubber  # this has no entry in the config file
$ swh db init --dbname postgresql:///?service=masking swh:masking # this does not look into the config file

Or

$ swh db version -a storage

module: storage:masking
current code version: 194
version: 194

module: storage:postgresql
flavor: default
current code version: 193
version: 193

This should help deployment in integration testing environments (like docker), especially with cases like the example above where the storage consists in several postgresql-backend layers.

It de facto deprecates the usage of the --module-config-key option.

This MR also change the way the dbmodule table is managed: we used to only store the package name (e.g. 'storage' for the postgresql backend of the swh.storage) with some implicit business logic, or the module name (e.g. 'storage.proxies.masking'; without the swh. prefix). We now store the backend as <package>:<cls> where <package> is the swh.<package> in which the backend is defined, and <cls> is the value of the cls config entry for said backend. It generalizes the idea of declaring these cls in swh.<package>.classes entry points.

The swh db upgrade should take care of updating the dbmodule table accordingly.

Edited by David Douard

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Jenkins job DCORE/gitlab-builds #236 succeeded in 1 min 11 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

  • David Douard changed the description

    changed the description

  • David Douard added 2 commits

    added 2 commits

    • 275d21b4 - Make 'swh db init(-admin)'support reading the db cnx string from anywhere in the config file
    • 16638fa9 - Make 'swh db version' and 'updgrade' also read cnx from config file

    Compare with previous version

  • Jenkins job DCORE/gitlab-builds #237 succeeded in 55 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

    • Resolved by David Douard

      For what I understand of the diff, the default behavior doesn't seem to be impacted, but if it's the case in the future, please notify the sysadmin team.

      This piece of script is used to detect and/or migrate the databases when a pod is started (ping @ardumont). For example: https://gitlab.softwareheritage.org/swh/infra/ci-cd/swh-charts/-/blob/2c0faa911e3a6736c92e1e67452d0b43dd98fd3d/swh/templates/utils/backend-utils.yaml#L311

      Do you think it would be possible to implement an equivalent to list all the versions of all the modules for the version command?

      What to you think too about standardizing the database configuration entry in db:. It will allow to identify the module by the class to have something like:

      $ cat conf.yml
      storage:
        cls: pipeline
        steps:
          - cls: masking
            db: postgresql:///?service=swh-masking-proxy    <--- Changed here
          - cls: buffer
          - cls: postgresql
            db: postgresql://user:passwd@pghost:5433/swh-storage
            objstorage:
              cls: memory
      
      $ swh db -C conf.yml init masking <---- Changed here

      It only works if all the steps always have different classes.

  • mentioned in merge request swh-scrubber!74 (merged)

  • Author Maintainer

    After this discussion this MR probably needs more work.

  • David Douard changed the description

    changed the description

  • Author Maintainer

    FTR the comment was:

    Then I think you need two changes to the entrypoint:

    1. rename it, so it's in the swh.core namespace
    2. actually point to get_datastore, so swh.core does not introspect it someone from the class
    1. it's not just a matter of renaming the entrypoint group, because (as it is today) a "swh.xxx.classes" entry point capture both the "it's a backend for the xxx package" (the side used by the get_xxx() functions in some swh packages) and is used for the fact that some of these backend are indeed swh.db based, and can thus profit from common lifecycle management features and tools. This MR does not really change this "design".
    2. use a better backend declaration API (using new entrypoints rather than existing ones) to depend less on introspection and base the logic on a more declarative approach would be nice indeed, but it's a much more involved work, not sure I really want to start it now (I played a bit with the idea, but I'm afraid it's a too deep rabbit hole).

    So I suggest we stick to this far-from-ideal proposal which should nonetheless greatly help and come back to a better solution if needed in the future.

    Edited by David Douard
  • David Douard added 6 commits

    added 6 commits

    • 793ef062 - Make 'swh db init(-admin)' support reading the db cnx string from anywhere in the config file
    • 4d122d09 - Make 'swh db version' and 'updgrade' also read cnx from config file
    • d61cc496 - cli: replace '--all' option in 'db version' with '--history'
    • cb5ea3c2 - cli: also look for cnx string in the config file in 'db create'
    • 8be91f87 - cli: use the same '--all' flag for all db commands acting on all known backends
    • 69d77ff4 - db: lower a warning statement to an info

    Compare with previous version

  • Jenkins job DCORE/gitlab-builds #238 succeeded in 56 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

  • David Douard added 2 commits

    added 2 commits

    • e045e95b - cli: add '--all' option to the 'db version' command
    • 3a52ef5f - test/db/test_cli: use raw yaml instead of dumping nested dicts

    Compare with previous version

  • Jenkins job DCORE/gitlab-builds #239 succeeded in 54 sec.
    See Console Output, Blue Ocean and Coverage Report for more details.

    • Author Maintainer
      Resolved by David Douard

      So now:

      • all the db commands (but a couple of very niche) can be used with a '--all' options to operate on all the db backends found in the config file under the given swh package (module).
      • the module name registered in the dbmodule table is now the fully qualified module name implementing the backend (i.e. 'swh.storage.postgresql.storage' instead of just 'storage')
      • for the time being, it should be completely bw compatible with existing versions of swh packages using swh.core.db
      • it's not clear yet if / how to execute the "db migration" needed to update the content of the dbmodule table.

      But now that I'm looking at this MR I feel like it would make more sense to use entries like "swh.storage:postgresql" as entries in this table (i.e. the backend as declared in entry points and used when reading config files rather than the module full path). It would make it easier to detect an entry in this dbmodule table needs be updated at the very least.

      Edited by David Douard
  • David Douard changed title from Make 'swh db' commands support reading the db cnx string from anywhere in the config file to Improve 'swh db' commands for easier usage in test envirnoments and better consistency

    changed title from Make 'swh db' commands support reading the db cnx string from anywhere in the config file to Improve 'swh db' commands for easier usage in test envirnoments and better consistency

  • David Douard changed the description

    changed the description

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading