Verified Commit 8ab8240d authored by Antoine R. Dumont

Migrate upgrade swh services and storage db migration

This splits the upgrade swh services documentation from the intranet page into 2
distinct documentation pages.

Related to T3154
parent 4e9e5ce0
.. _data-migration:
How to handle data migrations
=============================
Empty page
----------
.. todo::
This page is a work in progress.
@@ -7,4 +7,4 @@ SWH Software Deployment
deployment-environments
upgrade-swh-service
deploy-lister
data-migration
storage-database-migration
.. _storage-database-migration:
How to handle a storage database migration
==========================================
.. admonition:: Intended audience
:class: important
sysadm staff members
If a storage database upgrade is needed, a migration script should already exist in the
*swh-storage* git repository.
.. _upgrade_version:
Upgrade version
---------------
Check the current database version (the first one in descending order):
.. code:: sql
select version from dbversion order by version desc limit 1;
Say, for example, that the result is 159.
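That is, the query output would look like the following (illustrative):
.. code::
 version
---------
     159
(1 row)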
Check the migration script folder in swh-storage:/sql/upgrades/ (and find the next one,
for example `160.sql
<https://forge.softwareheritage.org/source/swh-storage/browse/master/sql/upgrades/160.sql>`_).
The next script number is the db version just retrieved plus 1 (so 160 in the current
example).
Note that you may need to run more than one migration. That depends on the currently
packaged version and the next version we want to deploy. Check the git history to
determine this.
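For instance, assuming a local checkout of *swh-storage*, the most recent upgrade
scripts can be listed from a shell to spot the next one(s) to run (paths and version
numbers are illustrative):
.. code::
$ ls swh-storage/sql/upgrades/ | sort -n | tail -3
158.sql
159.sql
160.sql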
Prerequisite
------------
Ensure the migration script runs first in the staging database
(db0.internal.staging.swh.network is the node holding the swh staging database). Then
you can go ahead and run it in the production database
(belvedere.internal.softwareheritage.org).
Connect to the db with the user with write permission, then run the
script:
.. code::
$ psql -e ...
> \i sql/upgrades/160.sql
Note:
- *-e* so you can see each query as it runs, before its result
- For long-running scripts, connect to the remote machine first [5] [6]
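For the long-running case, one way to do this is to run the client from a *tmux*
session on the database node itself, as a sketch (the session name is arbitrary; pick
the staging or production host accordingly):
.. code::
$ ssh db0.internal.staging.swh.network  # or belvedere.internal.softwareheritage.org
$ tmux new -s db-migration              # keeps the migration running if the ssh connection drops
$ psql -e ...
> \i sql/upgrades/160.sql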
Adaptations
-----------
Hopefully, in production, the script runs as is without adaptation…
Otherwise, if the data volume for a given table is large, you may need to adapt it. See
`160.sql
<https://forge.softwareheritage.org/source/swh-storage/browse/master/sql/upgrades/160.sql>`_
and `its adaptation <https://forge.softwareheritage.org/P747>`_.
In such a case, consider working on ranges of the table id instead, so that the query
uses the index and each transaction stays short. A long-standing migration query
translates to a long-running transaction, which can lead to WAL accumulation (for the
replication), hence disk space starvation issues, etc.
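As a hypothetical sketch of such a range-based rewrite (the table and column names are
made up; see the linked adaptation of 160.sql for the real case):
.. code:: sql
-- Hypothetical: backfill a new column in batches of 1,000,000 ids so each
-- transaction stays short and uses the primary key index.
update some_table
set new_column = old_column
where id >= 0 and id < 1000000;
-- commit, then repeat with the next range [1000000, 2000000), and so on.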
Note
----
We use grafana to check that everything is fine (for example, for the replication, we use
the `postgresql database dashboard, bottom of the page to the right
<https://grafana.softwareheritage.org/d/PEKz-Ygiz/postgresql-server-overview?orgId=1&refresh=5m&from=1598405876817&to=1598427476817&var-instance=belvedere.internal.softwareheritage.org&var-cluster=:5433&var-datname=All&var-ntop_relations=5&var-interface=All&var-disk=All&var-filesystem=All&var-application_name=All&var-rate_interval=5m>`_).
We also use it to keep a record of what happened for a given deployment. For this,
open a grafana dashboard (for example the `worker task processing dashboard
<https://grafana.softwareheritage.org/d/b_xh3f9ik/worker-task-processing?orgId=1&from=now-6h&to=now>`_)
and add an annotation with the tag *deployment* (so it's shared across dashboards) and a
description of what the current deployment is about, usually the list of module names
and versions deployed.
.. _upgrade-swh-service:
How to upgrade swh service
==========================
.. admonition:: Intended audience
:class: important
sysadm staff members
Workers
-------
Dedicated workers [1] run our *swh-worker@loader_{git, hg, svn, npm, ...}* services.
When a new version is released, we need to upgrade their package(s).
[1] Here are the corresponding group names (in `clush
<https://clustershell.readthedocs.io/en/latest/index.html>`_ terms):
- *@swh-workers* for the production workers
- *@azure-workers* for the production ones running on azure
- *@staging-loader-workers* for the staging ones
See :ref:`deploy-new-lister` for a practical example.
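For example, to check which version of a loader package is currently installed on the
production workers (the debian package name here is illustrative):
.. code::
$ sudo clush -b -w @swh-workers 'dpkg -l python3-swh.loader.git | tail -1'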
Code and publish
----------------
.. _fix-or-evolve-code:
Code an evolution or fix an issue in the python code within the git repository's master
branch. Open a diff for review, land it when accepted, and start again at :ref:`tag and push
<tag-and-push>`.
.. _tag-and-push:
Tag and push
~~~~~~~~~~~~
When ready, `git tag` and `git push` the new tag of the module.
.. code::
$ git tag vA.B.C
$ git push origin --follow-tags
.. _publish-and-deploy:
Publish and deploy
~~~~~~~~~~~~~~~~~~
Let jenkins publish and deploy the debian package.
.. _troubleshoot:
Troubleshoot
~~~~~~~~~~~~
If jenkins fails for some reason, fix the module, be it the :ref:`python code
<fix-or-evolve-code>` or the :ref:`debian packaging <troubleshoot-debian-package>`.
.. _troubleshoot-debian-package:
Debian package troubleshoot
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In that case, update and check out the *debian/unstable-swh* branch, then fix whatever
is not updated or broken due to the change (usually a missing new package dependency to
add in *debian/control*). Add a new entry in *debian/changelog*. Make sure gbp builds
fine. Then tag it. Jenkins will build the package anew.
.. code::
$ gbp buildpackage --git-tag-only --git-sign-tag # tag it
$ git push origin --follow-tags # trigger the build
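As a minimal sketch, the fix-and-check cycle preceding that tag could look like the
following (the merge step and the changelog version string are assumptions about the
packaging workflow; adapt them as needed):
.. code::
$ git checkout debian/unstable-swh
$ git merge vA.B.C                            # bring in the new upstream tag
$ $EDITOR debian/control                      # e.g. add the missing package dependency
$ dch -v A.B.C-1~swh1 'New upstream release'  # add the debian/changelog entry
$ gbp buildpackage --git-ignore-new           # check that the package builds fine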
Deploy
------
.. _nominal_case:
Nominal case
~~~~~~~~~~~~
Update the machine's dependencies and restart the service. As a sudo user, that usually
means:
.. code::
$ apt-get update
$ apt-get dist-upgrade -y
$ systemctl restart swh-worker@loader_${type}
Note that this is for one machine you ssh into.
We usually wrap those commands from the sysadmin machine pergamon [3] with the *clush*
command, something like:
.. code::
$ sudo clush -b -w @swh-workers 'apt-get update; env DEBIAN_FRONTEND=noninteractive \
apt-get -o Dpkg::Options::="--force-confdef" \
-o Dpkg::Options::="--force-confold" -y dist-upgrade'
[3] pergamon is already configured for *clush*, allowing multiple parallel ssh
connections on our managed infrastructure nodes.
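The service restart can be wrapped the same way, for example for the git loaders
(adjust the loader type to what was deployed):
.. code::
$ sudo clush -b -w @swh-workers 'systemctl restart swh-worker@loader_git'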
.. _configuration-change-required:
Configuration change required
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Either wait for puppet to actually deploy the configuration changes first and then go
back to the nominal case, or force a puppet run:
.. code::
sudo clush -b -w @swh-workers puppet agent -t
Note: *-t* is not optional
.. _long-standing-migration:
Long-standing migration
~~~~~~~~~~~~~~~~~~~~~~~
In that case, the migration may require stopping all services for some time (because
lots of data is migrated, for example).
You need to momentarily stop puppet (which runs every 30 minutes to apply manifest
changes) and the cron service (which restarts stopped services) on the worker nodes.
Refer to the :ref:`storage database migration <storage-database-migration>` page for a
concrete case of database migration.
.. code::
$ sudo clush -b -w @swh-workers 'systemctl stop cron.service; puppet agent --disable'
Then:
- Execute the database migration.
- Go back to the nominal case.
- Restart puppet and the cron on the workers:
.. code::
$ sudo clush -b -w @swh-workers 'systemctl start cron.service; puppet agent --enable'