Mariadb instance is down (aka "saving private tate")
This service is running on tate.i.s.o and impacts the following services (random order):
- https://forge.softwareheritage.org [2]
- https://intranet.softwareheritage.org/ [1]
- https://wiki.softwareheritage.org [1]
[1] From wiki:
Sorry! This site is experiencing technical difficulties.
Try waiting a few minutes and reloading.
(Cannot access the database)
[2] From (deprecated) forge:
Can Not Connect to MySQL
[3]
root@tate:~# lsb_release -sc
buster
root@tate:~# dpkg -l mariadb-server | grep ii
ii mariadb-server 1:10.3.38-0+deb10u1 all MariaDB database server (metapackage depending on the latest version)
- Antoine R. Dumont added activity::MRO label
- Antoine R. Dumont changed title from Co-hosted wikis are down to Co-hosted mediawikis are down
- Antoine R. Dumont changed the description
- Author Owner
Something is happening during service startup regarding ssl:
root@tate # grep -i error /var/log/mysql/error.log
...
2023-02-21 6:50:45 0 [ERROR] Failed to setup SSL
2023-02-21 6:50:45 0 [ERROR] SSL error: SSL_CTX_set_default_verify_paths failed
2023-02-21 6:50:45 0 [ERROR] Aborting
...
This matches what olasd suspected yesterday evening:
18:28:31* +olasd | ardumont: mariadb fails to start with a SSL issue. looks like there was something funny with the mariadb LTS update
Edited by Antoine R. Dumont
- Antoine R. Dumont changed title from Co-hosted mediawikis are down to Mariadb instance is down
- Antoine R. Dumont changed the description
- Antoine R. Dumont changed the description
- Author Owner
Starting with a check of the configuration, something is fishy already. Apparently, ssl is set to false. And even if it were not, the ssl keys target nonexistent files...
root@tate:~# grep -i ssl /etc/mysql/my.cnf
ssl = false
ssl-ca = /etc/mysql/cacert.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem
root@tate:~# file /etc/mysql/cacert.pem
/etc/mysql/cacert.pem: cannot open `/etc/mysql/cacert.pem' (No such file or directory)
root@tate:~# file /etc/mysql/server-cert.pem
/etc/mysql/server-cert.pem: cannot open `/etc/mysql/server-cert.pem' (No such file or directory)
root@tate:~# file /etc/mysql/server-key.pem
/etc/mysql/server-key.pem: cannot open `/etc/mysql/server-key.pem' (No such file or directory)
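The manual check above (grep the ssl keys, then test each path) could be wrapped in a small helper. This is only a sketch: `check_ssl_paths` is a hypothetical name, and it assumes the ssl-ca/ssl-cert/ssl-key settings sit one per line in a single my.cnf-style file, with paths that contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of mariadb or puppet): report the
# ssl-ca / ssl-cert / ssl-key paths of a my.cnf-style file that do not exist.
# Returns 1 if at least one referenced file is missing, 0 otherwise.
check_ssl_paths() {
    cnf="$1"
    missing=0
    # Accept both "ssl-ca = /path" and "ssl_ca = /path"; commented lines
    # start with "#" and therefore never match.
    for path in $(sed -n -E 's/^[[:space:]]*ssl[-_](ca|cert|key)[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\2/p' "$cnf"); do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed invocation): check_ssl_paths /etc/mysql/my.cnf
```

Such a check does not look at the `ssl = false` line at all; it only tells whether the referenced files are present.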
- Author Owner
Not that I understand the problem yet. The problem started after the last upgrade of the package:
root@tate:~# grep mariadb /var/log/dpkg.log
2023-02-21 06:50:23 upgrade libmariadb3:amd64 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:23 status half-configured libmariadb3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:23 status unpacked libmariadb3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:23 status half-installed libmariadb3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:23 status unpacked libmariadb3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:23 configure libmariadb3:amd64 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:23 status unpacked libmariadb3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:23 status half-configured libmariadb3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:23 status installed libmariadb3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:27 upgrade mariadb-common:all 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:27 status half-configured mariadb-common:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status unpacked mariadb-common:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status half-installed mariadb-common:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status unpacked mariadb-common:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:27 upgrade mariadb-client-core-10.3:amd64 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:27 status half-configured mariadb-client-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status unpacked mariadb-client-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status half-installed mariadb-client-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:27 status unpacked mariadb-client-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 upgrade mariadb-client-10.3:amd64 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-client-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status unpacked mariadb-client-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status half-installed mariadb-client-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status unpacked mariadb-client-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 upgrade mariadb-client:all 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-client:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status unpacked mariadb-client:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status half-installed mariadb-client:all 1:10.3.36-0+deb10u2
2023-02-21 06:50:28 status unpacked mariadb-client:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 configure mariadb-common:all 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:28 status unpacked mariadb-common:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-common:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status installed mariadb-common:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 configure mariadb-client-core-10.3:amd64 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:28 status unpacked mariadb-client-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-client-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status installed mariadb-client-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 configure mariadb-client-10.3:amd64 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:28 status unpacked mariadb-client-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-client-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status installed mariadb-client-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 configure mariadb-client:all 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:28 status unpacked mariadb-client:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status half-configured mariadb-client:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:28 status installed mariadb-client:all 1:10.3.38-0+deb10u1
2023-02-21 06:50:34 upgrade mariadb-server-core-10.3:amd64 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:34 status half-configured mariadb-server-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:34 status unpacked mariadb-server-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:35 status half-installed mariadb-server-core-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:36 status unpacked mariadb-server-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:36 configure mariadb-server-core-10.3:amd64 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:36 status unpacked mariadb-server-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:36 status half-configured mariadb-server-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:36 status installed mariadb-server-core-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:39 upgrade mariadb-server-10.3:amd64 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:50:39 status half-configured mariadb-server-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:42 status unpacked mariadb-server-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:42 status half-installed mariadb-server-10.3:amd64 1:10.3.36-0+deb10u2
2023-02-21 06:50:44 status unpacked mariadb-server-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:44 configure mariadb-server-10.3:amd64 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:50:44 status unpacked mariadb-server-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:44 status half-configured mariadb-server-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:50:51 status installed mariadb-server-10.3:amd64 1:10.3.38-0+deb10u1
2023-02-21 06:51:00 upgrade mariadb-server:all 1:10.3.36-0+deb10u2 1:10.3.38-0+deb10u1
2023-02-21 06:51:00 status half-configured mariadb-server:all 1:10.3.36-0+deb10u2
2023-02-21 06:51:00 status unpacked mariadb-server:all 1:10.3.36-0+deb10u2
2023-02-21 06:51:00 status half-installed mariadb-server:all 1:10.3.36-0+deb10u2
2023-02-21 06:51:00 status unpacked mariadb-server:all 1:10.3.38-0+deb10u1
2023-02-21 06:51:00 configure mariadb-server:all 1:10.3.38-0+deb10u1 <none>
2023-02-21 06:51:00 status unpacked mariadb-server:all 1:10.3.38-0+deb10u1
2023-02-21 06:51:00 status half-configured mariadb-server:all 1:10.3.38-0+deb10u1
2023-02-21 06:51:00 status installed mariadb-server:all 1:10.3.38-0+deb10u1
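Most of that log is status noise; only the `upgrade` lines carry the version change. A possible way to condense it, as a sketch: `list_mariadb_upgrades` is a hypothetical helper name, and it assumes the field layout visible above (date, time, action, package, old version, new version).

```shell
#!/bin/sh
# Hypothetical helper: read dpkg.log lines on stdin and print one line per
# mariadb-related upgrade, in the form "date time package old -> new".
list_mariadb_upgrades() {
    awk '$3 == "upgrade" && $4 ~ /mariadb/ { print $1, $2, $4, $5, "->", $6 }'
}

# Example (assumed invocation): list_mariadb_upgrades < /var/log/dpkg.log
```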
- Author Owner
There might have been a problem during the upgrade, as I just noticed that the root filesystem (/) on that node is full:
root@tate:~# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 99G 95G 0 100% /
I'll clean it up and reboot the node prior to doing anything else.
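A full / like this could be caught before an upgrade runs. The following is a minimal sketch, not something that exists on tate: `fs_use_percent` and `has_free_space` are hypothetical names, and the 90% default threshold is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the use percentage (digits only) of the
# filesystem holding the given path, based on POSIX "df -P" output.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Hypothetical helper: succeed only when usage is strictly below the
# threshold (second argument, 90 by default; an arbitrary assumption).
has_free_space() {
    [ "$(fs_use_percent "$1")" -lt "${2:-90}" ]
}

# Example (assumed invocation):
#   has_free_space / || echo "/ is nearly full, clean up before upgrading"
```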
- Author Owner
While cleaning it up, I checked the changelog of the new mariadb version [1] and this caught my eye:
SSL The server no longer tolerates incorrectly configured SSL (MDEV-29811). If you have enabled SSL in my.cnf but have not configured it properly (for example, a certificate file is missing), MariaDB used to silently disable SSL, leaving you under impression that everything was fine and connections were secure. Since this release, MariaDB will fail to start if SSL is enabled, but cannot be switched on.
As we migrated from 10.3.36 to 10.3.38, that might explain why the "fishy" my.cnf configuration I mentioned earlier is now a problem. Currently, ssl is completely disabled.
[1] https://mariadb.com/kb/en/mariadb-10-3-37-release-notes/
- Author Owner
After the reboot, this cascaded into a new error: unbound now refuses to start... [1] (which creates problems for jenkins.s.o and auth.s.o)
"everything is fine"
[1]
root@tate # journalctl -xef -u unbound
Feb 22 11:25:39 tate unbound[4937]: [4937:0] error: failed to read /var/lib/unbound/root.key
Feb 22 11:25:39 tate unbound[4937]: [4937:0] error: error reading auto-trust-anchor-file: /var/lib/unbound/root.key
Feb 22 11:25:39 tate unbound[4937]: [4937:0] error: validator: error in trustanchors config
Feb 22 11:25:39 tate unbound[4937]: [4937:0] error: validator: could not apply configuration settings.
Feb 22 11:25:39 tate unbound[4937]: [4937:0] error: module init for module validator failed
Feb 22 11:25:39 tate unbound[4937]: [4937:0] fatal error: failed to setup modules
And that file is empty:
root@tate:/etc# file /var/lib/unbound/root.key
/var/lib/unbound/root.key: empty
By comparison, checking elsewhere (without issues (yet!?)), it's not supposed to be empty:
root@worker16:~# ls -lah /var/lib/unbound/root.key
-rw-r--r-- 1 unbound unbound 759 Feb 22 05:55 /var/lib/unbound/root.key
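The same comparison could be scripted to spot other nodes with an empty trust anchor. A sketch only: `check_anchor` is a hypothetical helper name, and it merely tests that the file exists and is non-empty; it does not validate the key content.

```shell
#!/bin/sh
# Hypothetical helper: report whether a trust anchor file exists and is
# non-empty. It does not parse or validate the key material itself.
check_anchor() {
    if [ -s "$1" ]; then
        echo "ok: $1"
        return 0
    fi
    echo "empty or missing: $1"
    return 1
}

# Example (assumed invocation): check_anchor /var/lib/unbound/root.key
```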
Edited by Antoine R. Dumont
- Author Owner
Thanks to this bug report [1], I managed to generate a new, non-empty root.key, which got unbound running again [2].
Back to trying to understand the mariadb issue now.
[2]
root@tate:/etc# unbound-anchor -a "/var/lib/unbound/root.key"
root@tate:/etc# ls -lah "/var/lib/unbound/root.key"
-rw-r--r-- 1 root root 758 Feb 22 11:31 /var/lib/unbound/root.key
root@tate:/etc# systemctl start unbound.service
root@tate:/etc# systemctl status unbound.service | grep "Active"
Active: active (running) since Wed 2023-02-22 11:31:46 UTC; 35s ago
[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232555
- Antoine R. Dumont changed title from Mariadb instance is down to Mariadb instance is down (aka "to the rescue of tate")
- Antoine R. Dumont changed title from Mariadb instance is down (aka "to the rescue of tate") to Mariadb instance is down (aka "saving private tate")
- Author Owner
Nothing comes up when searching for this kind of issue.
I was under the impression that the ssl keys in the configuration were off with that new version, so I acted on that.
Commenting out those ssl keys in /etc/mysql/my.cnf allowed the mariadb service to start again [1]. So it seems my hypothesis is not far off.
Now, I need to make puppet understand this and either drop those keys or install the "new" correct ones [2].
[1]
root@tate:/etc# puppet agent --disable "mariadb does not start https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4777"
root@tate:/etc# emacs /etc/mysql/my.cnf
root@tate:/etc# grep ssl /etc/mysql/my.cnf
# ssl= false
# ssl_ca = /etc/mysql/cacert.pem
# ssl_cert = /etc/mysql/server-cert.pem
# ssl_key = /etc/mysql/server-key.pem
root@tate:/etc# systemctl restart mariadb.service
root@tate:/etc# systemctl status mariadb.service
● mariadb.service - MariaDB 10.3.38 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─override.conf
   Active: active (running) since Wed 2023-02-22 13:49:20 UTC; 3s ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 21715 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
  Process: 21716 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 21718 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-env$
  Process: 21845 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 21847 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
 Main PID: 21807 (mysqld)
   Status: "Taking your SQL requests now..."
    Tasks: 56 (limit: 14336)
   Memory: 349.5M
   CGroup: /system.slice/mariadb.service
           └─21807 /usr/sbin/mysqld
Feb 22 13:49:18 tate systemd[1]: Starting MariaDB 10.3.38 database server...
Feb 22 13:49:20 tate systemd[1]: Started MariaDB 10.3.38 database server.
root@tate:/etc# less /etc/mysql/my.cnf
root@tate:/etc# puppet agent --enable && puppet agent -t --noop && puppet agent --disable
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
[2]
-# ssl= false
-# ssl_ca = /etc/mysql/cacert.pem
-# ssl_cert = /etc/mysql/server-cert.pem
-# ssl_key = /etc/mysql/server-key.pem
+ssl = false
+ssl-ca = /etc/mysql/cacert.pem
+ssl-cert = /etc/mysql/server-cert.pem
+ssl-key = /etc/mysql/server-key.pem
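For the record, the temporary workaround (commenting out the ssl keys by hand in an editor) could also be previewed non-interactively. This is a sketch under assumptions: `comment_out_ssl` is a hypothetical name, it only prints the modified content to stdout and never edits the file, and puppet would still restore the keys on its next run unless the agent stays disabled.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a my.cnf-style file in which every
# active "ssl", "ssl-xxx" or "ssl_xxx" setting is commented out.
# The input file itself is left untouched.
comment_out_ssl() {
    sed -E 's/^([[:space:]]*ssl([-_][[:alnum:]]+)?[[:space:]]*=.*)$/# \1/' "$1"
}

# Example (assumed invocation), to review the change before applying it:
#   comment_out_ssl /etc/mysql/my.cnf | diff -u /etc/mysql/my.cnf -
```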
- Author Owner
Regarding puppet/mariadb/ssl, these two commits are particularly interesting [1] [2].
The first one stops generating the ssl keys in the configuration when ssl is not enabled (our case).
All in all, it looks like I'm up for bumping that third-party library in the Puppetfile of our swh-site repository.
[1] swh/infra/puppet/3rdparty/puppet-puppetlabs-mysql@6cfcdbdb
[2] swh/infra/puppet/3rdparty/puppet-puppetlabs-mysql@b54ccd4c
- Author Owner
After bumping locally to a more recent version, those keys are indeed dropped (among other changes)!
$ cd .../swh-site
$ git diff HEAD~
diff --git a/Puppetfile b/Puppetfile
index 068ca3cc..543a8344 100644
--- a/Puppetfile
+++ b/Puppetfile
@@ -130,7 +130,7 @@ mod 'locales',
 mod 'mysql',
   :git => 'https://gitlab.softwareheritage.org/swh/infra/puppet/3rdparty/puppet-puppetlabs-mysql',
-  :ref => 'v12.0.1'
+  :ref => 'v13.1.0'
 mod 'nginx',
   :git => 'https://gitlab.softwareheritage.org/swh/infra/puppet/3rdparty/puppet-puppet-nginx',
$ $SWH_PUPPET_ENVIRONMENT_HOME/bin/octocatalog-diff --to staging tate.softwareheritage.org
Found host tate.softwareheritage.org
....
diff origin/production/tate.softwareheritage.org current/tate.softwareheritage.org
*******************************************
  Exec[datadir-managed_dir-chmod] =>
   parameters =>
     command =>
      - /bin/chmod 777 /var/lib/mysql
      + ["/bin/chmod", "777", "/var/lib/mysql"]
*******************************************
  Exec[datadir-managed_dir-mkdir] =>
   parameters =>
     command =>
      - /bin/mkdir -p /var/lib/mysql
      + ["/bin/mkdir", "-p", "/var/lib/mysql"]
     unless =>
      - /usr/bin/dpkg -s mariadb-server
      + [["/usr/bin/dpkg", "-s", "mariadb-server"]]
*******************************************
  Exec[remove install pass] =>
   parameters =>
     command =>
      - mysqladmin -u root --password=$(grep -o '[^ ]\+$' /.mysql_secret) password '' && rm -f /.mysql_secret
      + mysqladmin -u root --password=$(grep -o '[^ ]\+$' /.mysql_secret) password && (rm -f /.mysql_secret; exit 0) || (rm -f /.mysql_secret; exit 1)
     onlyif =>
      - test -f /.mysql_secret
      + [["test", "-f", "/.mysql_secret"]]
*******************************************
  Exec[wait_for_mysql_socket_to_open] =>
   parameters =>
     command =>
      - test -S /var/run/mysqld/mysqld.sock
      + ["test", "-S", "/var/run/mysqld/mysqld.sock"]
     unless =>
      - test -S /var/run/mysqld/mysqld.sock
      + [["test", "-S", "/var/run/mysqld/mysqld.sock"]]
*******************************************
  File[/etc/mysql/my.cnf] =>
   parameters =>
     content =>
      @@ -32,7 +32,4 @@
       sql_mode = STRICT_ALL_TABLES
       ssl = false
      -ssl-ca = /etc/mysql/cacert.pem
      -ssl-cert = /etc/mysql/server-cert.pem
      -ssl-key = /etc/mysql/server-key.pem
       thread_cache_size = 8
       thread_stack = 256K
*******************************************
*** End octocatalog-diff on tate.softwareheritage.org
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@7c7d4d3f