Borgmatic service alerts since bookworm migration
Here are the servers where borg is installed:
```
root@pergamon:~# clush -b -w @all "dpkg -l borg{backup,matic} 2> /dev/null | grep '^ii'"
clush: cassandra[01-13],esnode[1-3,7-9],getty,giverny,jenkins-docker[01-02],kafka[1-4],kelvingrove,kibana0,logstash0,ns0.euwest.azure,rancher-node-highmem[01-02],rancher-node-metal[01-03],rancher-node-production-rke2-mgmt[1-3],search-esnode[4-6],thanos-compact.euwest.azure,thyssen (44): exited with exit code 1
---------------
banco,branly,chaillot,counters1,moma,mucem,pergamon,rancher-node-metal04,saam,saatchi,tate,uffizi (12)
---------------
ii  borgbackup  1.2.4-1  amd64  deduplicating and compressing backup program
ii  borgmatic   1.7.7-1  all    automatically create, prune and verify backups with borgbackup
```
The borgmatic package in bookworm installs a systemd service and timer:
```
root@uffizi:~# lsb_release -c
No LSB modules are available.
Codename: bookworm
root@uffizi:~# dpkg -L borgmatic | awk '/borgmatic\.(timer|service)/'
/lib/systemd/system/borgmatic.service
/lib/systemd/system/borgmatic.timer
root@dali:~# lsb_release -c
Codename: bullseye
root@dali:~# dpkg -L borgmatic | awk '/borgmatic\.(timer|service)/'
```
On bullseye the same command prints nothing: the package ships no systemd units there.
Previously, borgmatic was managed by cron jobs, which are still present today:
```
root@dali:~# cat /etc/cron.d/puppet-borgmatic
# Managed by puppet (module profile::cron), manual changes will be lost
# Cron snippet borgmatic-create
51 0,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * * * root borgmatic create
# Cron snippet borgmatic-full
51 3 * * * root borgmatic
```
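The two cron entries fit together as follows: the hour field of the `create` entry lists every hour except 3, because 03:51 is covered by the full `borgmatic` run. A quick check (the hour list is copied from the crontab above):

```shell
# The hourly `borgmatic create` entry deliberately skips hour 3;
# 03:51 belongs to the full `borgmatic` run instead.
hours='0,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23'
echo "$hours" | tr ',' '\n' | grep -qx 3 || echo 'hour 3 left to the full run'
```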
1. The cron task `borgmatic-full` and the systemd service seem redundant on bookworm servers.
2. `saam` and `rancher-node-metal04` are not declared in the borg `authorized_keys`:
```
root@banco:~# awk -F '"' '{n=(split($2,a,"/"));print a[n]}' /srv/borg/.ssh/authorized_keys
albertina.internal.softwareheritage.org
banco.internal.softwareheritage.org
bardo.internal.admin.swh.network
bojimans.internal.admin.swh.network
branly.internal.softwareheritage.org
chaillot.internal.softwareheritage.org
counters0.internal.staging.swh.network
counters1.internal.softwareheritage.org
dali.internal.admin.swh.network
db1.internal.staging.swh.network
massmoca.internal.softwareheritage.org
moma.internal.softwareheritage.org
mucem.internal.softwareheritage.org
pergamon.internal.softwareheritage.org
rp0.internal.staging.swh.network
rp1.internal.admin.swh.network
saatchi.internal.softwareheritage.org
scheduler0.internal.staging.swh.network
tate.internal.softwareheritage.org
uffizi.internal.softwareheritage.org
```
so borgmatic (whether run by the systemd service or the cron task) fails on those two hosts with a permission-denied error.
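The mismatch can be reproduced as a set difference between the hosts running borgmatic and the hosts allowed in `authorized_keys`. A sketch, with both lists transcribed from the outputs above and trimmed to short host names:

```shell
# Hosts where borgmatic is installed (from the clush output above).
printf '%s\n' banco branly chaillot counters1 moma mucem pergamon \
    rancher-node-metal04 saam saatchi tate uffizi | sort > /tmp/borg_hosts
# Hosts declared in /srv/borg/.ssh/authorized_keys (FQDN suffixes dropped).
printf '%s\n' albertina banco bardo bojimans branly chaillot counters0 \
    counters1 dali db1 massmoca moma mucem pergamon rp0 rp1 saatchi \
    scheduler0 tate uffizi | sort > /tmp/authorized_hosts
# Lines only in the first file: installed but not authorized.
comm -23 /tmp/borg_hosts /tmp/authorized_hosts
```

This prints exactly the two hosts flagged above, `rancher-node-metal04` and `saam`.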
3. `bardo` backups have not been cleaned up for a long time:
```
root@bardo:~# borgmatic info
ssh://borg@banco.internal.softwareheritage.org/srv/borg/repositories/bardo.internal.admin.swh.network: Displaying archive summary information
Repository ID: 87e59d0c58d6d0855ce3b613b111dd6265773c5f3a8001e0d9d1cab8cb8fafc7
Location: ssh://borg@banco.internal.softwareheritage.org/srv/borg/repositories/bardo.internal.admin.swh.network
Encrypted: Yes (repokey BLAKE2b)
Cache: /var/lib/borg/.cache/borg/87e59d0c58d6d0855ce3b613b111dd6265773c5f3a8001e0d9d1cab8cb8fafc7
Security dir: /var/lib/borg/.config/borg/security/87e59d0c58d6d0855ce3b613b111dd6265773c5f3a8001e0d9d1cab8cb8fafc7
------------------------------------------------------------------------------
                  Original size    Compressed size    Deduplicated size
All archives:          30.78 TB           17.49 TB             28.55 GB
                  Unique chunks       Total chunks
Chunk index:             628567         1402382808
root@bardo:~# borgmatic list | ( head -2 ; printf "...\n" ; tail -2 )
ssh://borg@banco.internal.softwareheritage.org/srv/borg/repositories/bardo.internal.admin.swh.network: Listing archives
bardo.internal.admin.swh.network-2023-12-31T23:12:01.459433 Sun, 2023-12-31 23:12:03 [154d24f40fac1d6f4de025f320fdd4fe66fc71831c0d7f1847d37091ad7e9fc1]
...
bardo.internal.admin.swh.network-2025-03-17T13:12:02.436941 Mon, 2025-03-17 13:12:03 [c828823a2a64c9509f240a116f7d94ae3682d13018f3adb0d20cc25a5cbacaaa]
```
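Rough arithmetic from these outputs hints at the scale of the un-pruned repository: with hourly archives between the first and last entries listed (≈442 days), on the order of ten thousand archives share the reported 1,402,382,808 chunk references. A back-of-the-envelope sketch, assuming one archive per hour as per the cron schedule:

```shell
# Archives accumulated between 2023-12-31 and 2025-03-17 at one per hour,
# vs. the total chunk references reported by `borgmatic info`.
days=442                  # 2023-12-31 .. 2025-03-17 (2024 is a leap year)
archives=$((days * 24))
echo "$archives archives, ~$((1402382808 / archives)) chunk refs each"
```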
so the prune operation systematically fails, even after increasing the SSH keep-alive interval and count on both server and client.
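For reference, the client-side keep-alive mentioned above maps to borgmatic's `storage` section in the 1.7.x (sectioned) config format; the server side is sshd's `ClientAliveInterval`/`ClientAliveCountMax` on banco. The values below are illustrative, not the ones actually deployed:

```yaml
# /etc/borgmatic/config.yaml (excerpt) -- illustrative values only
storage:
    # client-side SSH keep-alives for long-running prune operations
    ssh_command: ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=10
    # let borg wait longer for the repository lock (seconds)
    lock_wait: 600
```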