Database replication lag keeps growing on somerset
It seems that replication of the softwareheritage database from belvedere to somerset has not been working for at least one week.
This is what we get when we count the number of origins on belvedere:
antoine@guggenheim:~$ psql service=swh
psql (11.5 (Debian 11.5-1+deb10u1), server 11.3 (Debian 11.3-1.pgdg90+1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-CHACHA20-POLY1305, bits: 256, compression: off)
Type "help" for help.
softwareheritage=> select count(*) from origin;
count
----------
90429442
(1 row)
The same query on somerset returns the following:
antoine@guggenheim:~$ psql service=swh-replica
psql (11.5 (Debian 11.5-1+deb10u1), server 11.3 (Debian 11.3-1.pgdg90+1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
softwareheritage=> select count(*) from origin;
count
----------
90233725
(1 row)
So the replica is currently missing 195717 origins. This number keeps growing: yesterday it was 171809.
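To make the arithmetic behind these figures easy to re-check, here is a minimal Python sketch. It only uses the counts quoted in this report; the function name `replication_gap` is hypothetical, and it assumes the two gap measurements were taken roughly one day apart.

```python
# Origin counts copied from the two psql sessions in this report.
PRIMARY_COUNT = 90_429_442   # count(*) from origin on belvedere
REPLICA_COUNT = 90_233_725   # count(*) from origin on somerset
PREVIOUS_GAP = 171_809       # gap observed the day before


def replication_gap(primary_count: int, replica_count: int) -> int:
    """Return how many rows the replica is missing relative to the primary."""
    return primary_count - replica_count


gap = replication_gap(PRIMARY_COUNT, REPLICA_COUNT)
growth = gap - PREVIOUS_GAP  # increase of the gap since the previous measurement

print(f"missing origins on the replica: {gap}")
print(f"growth since the previous measurement: {growth}")
```

A gap that keeps increasing between measurements suggests the replica is not catching up at all, rather than merely lagging behind.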
This lack of replication impacts the Software Heritage web application, as it uses the database hosted on somerset.
For instance, all 'Save code now' requests submitted since last week are still marked as scheduled even though they were correctly executed.
Because the newly ingested origins are not present in the replica database, no visit date can be found for them, hence the erroneous reported status.
Migrated from T2016 (view on Phabricator)