Fix duplicate DNS records that make the puppet agent repeatedly re-create the k8s-rancher-app and kafka1.staging CNAME records
The puppet agent on pergamon has apparently been re-installing the k8s-rancher-app record for months, but I only noticed it today, when the same thing happened for the new kafka1.staging node [0].
Puppet creates the record and, some time later, the record gets deleted; then the cycle starts over [1]. The likely cause is a previously used record that has not been properly cleaned up.
Let's look for the duplicate records and fix them [2].
- k8s-rancher-app: rancher.euwest.azure.internal.softwareheritage.org -cname-> k8s-rancher.euwest.azure.internal.softwareheritage.org.
- staging-kafka1: journal2.internal.staging.swh.network -cname-> kafka1.internal.staging.swh.network.
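One possible way to hunt for such leftovers is to scan the zone data for names that carry a CNAME together with any other record, since DNS does not allow a CNAME to coexist with other data for the same name. The following is only a rough sketch, not what was actually run on pergamon: it assumes one record per line in the `name [ttl] [IN] TYPE rdata` form, ignores continuation lines that inherit the previous name, and the function names and the sample zone text (including its CNAME line) are made up for illustration.

```python
from collections import defaultdict

# Record types this sketch recognises; anything else (e.g. SOA) is skipped.
RECORD_TYPES = {"A", "AAAA", "CNAME", "MX", "NS", "PTR", "SRV", "TXT"}


def parse_records(zone_text):
    """Yield (name, rtype, rdata) from simple one-record-per-line zone text."""
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments (naive for TXT data)
        if not line or line.startswith("$"):  # skip blanks and $TTL/$ORIGIN
            continue
        fields = line.split()
        name, rest = fields[0], fields[1:]
        # Skip an optional TTL and the optional class "IN".
        while rest and (rest[0].isdigit() or rest[0].upper() == "IN"):
            rest = rest[1:]
        if len(rest) >= 2 and rest[0].upper() in RECORD_TYPES:
            yield name, rest[0].upper(), " ".join(rest[1:])


def find_cname_conflicts(zone_text):
    """Return {name: records} for names holding a CNAME plus any other record."""
    by_name = defaultdict(list)
    for name, rtype, rdata in parse_records(zone_text):
        by_name[name].append((rtype, rdata))
    return {
        name: records
        for name, records in by_name.items()
        if len(records) > 1 and any(rtype == "CNAME" for rtype, _ in records)
    }


if __name__ == "__main__":
    # Hypothetical zone content, for illustration only.
    sample = (
        "k8s-rancher A 192.168.200.19\n"
        "rancher A 192.168.200.19\n"
        "rancher CNAME k8s-rancher\n"
    )
    for name, records in find_cname_conflicts(sample).items():
        print(name, records)
```

Any name this reports is a candidate for the kind of stale record that would make an additive puppet resource keep flapping; the actual cleanup still has to be done by hand against the real zone.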
[1]
root@pergamon:~# puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pergamon.softwareheritage.org
Info: Applying configuration version '1681225236'
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[k8s-rancher-app/CNAME]/ensure: created
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[kafka1-stg/CNAME]/ensure: created
Notice: Applied catalog in 70.93 seconds
[2] current status without duplicates (journal2.staging -cname-> kafka1.staging is no longer resolving)
root@pergamon:~# grep -R 192.168.130.201 /var/cache/bind/*
/var/cache/bind/internal.staging.swh.network/internal.staging.swh.network:kafka1 A 192.168.130.201
root@pergamon:~# grep -R 192.168.200.19 /var/cache/bind/*
/var/cache/bind/internal.softwareheritage.org/internal.softwareheritage.org:k8s-rancher A 192.168.200.19
/var/cache/bind/internal.softwareheritage.org/internal.softwareheritage.org:rancher A 192.168.200.19
[0]
15:36 <+ardumont> pergamon: strange, i have ran `puppet agent -t` manually on puppet, and it applied a commit deployed on friday (adding a cname) (without complaining about puppet being deactivated first...)
16:01 <+ardumont> wow, it's been added then somehow removed then added back
16:01 <+ardumont> (i just applied it again)
16:01 <+ardumont> Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[k8s-rancher-app/CNAME]/ensure: created
16:01 <+ardumont> Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[kafka1-stg/CNAME]/ensure: created
17:26 <+olasd> the k8s rancher app cname has been flapping for months
17:27 <+olasd> I assume an old resource hasn't been cleaned up from the dns records
17:27 <+olasd> (the puppet dns records are only additive, so if there's a conflicting record the resource will keep flapping)
Edited by Antoine R. Dumont