Deploy swh-indexer > v2.6 on staging then production
Workers refuse to upgrade to the current 2.4.3 version [1]. I did not realize that my
previous upgrade from yesterday stopped at v2.3.0.
It seems related to the new version constraint on the rdflib dependency introduced
recently [2]. That version is not available on the indexer workers [3].
[1]
```
root@indexer-worker02:~# apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
  python3-swh.indexer python3-swh.indexer.storage
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
```
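When apt keeps packages back, the blocking constraint can usually be surfaced by asking apt about the package explicitly; a command sketch (package name taken from the output above, the rdflib constraint is the presumed culprit):

```shell
# Show which candidate version apt sees for the kept-back package
apt-cache policy python3-swh.indexer

# An explicit install attempt makes apt spell out the unmet dependency
# (here, presumably the new python3-rdflib version constraint)
apt-get install python3-swh.indexer
```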
Dropped the Debian constraint on python3-rdflib, triggered a rebuild, and upgraded the
package again. Also added an unconditional dependency on python3-rdflib-jsonld (which
is not needed on the latest Debian release, but does not hurt there either).
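The resulting dependency stanza would look roughly like this (a hypothetical sketch of the debian/control Depends field, not the actual packaging diff):

```
Depends: ${misc:Depends},
         ${python3:Depends},
         python3-rdflib,
         python3-rdflib-jsonld
```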
@vlorentz fixed another error directly within the model, to deal with old versioned objects that fall outside the current model.
This meant a new release for both swh.model and swh.indexer.
Unfortunately, the indexer Debian build is now broken because the objstorage Debian build is broken...
objstorage build unstuck [1]
Re-triggered the indexer build.
Antoine R. Dumont marked the checklist item Reset journal client on the swh.journal.objects.raw_extrinsic_metadata topic (for the new SWORD metadata mapping) as completed
Antoine R. Dumont marked the checklist item Reset journal client on the swh.journal.objects.origin_visit_status topic (for the new Nuget metadata mapping by @VickyMerzOwn) as completed
There are a few issues with the configuration of these indexer clients:
- The traffic should not be going through the IPsec VPN; they need to use the public, authenticated Kafka endpoints. The IPsec load is making all Azure communication struggle.
- Some old services on the Azure hosts seem not to have been disabled, and are constantly restarting with a missing configuration file.
- There is also a bunch of services trying to schedule tasks on the scheduler backend (and failing, because that is firewalled).
@vsellier has stopped everything to avoid getting spammed by traffic alerts all night, until someone can investigate properly.
I'm guessing that's the extrinsic metadata indexer; the others need to do plenty of random access to the storage, but that one consumes very quickly from Kafka. On the bright side, it consumes the entire topic within hours, so as a quick fix its parallelism could be reduced.
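For reference, pointing a journal client at the public, authenticated endpoints rather than the internal brokers is a configuration change along these lines (a hypothetical sketch: broker name, consumer group, and credentials are placeholders, not the real values):

```yaml
journal:
  brokers:
    - broker1.journal.example.org:9093   # public TLS port (placeholder host)
  group_id: swh.indexer.journal_client   # placeholder consumer group
  security.protocol: SASL_SSL
  sasl.mechanism: SCRAM-SHA-512
  sasl.username: indexer-client          # placeholder credentials
  sasl.password: "***"
```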
> There's a few issues with the configuration of these indexer clients:
> the traffic should not be going through the IPSec VPN. They need to use the public, authenticated kafka endpoints. The IPSec load is making all azure communication struggle.
> It seems that there are some old services on the azure hosts that have not been disabled and are consistently restarting with a missing configuration file.
That's surprising, as those are new nodes...
One thing I can think of: a wrong clush command may have started some services on all indexer nodes (even services that are not installed...).
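If that hypothesis is right, a safer pattern is to scope the command to the units that actually exist on each node; a hypothetical sketch (the @indexer node group and the unit name are placeholders for illustration):

```shell
# Only restart the unit on nodes where it is actually installed,
# instead of blindly restarting it on the whole node group
clush -w @indexer 'systemctl cat swh-indexer-journal-client.service >/dev/null 2>&1 \
  && systemctl restart swh-indexer-journal-client.service || true'
```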
> There is also a bunch of services that are trying to schedule tasks on the scheduler backend (and failing, because that's firewalled).
That must be a side effect of the previous points, as the "new" indexer journal client services no longer do that.
In any case, thanks for the heads-up; I'll investigate and clean up once I get a go-ahead from @vsellier.