Investigate & monitor lister scheduling issue
From swh-devel mailing list [1], the loader is apparently stuck.
After a quick check in the scheduler, this is indeed the case [2].
This is the loader that will be deprecated by the new nixguix stack (lister & multiple loaders). In the meantime, let's unstick it. Once the new stack is in place, the loading tasks it generates won't get stuck like this one, which uses the old scheduling pattern that is prone to this kind of issue. (To be fair, the new listing task might, but we'll see.)
Plan:
- Unstick
- Analyze (reproduced through deployment in dynamic infra, not sure it's the only way though)
- Monitor for a while (it's still running since the deployment fix was pushed)
- Loader nixguix has been deprecated in favor of the new stack but the issue is still relevant for listers
- swh/infra/puppet/puppet-swh-site!693 (closed): Add alerts when listers are stuck (first iteration [3])
[3] swh/infra/ci-cd/swh-charts!341 (closed)
[2]
+-[ RECORD 2 ]-----+------------------------------------------------------------------------------------------------------+
| id | 334411727 |
| type | load-nixguix |
| arguments | {"args": [], "kwargs": {"url": "https://nix-community.github.io/nixpkgs-swh/sources-unstable.json"}} |
| next_run | 2023-09-21 15:24:51.214023+00 |
| current_interval | 1 day |
| status | next_run_scheduled |
| policy | recurring |
| retries_left | 3 |
| priority | (null) |
+-[ RECORD 3 ]-----+------------------------------------------------------------------------------------------------------+
| id | 337282717 |
| type | load-nixguix |
| arguments | {"args": [], "kwargs": {"url": "https://guix.gnu.org/sources.json"}} |
| next_run | 2023-09-21 15:24:51.214023+00 |
| current_interval | 1 day |
| status | next_run_scheduled |
| policy | recurring |
| retries_left | 3 |
| priority | (null) |
+------------------+------------------------------------------------------------------------------------------------------+
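As a rough illustration of what "stuck" means for the records above, here is a minimal Python sketch that flags recurring tasks still in `next_run_scheduled` whose `next_run` is overdue by several intervals. The field names come from the records in [2]; the function name `find_stuck_tasks`, the `grace_factor` threshold and the `now` value are assumptions made for the example. This is not the swh-scheduler API, nor the alert rule from the merge requests.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_tasks(tasks, now, grace_factor=3):
    """Return recurring tasks that look stuck.

    A task is considered stuck (assumption for this sketch) when it is
    still marked 'next_run_scheduled' although its next_run is overdue
    by more than grace_factor times its current_interval.
    """
    stuck = []
    for task in tasks:
        if task["policy"] != "recurring":
            continue
        if task["status"] != "next_run_scheduled":
            continue
        overdue = now - task["next_run"]
        if overdue > grace_factor * task["current_interval"]:
            stuck.append(task)
    return stuck


# The two records from [2], reduced to the fields the check needs.
next_run = datetime(2023, 9, 21, 15, 24, 51, 214023, tzinfo=timezone.utc)
tasks = [
    {
        "id": 334411727,
        "type": "load-nixguix",
        "next_run": next_run,
        "current_interval": timedelta(days=1),
        "status": "next_run_scheduled",
        "policy": "recurring",
    },
    {
        "id": 337282717,
        "type": "load-nixguix",
        "next_run": next_run,
        "current_interval": timedelta(days=1),
        "status": "next_run_scheduled",
        "policy": "recurring",
    },
]

# Hypothetical observation date, several weeks after next_run.
now = datetime(2023, 11, 1, tzinfo=timezone.utc)
stuck_ids = [task["id"] for task in find_stuck_tasks(tasks, now)]
```

With a daily interval and a grace factor of 3, both tasks are reported once `now` is more than three days past `next_run`. A real alert would presumably read these fields from the scheduler database rather than from hard-coded dictionaries.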
[1]
I was investigating SWH coverage in Guix, and I noticed it is getting
much worse recently.
Looking at the visits it seems it hasn’t made a visit to either the Guix
or Nix “sources.json” since September:
https://archive.softwareheritage.org/browse/origin/visits/?origin_url=https://guix.gnu.org/sources.json
https://archive.softwareheritage.org/browse/origin/visits/?origin_url=https://nix-community.github.io/nixpkgs-swh/sources-unstable.json
This happened two years ago, too:
https://forge.softwareheritage.org/T3763.
Hence, the question in the subject. Is it stuck again? Thanks for
taking a look!
Related to swh/devel/swh-loader-core#3763 (closed)
Edited by Antoine R. Dumont