Many NotFound repositories on GitHub since 2022-06-15 or 2022-06-16
@douardda noticed that https://sentry.softwareheritage.org/share/issue/99d67860c3484c7ab709154962ca8eb6/ shows a considerable increase in the number of "NotFound" repositories on GitHub, since 2022-06-15 or 2022-06-16.
This may not be an issue, but I find it surprising.
I have looked at one such origin in particular: https://sentry.softwareheritage.org/organizations/swh/issues/10253/events/47b2b8f714364acea483c7e16f3a4ffb/
The loader started visiting on "2022-06-21 07:51:02,699" (according to breadcrumbs in Sentry).
The scheduler entry for this origin is:
softwareheritage-scheduler=> select * from listed_origins where url='https://github.com/Stanley-Ezeaku/kotlin';
-[ RECORD 1 ]----------+-----------------------------------------
lister_id | 6632ef5e-322b-402b-8f28-d090f76ed6b7
url | https://github.com/Stanley-Ezeaku/kotlin
visit_type | git
extra_loader_arguments | {}
enabled | f
first_seen | 2021-06-10 02:15:29.470435+00
last_seen | 2022-06-21 07:51:03.813845+00
last_update | 2020-02-27 09:11:58+00
and the associated lister:
softwareheritage-scheduler=> select * from listers where id='6632ef5e-322b-402b-8f28-d090f76ed6b7';
-[ RECORD 1 ]-+-------------------------------------
id | 6632ef5e-322b-402b-8f28-d090f76ed6b7
name | github
instance_name | github
created | 2021-02-04 08:01:51.163997+00
current_state | {"last_seen_id": 490551028}
updated | 2022-05-10 07:23:49.246279+00
This is surprising because, according to last_seen, the lister saw this origin (or claimed to see it; this might be a lister bug) about 1.1s after we started loading it.
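To double-check the gap, here is a minimal sketch that subtracts the loader start time (from the Sentry breadcrumbs) from the last_seen value in listed_origins. It assumes the Sentry timestamp is in UTC, like the scheduler column; the variable names are only illustrative.

```python
from datetime import datetime, timezone

# Values copied from the Sentry breadcrumb and the listed_origins row above.
# Assumption: the Sentry breadcrumb timestamp is in UTC, like last_seen.
loader_start = datetime.strptime(
    "2022-06-21 07:51:02,699", "%Y-%m-%d %H:%M:%S,%f"
).replace(tzinfo=timezone.utc)
last_seen = datetime.fromisoformat("2022-06-21 07:51:03.813845+00:00")

# Positive value: last_seen was written after the loader started.
gap_seconds = (last_seen - loader_start).total_seconds()
print(f"last_seen is {gap_seconds:.3f}s after the loader start")
```

If the two clocks are not in the same timezone, the gap would be off by whole hours, so the roughly one-second difference suggests the assumption holds.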
Migrated from T4344 (view on Phabricator)