swh-scheduler=> select * from listers where name ='cgit' order by created desc limit 1;
                  id                  | name |   instance_name   |            created            | current_state |            updated
--------------------------------------+------+-------------------+-------------------------------+---------------+-------------------------------
 fe9b09be-0953-4541-985c-2c010dda0353 | cgit | gitweb.gentoo.org | 2022-10-21 16:17:51.361901+00 | {}            | 2022-10-21 16:17:51.361901+00
(1 row)

Time: 6.839 ms
swh-scheduler=> update listed_origins set enabled=false where lister_id='fe9b09be-0953-4541-985c-2c010dda0353';
UPDATE 597
Time: 252.394 ms
Scheduling a subset of the repos
swhscheduler@scheduler0:~$ ./gitweb-gentoo-org.sh
Fri Oct 21 16:29:32 UTC 2022 scheduling git origins with policy never_visited_oldest_update_first to queue add_forge_now:swh.loader.git.tasks.UpdateGitRepository for lister gitweb.gentoo.org (tablesample 1)
100 slots available in celery queue
100 visits to send to celery
Fri Oct 21 16:29:35 UTC 2022 sleep 60
Fri Oct 21 16:29:35 UTC 2022 scheduling git origins with policy origins_without_last_update to queue add_forge_now:swh.loader.git.tasks.UpdateGitRepository for lister gitweb.gentoo.org (tablesample 1)
100 slots available in celery queue
50 visits to send to celery
Fri Oct 21 16:29:38 UTC 2022 sleep 60
^C

swh/loader-addforgenow-6947dbdf98-x25kf[loaders]: [2022-10-21 16:30:28,158: INFO/ForkPoolWorker-1] Load origin 'https://anongit.gentoo.org/git/user/yoreek.git' with type 'git'
swh/loader-addforgenow-6947dbdf98-x25kf[loaders]: Enumerating objects: 291, done.
swh/loader-addforgenow-6947dbdf98-x25kf[loaders]: Total 291 (delta 0), reused 0 (delta 0), pack-reused 291
swh/loader-addforgenow-6947dbdf98-x25kf[loaders]: [2022-10-21 16:30:28,490: INFO/ForkPoolWorker-1] Listed 2 refs for repo https://anongit.gentoo.org/git/user/yoreek.git
swh/loader-addforgenow-6947dbdf98-nzxq7[loaders]: [2022-10-21 16:30:31,405: INFO/ForkPoolWorker-1] Fetched 38 objects; 37 are new
swh/loader-addforgenow-6947dbdf98-nzxq7[loaders]: [2022-10-21 16:30:31,601: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[666d2326-e2a6-4757-ad0a-7b7f8aed31d1] succeeded in 8.655134974978864s: {'status': 'eventful'}
The RabbitMQ addforgenow queue was then purged to avoid loading all the repositories.
softwareheritage-scheduler=> select * from listers where name ='cgit' order by created desc limit 1;
                  id                  | name |   instance_name   |            created            | current_state |            updated
--------------------------------------+------+-------------------+-------------------------------+---------------+-------------------------------
 8d1ebad5-a2dc-482c-9b9b-188665274377 | cgit | gitweb.gentoo.org | 2022-10-21 16:44:50.928155+00 | {}            | 2022-10-21 16:44:50.928155+00
(1 row)

Time: 8.680 ms
softwareheritage-scheduler=> select count(*) from listed_origins where lister_id='8d1ebad5-a2dc-482c-9b9b-188665274377';
 count
-------
   598
(1 row)

Time: 21.903 ms
Manually schedule the loadings
swhscheduler@saatchi:~/addforgenow$ ./gitweb-gentoo-org.sh
Fri Oct 21 16:49:09 UTC 2022 scheduling git origins with policy never_visited_oldest_update_first to queue add_forge_now:swh.loader.git.tasks.UpdateGitRepository for lister gitweb.gentoo.org (tablesample 1)
10000 slots available in celery queue
548 visits to send to celery
Fri Oct 21 16:49:10 UTC 2022 sleep 60
Fri Oct 21 16:49:10 UTC 2022 scheduling git origins with policy origins_without_last_update to queue add_forge_now:swh.loader.git.tasks.UpdateGitRepository for lister gitweb.gentoo.org (tablesample 1)
10000 slots available in celery queue
49 visits to send to celery
Fri Oct 21 16:49:12 UTC 2022 sleep 60
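As a quick sanity check, the number of visits handed to celery can be totalled from the script output and compared with the listed_origins count. The snippet below is only a sketch: the helper name total_visits_sent is hypothetical, and the sample text is the relevant lines of the run above with timestamps left out.

```python
import re

# Relevant lines from the scheduling run above (timestamp lines omitted).
SCRIPT_OUTPUT = """\
10000 slots available in celery queue
548 visits to send to celery
10000 slots available in celery queue
49 visits to send to celery
"""


def total_visits_sent(output: str) -> int:
    """Sum every 'N visits to send to celery' count found in the output.

    Hypothetical helper, not part of swh-scheduler.
    """
    counts = re.findall(r"^(\d+) visits to send to celery$", output, re.MULTILINE)
    return sum(int(n) for n in counts)


if __name__ == "__main__":
    print(total_visits_sent(SCRIPT_OUTPUT))
```

For this run the total is 548 + 49 = 597 visits, against 598 listed origins.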
softwareheritage-scheduler=> select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo on lo.url = ovs.url and lo.visit_type = ovs.visit_type where lister_id='8d1ebad5-a2dc-482c-9b9b-188665274377' group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |   596
 failed            |     1
                   |     1
(3 rows)

Time: 32.196 ms

softwareheritage-scheduler=> select ovs.url, last_visit_status from origin_visit_stats ovs join listed_origins lo on lo.url = ovs.url and lo.visit_type = ovs.visit_type where lister_id='8d1ebad5-a2dc-482c-9b9b-188665274377' and (last_visit_status is null or last_visit_status = 'failed');
                          url                           | last_visit_status
--------------------------------------------------------+-------------------
 https://anongit.gentoo.org/git/archive/proj/gentoo.git |
 https://anongit.gentoo.org/git/report/gentoo-ci.git    | failed
(2 rows)

Time: 118.680 ms
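The two queries above can be tried on a toy dataset to see how the blank status row arises: it is the group of origins whose last_visit_status is NULL, which psql prints as an empty cell. The sketch below uses an in-memory SQLite database rather than the real scheduler database; the table layout is reduced to the columns used in the join, and the lister id, URLs, and function names are all made up for illustration.

```python
import sqlite3

# Toy stand-in for the scheduler database: only the columns used by the
# queries above. This is NOT the real swh-scheduler schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE listed_origins (lister_id TEXT, url TEXT, visit_type TEXT);
    CREATE TABLE origin_visit_stats (url TEXT, visit_type TEXT, last_visit_status TEXT);
    """
)

LISTER = "hypothetical-lister-id"  # made-up identifier
SAMPLE = [
    ("https://example.org/a.git", "successful"),
    ("https://example.org/b.git", "failed"),
    ("https://example.org/c.git", None),  # NULL status
]
for url, status in SAMPLE:
    conn.execute("INSERT INTO listed_origins VALUES (?, ?, 'git')", (LISTER, url))
    conn.execute("INSERT INTO origin_visit_stats VALUES (?, 'git', ?)", (url, status))

JOIN = """
    FROM origin_visit_stats ovs
    JOIN listed_origins lo
      ON lo.url = ovs.url AND lo.visit_type = ovs.visit_type
    WHERE lo.lister_id = ?
"""


def status_counts(db, lister_id):
    """Count origins per last_visit_status; NULL becomes the key None."""
    query = "SELECT last_visit_status, count(ovs.url)" + JOIN + "GROUP BY last_visit_status"
    return dict(db.execute(query, (lister_id,)).fetchall())


def origins_needing_attention(db, lister_id):
    """Return URLs whose last visit failed or has a NULL status."""
    query = (
        "SELECT ovs.url" + JOIN
        + "AND (last_visit_status IS NULL OR last_visit_status = 'failed') ORDER BY ovs.url"
    )
    return [row[0] for row in db.execute(query, (lister_id,))]


if __name__ == "__main__":
    print(status_counts(conn, LISTER))
    print(origins_needing_attention(conn, LISTER))
```

The explicit IS NULL test matters: a plain comparison such as last_visit_status != 'successful' would silently skip the NULL row.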
The failing repository keeps failing, even after several attempts:
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://anongit.gentoo.org/git/report/gentoo-ci.git' with type 'git'
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/http/client.py", line 565, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/local/lib/python3.10/http/client.py", line 532, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

softwareheritage-scheduler=> select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo on lo.url = ovs.url and lo.visit_type = ovs.visit_type where lister_id='8d1ebad5-a2dc-482c-9b9b-188665274377' group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |   597
 failed            |     1
(2 rows)

Time: 61.814 ms