Add forge now - Process https://gitweb.torproject.org/
Activity
- Guillaume Samson changed milestone to %Extend archive coverage [Roadmap - Collect]
- Guillaume Samson added AddForgeNow label
- Guillaume Samson assigned to @guillaume
- Author Owner
On the staging environment:
swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
>   add-forge-now --preset staging \
>   register-lister cgit \
>   instance=gitweb.torproject.org
Created 1 tasks

Task 33421631
  Next run: today (2023-06-19T13:24:39.120070+00:00)
  Interval: 1 day, 0:00:00
  Type: list-cgit
  Policy: oneshot
  Args:
  Keyword args:
    enable_origins: False
    instance: 'gitweb.torproject.org'
    max_origins_per_page: 10
    max_pages: 3
Check the registered lister and the listed origins:
swh-scheduler=> select * from listers where name='cgit' and instance_name='gitweb.torproject.org';
                  id                  | name |     instance_name     |            created            | current_state |            updated
--------------------------------------+------+-----------------------+-------------------------------+---------------+-------------------------------
 6d6c36cf-acc7-4757-9007-3115c5486b30 | cgit | gitweb.torproject.org | 2023-06-19 13:25:59.142195+00 | {}            | 2023-06-19 13:25:59.142195+00
(1 row)

swh-scheduler=> select lister_id, url, visit_type from listed_origins where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org');
              lister_id               |                        url                         | visit_type
--------------------------------------+----------------------------------------------------+------------
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/anonbib.git             | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/censorship-timeline.git | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/check.git               | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/collector.git           | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/community/outreach.git  | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/curriculum.git          | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/depictor.git            | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/doctor.git              | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/erebus.git              | git
 6d6c36cf-acc7-4757-9007-3115c5486b30 | https://git.torproject.org/exonerator.git          | git
(10 rows)
Schedule the first ingests:
swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
>   add-forge-now --preset staging \
>   schedule-first-visits \
>   --type-name git \
>   --lister-name cgit \
>   --lister-instance-name gitweb.torproject.org
100 slots available in celery queue
10 visits to send to celery
- Guillaume Samson added 15m of time spent
- Author Owner
On the staging environment, all first ingests completed successfully:
swh-scheduler=> select last_visit_status, count(ovs.url)
    from origin_visit_stats ovs
    join listed_origins lo USING(url, visit_type)
    where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org')
    and visit_type='git'
    group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |    10
(1 row)
- Guillaume Samson added 10m of time spent
- Author Owner
On the production environment:
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   add-forge-now --preset production \
>   register-lister cgit \
>   instance=gitweb.torproject.org
Created 1 tasks

Task 415490567
  Next run: today (2023-06-19T14:29:44.366381+00:00)
  Interval: 64 days, 0:00:00
  Type: list-cgit
  Policy: recurring
  Args:
  Keyword args:
    instance: 'gitweb.torproject.org'
Check the registered lister and the listed origins:
softwareheritage-scheduler=> select * from listers where name='cgit' and instance_name='gitweb.torproject.org';
                  id                  | name |     instance_name     |            created            | current_state |            updated
--------------------------------------+------+-----------------------+-------------------------------+---------------+-------------------------------
 d229771c-1610-4e2c-a67b-a8d5b6f1b43c | cgit | gitweb.torproject.org | 2021-03-31 18:26:36.933395+00 | {}            | 2021-03-31 18:26:36.933395+00
(1 row)

softwareheritage-scheduler=> select lister_id, url, visit_type from listed_origins where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org');
              lister_id               |                                    url                                     | visit_type
--------------------------------------+----------------------------------------------------------------------------+------------
 d229771c-1610-4e2c-a67b-a8d5b6f1b43c | https://git.torproject.org/admin/dns/auto-dns.git                          | git
[...]
 d229771c-1610-4e2c-a67b-a8d5b6f1b43c | https://gitweb.torproject.org/webstats.git                                 | git
(991 rows)
Schedule the first ingests:
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   add-forge-now --preset production \
>   schedule-first-visits \
>   --type-name git \
>   --lister-name cgit \
>   --lister-instance-name gitweb.torproject.org
10000 slots available in celery queue
515 visits to send to celery
- Guillaume Samson added 15m of time spent
- Author Owner
On the production environment, the first ingests completed with failures:
softwareheritage-scheduler=> select last_visit_status, count(ovs.url)
    from origin_visit_stats ovs
    join listed_origins lo USING(url, visit_type)
    where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org')
    and visit_type='git'
    group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |   471
 not_found         |   520
(2 rows)
Repositories are listed under two URL prefixes (git.torproject.org and gitweb.torproject.org), and the per-prefix counts match the successful and not_found counts above:
softwareheritage-scheduler=> select count(url) from listed_origins
    where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org')
    and url like 'https://git.torproject.org%';
 count
-------
   471
(1 row)

softwareheritage-scheduler=> select count(url) from listed_origins
    where lister_id = (select id from listers where name='cgit' and instance_name='gitweb.torproject.org')
    and url like 'https://gitweb.torproject.org%';
 count
-------
   520
(1 row)
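The same per-host breakdown can be checked offline on an exported list of origin URLs. This is a minimal sketch, assuming the URLs are available as plain strings; `count_by_host` is a hypothetical helper (not part of swh), and the sample holds only four URLs taken from the listings in this issue.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(urls):
    """Count origin URLs per hostname (hypothetical helper, not part of swh)."""
    return Counter(urlsplit(url).hostname for url in urls)


# Small sample copied from the listed origins shown in this issue.
sample = [
    "https://git.torproject.org/anonbib.git",
    "https://git.torproject.org/check.git",
    "https://git.torproject.org/admin/dns/auto-dns.git",
    "https://gitweb.torproject.org/webstats.git",
]

counts = count_by_host(sample)
for host, n in sorted(counts.items()):
    print(f"{host}: {n}")
```

Grouping on the parsed hostname rather than on a string prefix avoids the `like 'https://git.torproject.org%'` pitfall of also matching any longer hostname that happens to start the same way.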
- Guillaume Samson added 25m of time spent
- Author Owner
Register the lister task again, this time with the "base_git_url" option:
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
>   add-forge-now --preset production \
>   register-lister cgit \
>   instance=gitweb.torproject.org \
>   base_git_url=https://git.torproject.org
Created 1 tasks

Task 415520228
  Next run: today (2023-06-26T13:12:31.747541+00:00)
  Interval: 64 days, 0:00:00
  Type: list-cgit
  Policy: recurring
  Args:
  Keyword args:
    base_git_url: 'https://git.torproject.org'
    instance: 'gitweb.torproject.org'
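The intent of the option appears to be that origin URLs are built from the given base instead of from the cgit web host. The sketch below only illustrates that idea and is not the swh-lister implementation; `origin_url` is a hypothetical helper, and appending the repository path to the base is an assumption.

```python
def origin_url(base_git_url, repo_path):
    """Join a base git URL and a repository path.

    Illustrative only: this is an assumed rule, not the cgit lister's code.
    """
    return base_git_url.rstrip("/") + "/" + repo_path.lstrip("/")


# Repository path taken from an origin listed earlier in this issue.
url = origin_url("https://git.torproject.org", "webstats.git")
print(url)
```

Under that assumption, a repository previously listed as https://gitweb.torproject.org/webstats.git would be listed under the git.torproject.org host instead.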
Then disable the old task:
softwareheritage-scheduler=> select * from task where type = 'list-cgit' and arguments -> 'kwargs' ->> 'instance' = 'gitweb.torproject.org';
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------
id               | 415520228
type             | list-cgit
arguments        | {"args": [], "kwargs": {"instance": "gitweb.torproject.org", "base_git_url": "https://git.torproject.org"}}
next_run         | 2023-08-29 13:12:38.998666+00
current_interval | 64 days
status           | next_run_not_scheduled
policy           | recurring
retries_left     | 3
priority         |
-[ RECORD 2 ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------
id               | 415490567
type             | list-cgit
arguments        | {"args": [], "kwargs": {"instance": "gitweb.torproject.org"}}
next_run         | 2023-08-22 14:30:24.125187+00
current_interval | 64 days
status           | disabled
policy           | recurring
retries_left     | 3
priority         |
-[ RECORD 3 ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------
id               | 168393646
type             | list-cgit
arguments        | {"args": [], "kwargs": {"url": "https://gitweb.torproject.org/", "instance": "gitweb.torproject.org", "base_git_url": "https://gitweb.torproject.org/"}}
next_run         | 2023-07-11 18:30:21.330725+00
current_interval | 64 days
status           | disabled
policy           | recurring
retries_left     | 3
priority         |
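The records above leave one schedulable task and two disabled ones. A minimal sketch of the selection rule that seems to have been applied (keep the task whose base_git_url is https://git.torproject.org, treat the others as stale); `stale_task_ids` is a hypothetical helper and the dictionaries simply mirror the id and kwargs fields shown in the query output.

```python
EXPECTED_BASE = "https://git.torproject.org"

# id and kwargs copied from the three task records shown in this issue.
tasks = [
    {"id": 415520228, "kwargs": {"instance": "gitweb.torproject.org",
                                 "base_git_url": "https://git.torproject.org"}},
    {"id": 415490567, "kwargs": {"instance": "gitweb.torproject.org"}},
    {"id": 168393646, "kwargs": {"url": "https://gitweb.torproject.org/",
                                 "instance": "gitweb.torproject.org",
                                 "base_git_url": "https://gitweb.torproject.org/"}},
]


def stale_task_ids(tasks, expected_base):
    """Return ids of tasks whose base_git_url is missing or differs (hypothetical)."""
    return [t["id"] for t in tasks
            if t["kwargs"].get("base_git_url") != expected_base]


stale = stale_task_ids(tasks, EXPECTED_BASE)
print(stale)
```

The two ids returned are the ones whose status is `disabled` in the output above.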
- Guillaume Samson added 20m of time spent
- Guillaume Samson closed
- anarcat mentioned in issue swh/devel/swh-web#4787