Add forge now - Process https://git.trueelena.org/
Activity
- Guillaume Samson changed milestone to %Extend archive coverage [Roadmap - Collect]
- Guillaume Samson added AddForgeNow label
- Guillaume Samson assigned to @guillaume
- Author Owner
On staging environment:
swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
> add-forge-now --preset staging \
> register-lister cgit \
> url=https://git.trueelena.org/
Created 1 tasks

Task 33421021
  Next run: tomorrow (2023-04-13T08:21:45.989722+00:00)
  Interval: 1 day, 0:00:00
  Type: list-cgit
  Policy: oneshot
  Args:
  Keyword args:
    enable_origins: False
    max_origins_per_page: 10
    max_pages: 3
    url: 'https://git.trueelena.org/'

swh-scheduler=> select * from task where id=33421021;
-[ RECORD 1 ]----+-----------------------------------------------------------------------------------------------------------------------------------
id               | 33421021
type             | list-cgit
arguments        | {"args": [], "kwargs": {"url": "https://git.trueelena.org/", "max_pages": 3, "enable_origins": false, "max_origins_per_page": 10}}
next_run         | 2023-04-13 08:21:45.989722+00
current_interval | 1 day
status           | next_run_not_scheduled
policy           | oneshot
retries_left     | 0
priority         |

swh-scheduler=> update task set next_run=now(), status='next_run_not_scheduled' where id=33421021;
UPDATE 1

swh-scheduler=> select * from listers where name='cgit' and instance_name='git.trueelena.org';
                  id                  | name |   instance_name   |            created            | current_state |            updated
--------------------------------------+------+-------------------+-------------------------------+---------------+-------------------------------
 b56e5f8c-c49a-41a8-9389-c3b3e79493f1 | cgit | git.trueelena.org | 2023-04-12 09:38:50.407998+00 | {}            | 2023-04-12 09:38:50.407998+00
(1 row)

swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
> add-forge-now --preset staging \
> schedule-first-visits \
> --type-name git \
> --lister-name cgit \
> --lister-instance-name git.trueelena.org

100 slots available in celery queue
10 visits to send to celery
Edited by Guillaume Samson
- Guillaume Samson added 10m of time spent
- Guillaume Samson added 30m of time spent
- Author Owner
On staging environment, all first ingests failed:
swh-scheduler=> select visit_type, url, last_visit_status from origin_visit_stats where visit_type='git' and url like 'https://git.trueelena.org%';
 visit_type |                                url                                | last_visit_status
------------+-------------------------------------------------------------------+-------------------
 git        | https://git.trueelena.org/cgit.cgi/3d/craft_tools                 | not_found
 git        | https://git.trueelena.org/cgit.cgi/3d/tubular_structures          | not_found
 git        | https://git.trueelena.org/cgit.cgi/3d/dice                        | not_found
 git        | https://git.trueelena.org/cgit.cgi/3d/piecepack                   | not_found
 git        | https://git.trueelena.org/cgit.cgi/bookmarks/brick_mortar_and_web | not_found
 git        | https://git.trueelena.org/cgit.cgi/bookmarks/gl-como              | not_found
 git        | https://git.trueelena.org/cgit.cgi/crafts/bdfsm                   | not_found
 git        | https://git.trueelena.org/cgit.cgi/crafts/clipart                 | not_found
 git        | https://git.trueelena.org/cgit.cgi/crafts/fiber_patterns          | not_found
 git        | https://git.trueelena.org/cgit.cgi/crafts/origami                 | not_found
(10 rows)

swh-scheduler=> select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo USING(url, visit_type) where lister_id = (select id from listers where name='cgit' and instance_name='git.trueelena.org') and visit_type='git' group by last_visit_status;
 last_visit_status | count
-------------------+-------
 not_found         |    10
(1 row)
- Guillaume Samson added 20m of time spent
- Maintainer
The HTTPS clone URL shown on each repository page is not valid.
Fortunately, we already handle that kind of edge case through the optional base_git_url parameter of the cgit lister. Use the following scheduling command and the listed origin URLs will be valid.
$ swh scheduler add-forge-now --preset staging register-lister cgit \
    url=https://git.trueelena.org base_git_url=https://git.trueelena.org
Created 1 tasks

Task 5
  Next run: tomorrow (2023-04-13T13:21:14.827954+00:00)
  Interval: 1 day, 0:00:00
  Type: list-cgit
  Policy: oneshot
  Args:
  Keyword args:
    base_git_url: 'https://git.trueelena.org'
    enable_origins: False
    max_origins_per_page: 10
    max_pages: 3
    url: 'https://git.trueelena.org'
- Author Owner
OK, thanks. I'll try again with this parameter.
- Owner
Oh, right, I totally missed that yesterday in our conversation and did not realize this was related to cgit's base git URL, which was not the right one.
Thanks @anlambert for reminding me!
- Author Owner
On staging environment:
swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
> add-forge-now --preset staging \
> register-lister cgit \
> url=https://git.trueelena.org/ \
> base_git_url=https://git.trueelena.org/
Created 1 tasks

Task 33421026
  Next run: tomorrow (2023-04-13T14:41:18.381075+00:00)
  Interval: 1 day, 0:00:00
  Type: list-cgit
  Policy: oneshot
  Args:
  Keyword args:
    base_git_url: 'https://git.trueelena.org/'
    enable_origins: False
    max_origins_per_page: 10
    max_pages: 3
    url: 'https://git.trueelena.org/'

swhscheduler@scheduler0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
> add-forge-now --preset staging \
> schedule-first-visits \
> --type-name git \
> --lister-name cgit \
> --lister-instance-name git.trueelena.org

100 slots available in celery queue
10 visits to send to celery
Edited by Guillaume Samson
- Author Owner
On the staging environment, with the base_git_url parameter, all ingests completed successfully:

swh-scheduler=> select visit_type, url, last_visit_status from origin_visit_stats where url like 'https://git.trueelena.org%' and not url like 'https://git.trueelena.org/cgit.cgi/%';
 visit_type |                           url                            | last_visit_status
------------+----------------------------------------------------------+-------------------
 git        | https://git.trueelena.org/3d/craft_tools                 | successful
 git        | https://git.trueelena.org/3d/tubular_structures          | successful
 git        | https://git.trueelena.org/3d/dice                        | successful
 git        | https://git.trueelena.org/3d/piecepack                   | successful
 git        | https://git.trueelena.org/crafts/bdfsm                   | successful
 git        | https://git.trueelena.org/bookmarks/brick_mortar_and_web | successful
 git        | https://git.trueelena.org/crafts/clipart                 | successful
 git        | https://git.trueelena.org/crafts/origami                 | successful
 git        | https://git.trueelena.org/crafts/fiber_patterns          | successful
 git        | https://git.trueelena.org/bookmarks/gl-como              | successful
(10 rows)

swh-scheduler=> select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo USING(url, visit_type) where lister_id = (select id from listers where name='cgit' and instance_name='git.trueelena.org') and visit_type='git' group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |    10
(1 row)
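Comparing the two staging runs, the only difference between the origins that failed and those that succeeded is the cgit.cgi/ path segment. The sketch below only illustrates that difference; it is not the cgit lister's code, and to_clone_url is a hypothetical helper name introduced here.

```shell
#!/bin/sh
# Hypothetical helper (not part of swh-lister): maps a repository page URL,
# which the first run listed as the origin, to the clone URL that lives
# directly under base_git_url.
to_clone_url() {
    page_url=$1      # e.g. https://git.trueelena.org/cgit.cgi/3d/dice
    base_git_url=$2  # e.g. https://git.trueelena.org/
    # Keep only the repository path that follows the "cgit.cgi/" segment.
    repo_path=${page_url#*/cgit.cgi/}
    # Drop a trailing "/" from the base to avoid a double slash.
    printf '%s/%s\n' "${base_git_url%/}" "$repo_path"
}

to_clone_url "https://git.trueelena.org/cgit.cgi/3d/dice" "https://git.trueelena.org/"
```

Before scheduling, a candidate clone URL can be checked by hand with git ls-remote, which exits with a non-zero status when the repository cannot be reached.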
- Guillaume Samson added 30m of time spent
- Author Owner
On production environment:
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
> add-forge-now --preset production \
> register-lister cgit \
> url==https://git.trueelena.org/ \
> base_git_url=https://git.trueelena.org/
Created 1 tasks

Task 415367387
  Next run: tomorrow (2023-04-13T15:25:04.576358+00:00)
  Interval: 64 days, 0:00:00
  Type: list-cgit
  Policy: recurring
  Args:
  Keyword args:
    base_git_url: 'https://git.trueelena.org/'
    url: '=https://git.trueelena.org/'
- Guillaume Samson added 5m of time spent
- Author Owner
My bad, I typed a doubled "=":
url==https://git.trueelena.org/
The task was registered with the wrong url value, so I disabled it:

softwareheritage-scheduler=> select * from task where id=415367387;
    id     |   type    |                                                  arguments                                                   |           next_run            | current_interval |         status         |  policy   | retries_left | priority
-----------+-----------+--------------------------------------------------------------------------------------------------------------+-------------------------------+------------------+------------------------+-----------+--------------+----------
 415367387 | list-cgit | {"args": [], "kwargs": {"url": "=https://git.trueelena.org/", "base_git_url": "https://git.trueelena.org/"}} | 2023-04-13 10:11:50.981417+00 | 64 days          | next_run_not_scheduled | recurring |            1 |
(1 row)

softwareheritage-scheduler=> begin; update task set status='disabled' where id=415367387;
BEGIN
UPDATE 1
softwareheritage-scheduler=*> commit;
COMMIT

softwareheritage-scheduler=> select * from task where id=415367387;
    id     |   type    |                                                  arguments                                                   |           next_run            | current_interval |  status  |  policy   | retries_left | priority
-----------+-----------+--------------------------------------------------------------------------------------------------------------+-------------------------------+------------------+----------+-----------+--------------+----------
 415367387 | list-cgit | {"args": [], "kwargs": {"url": "=https://git.trueelena.org/", "base_git_url": "https://git.trueelena.org/"}} | 2023-04-13 10:11:50.981417+00 | 64 days          | disabled | recurring |            1 |
(1 row)
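The stored kwargs show url as '=https://git.trueelena.org/', which is consistent with the key=value argument being split at its first "=". A small pre-flight check along these lines could catch the typo before a task is created. This is a sketch under that assumption, not scheduler code, and check_kv is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-flight check for key=value arguments (illustrative only).
check_kv() {
    arg=$1
    key=${arg%%=*}    # text before the first "="
    value=${arg#*=}   # text after the first "="
    case $value in
        =*)
            # A value starting with "=" almost certainly means "key==value".
            echo "suspicious value for '$key': '$value' (doubled '='?)" >&2
            return 1
            ;;
    esac
    printf '%s -> %s\n' "$key" "$value"
}

check_kv "url=https://git.trueelena.org/"
check_kv "url==https://git.trueelena.org/" || echo "refusing to register lister"
```

Splitting on the first "=" rather than the last keeps values that legitimately contain "=" (for example a URL with a query string) intact.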
- Owner
Heh, that happens.
And jsyk, Sentry (and Valentin) saw that ;) I've "resolved" the issue in Sentry [1].
[1] https://sentry.softwareheritage.org/organizations/swh/issues/106582/activity/?project=6
Edited by Antoine R. Dumont
- Author Owner
Thanks.
- Guillaume Samson added 15m of time spent
- Author Owner
On production environment:
swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
> add-forge-now --preset production \
> register-lister cgit \
> url=https://git.trueelena.org/ \
> base_git_url=https://git.trueelena.org/
Created 1 tasks

Task 415367643
  Next run: tomorrow (2023-04-14T12:10:55.819593+00:00)
  Interval: 64 days, 0:00:00
  Type: list-cgit
  Policy: recurring
  Args:
  Keyword args:
    base_git_url: 'https://git.trueelena.org/'
    url: 'https://git.trueelena.org/'

softwareheritage-scheduler=> begin; update task set next_run=now(), status='next_run_not_scheduled' where id=415367643;
BEGIN
UPDATE 1
softwareheritage-scheduler=*> commit;
COMMIT

swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
> add-forge-now --preset production \
> schedule-first-visits \
> --type-name git \
> --lister-name cgit \
> --lister-instance-name git.trueelena.org

10000 slots available in celery queue
35 visits to send to celery
- Guillaume Samson added 20m of time spent
- Author Owner
All first ingests completed successfully:

softwareheritage-scheduler=> select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo USING(url, visit_type) where lister_id = (select id from listers where name='cgit' and instance_name='git.trueelena.org') and visit_type='git' group by last_visit_status;
 last_visit_status | count
-------------------+-------
 successful        |    35
(1 row)
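The per-status summary used throughout this issue can also be checked mechanically. The sketch below assumes the query is run with psql's unaligned, tuples-only output (psql -At), which prints one status|count line per status; all_successful is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Reads "status|count" lines on stdin and fails if any visit is not
# "successful" (or if there were no visits at all).
all_successful() {
    awk -F'|' '
        $1 != "successful" { bad += $2 }
        { total += $2 }
        END {
            printf "%d/%d visits successful\n", total - bad, total
            exit (bad > 0 || total == 0)
        }'
}

# Same shape as the production summary above: 35 successful visits.
printf 'successful|35\n' | all_successful
```

With the first staging run's summary (not_found|10) the function reports zero successful visits and returns a non-zero status, which makes it usable as a gate before moving on to production.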
- Guillaume Samson added 15m of time spent
- Guillaume Samson closed