Process ingestion of the cgit instance https://source.codeaurora.org
The $URL/quic origins take priority.
From a ping from @vlorentz:
Note: Qualcomm Innovation Center Inc. maintained repositories have migrated to git.codelinaro.org. QUIC repositories on this site will not receive any updates after March 31, 2022, and will be deleted on March 31, 2023. If your project depends on these repositories, please adjust your tooling configuration to use the new, up-to-date project location.
staging:
- Trigger listing through add-forge-now cli -> not working, issue in the scheduler -> fix -> package, deploy
- Trigger listing again -> not working, issue in the cgit task -> fix -> package, deploy
- Trigger listing again -> suspicious hanging -> issue in the lister pattern base class regarding consumption -> fix from anlambert -> package, deploy
- Trigger listing again -> ok
- Trigger a few ingestions -> ok
production:
- Trigger listing through add-forge-now cli -> ok
- Prepare listing for the full forge next week -> ok
- Trigger ingestion -> ongoing
- Trigger full listing of the forge (now that the main urgent origins have been ingested)
- Trigger ingestion of the remaining origins
- Actual ingestion
Activity
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@a2771874
- Antoine R. Dumont mentioned in merge request swh/devel/swh-scheduler!341 (merged)
- Antoine R. Dumont changed title from Ingest https://source.codeaurora.org/quic/ to Process ingestion of https://source.codeaurora.org
- Author Owner
- Trigger registration of the listing in staging [1]
[1]
swhscheduler@scheduler0:~$ swh scheduler --config-file $CONFIG \
> add-forge-now --preset staging \
> register-lister $LISTER_NAME \
> url=$FORGE_URL
Created 1 tasks

Task 33420816
  Next run: tomorrow (2023-03-22T10:59:58.595818+00:00)
  Interval: 1 day, 0:00:00
  Type: list-cgit
  Policy: oneshot
  Args:
  Keyword args:
    enable_origins: False
    max_origins_per_page: 10
    max_pages: 3
    url: 'https://source.codeaurora.org/'
- Author Owner
The listing failed because the cgit task is not implemented correctly: it does not accept the `max_pages` keyword argument passed to it (on it). [1]
[1]
listers [2023-03-21 11:13:53,282: ERROR/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[fe5b9920-6082-4cdd-8b24-a598a580ff35] raised unexpected: TypeError("list_cgit() got an unexpected keyword argument 'max_pages'")
listers Traceback (most recent call last):
listers   File "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", line 451, in trace_task
listers     R = retval = fun(*args, **kwargs)
listers   File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/integrations/celery.py", line 207, in _inner
listers     reraise(*exc_info)
listers   File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/_compat.py", line 60, in reraise
listers     raise value
listers   File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/integrations/celery.py", line 202, in _inner
listers     return f(*args, **kwargs)
listers   File "/opt/swh/.local/lib/python3.10/site-packages/swh/scheduler/task.py", line 61, in __call__
listers     result = super().__call__(*args, **kwargs)
listers   File "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", line 734, in __protected_call__
listers     return self.run(*args, **kwargs)
listers TypeError: list_cgit() got an unexpected keyword argument 'max_pages'
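The traceback comes down to a signature mismatch: the task was scheduled with keyword arguments that the task function did not declare. Here is a minimal sketch of that failure mode and of one way to resolve it. The function names `list_cgit_old` and `list_cgit_new` and their bodies are hypothetical and are not the actual swh.lister code; only the keyword arguments are taken from the task recorded above.

```python
def list_cgit_old(url):
    """Hypothetical task function that only knows about `url`."""
    return {"url": url}


def list_cgit_new(url, max_origins_per_page=None, max_pages=None,
                  enable_origins=True):
    """Hypothetical task function that also declares the extra keyword arguments."""
    return {
        "url": url,
        "max_origins_per_page": max_origins_per_page,
        "max_pages": max_pages,
        "enable_origins": enable_origins,
    }


# Keyword arguments as recorded for task 33420816 in staging.
task_kwargs = {
    "url": "https://source.codeaurora.org/",
    "max_pages": 3,
    "enable_origins": False,
    "max_origins_per_page": 10,
}

# The old signature rejects the call before any listing code runs.
try:
    list_cgit_old(**task_kwargs)
except TypeError as exc:
    old_error = str(exc)
else:
    old_error = None

# The extended signature accepts the same arguments.
new_result = list_cgit_new(**task_kwargs)

print(old_error)
print(new_result)
```

The point of the sketch is that the error is raised at call time, so no listing work happens at all until the task signature and the scheduled arguments agree.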
- Antoine R. Dumont mentioned in commit ardumont/swh-lister@45bbc29a
- Antoine R. Dumont mentioned in merge request swh/devel/swh-lister!462 (merged)
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@5936ae13
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@12c40b9b
- Author Owner
Fixed ^
- Author Owner
- Packaged and deployed the fixes ^
- Cleaned up sentry issues
- Triggered the scheduling again on staging to check that it works [1]
- It did not break this time [2]
[1]
2023-03-21 15:47:34 swh-scheduler@db1:5432 λ update task set next_run=now(), status='next_run_not_scheduled' where id=33420816;
UPDATE 1
Time: 314.894 ms
2023-03-21 15:47:46 swh-scheduler@db1:5432 λ select * from task where type='list-cgit' and id=33420816;
+-[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------------------------------------+
| id               | 33420816                                                                                                                               |
| type             | list-cgit                                                                                                                              |
| arguments        | {"args": [], "kwargs": {"url": "https://source.codeaurora.org/", "max_pages": 3, "enable_origins": false, "max_origins_per_page": 10}} |
| next_run         | 2023-03-21 14:47:45.901373+00                                                                                                          |
| current_interval | 1 day                                                                                                                                  |
| status           | next_run_not_scheduled                                                                                                                 |
| policy           | oneshot                                                                                                                                |
| retries_left     | 0                                                                                                                                      |
| priority         | (null)                                                                                                                                 |
+------------------+----------------------------------------------------------------------------------------------------------------------------------------+
Time: 5.356 ms
2023-03-21 15:47:47 swh-scheduler@db1:5432 λ select * from task where type='list-cgit' and id=33420816;
+-[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------------------------------------+
| id               | 33420816                                                                                                                               |
| type             | list-cgit                                                                                                                              |
| arguments        | {"args": [], "kwargs": {"url": "https://source.codeaurora.org/", "max_pages": 3, "enable_origins": false, "max_origins_per_page": 10}} |
| next_run         | 2023-03-21 14:47:45.901373+00                                                                                                          |
| current_interval | 1 day                                                                                                                                  |
| status           | next_run_scheduled                                                                                                                     |
| policy           | oneshot                                                                                                                                |
| retries_left     | 0                                                                                                                                      |
| priority         | (null)                                                                                                                                 |
+------------------+----------------------------------------------------------------------------------------------------------------------------------------+
Time: 4.143 ms
[2]
listers [2023-03-21 14:48:52,462: INFO/MainProcess] Task swh.lister.cgit.tasks.CGitListerTask[d584813e-6220-475b-b2f9-e1a13ef586f8] received
listers [2023-03-21 14:48:52,463: INFO/MainProcess] lister@lister-all-575dc5594b-n777k ready.
- Author Owner
The listing is apparently not doing much: the lister is lazy (it is a generator), and the cgit implementation does not work well when listing only a subset of a forge. No origin has been recorded so far. [1]
@anlambert is working on a fix.
[1]
2023-03-21 16:09:38 swh-scheduler@db1:5432 λ select visit_type, count(*) from listed_origins where lister_id=(select id from listers where name='cgit' and instance_name='source.codeaurora.org') group by visit_type;
+------------+-------+
| visit_type | count |
+------------+-------+
+------------+-------+
(0 rows)

Time: 1284.922 ms (00:01.285)
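To illustrate the laziness mentioned in this comment, here is a minimal sketch of a generator-based page source that is bounded by `max_pages` and `max_origins_per_page`. Everything in it is hypothetical (the `get_pages` and `list_subset` helpers, the `forge.example` URLs, the page sizes); it does not reproduce the actual swh.lister code or the fix that was later merged. It only shows that a generator does no work until it is consumed, and that a bounded consumer should stop pulling pages once the limit is reached.

```python
from itertools import islice

fetched = []  # records which pages were actually requested


def get_pages(total_pages):
    """Hypothetical lazy page source: nothing runs until it is iterated."""
    for page in range(total_pages):
        fetched.append(page)
        yield [f"https://forge.example/repo-{page}-{i}" for i in range(20)]


def list_subset(pages, max_pages, max_origins_per_page):
    """Consume at most max_pages pages, keeping max_origins_per_page origins of each."""
    origins = []
    for page in islice(pages, max_pages):
        origins.extend(page[:max_origins_per_page])
    return origins


pages = get_pages(total_pages=100)
assert fetched == []  # creating the generator fetched nothing

origins = list_subset(pages, max_pages=3, max_origins_per_page=10)
print(len(origins), fetched)  # prints: 30 [0, 1, 2]
```

With the limits used in staging (3 pages, 10 origins per page), only the first three pages are ever requested, which is the kind of behaviour a short test listing needs.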
- Author Owner
I've stopped the current listing (and purged the rabbitmq queue holding that message).
- Antoine R. Dumont mentioned in merge request swh/devel/swh-lister!463 (closed)
- Author Owner
The fix got merged and tagged by @anlambert (thanks). Lather, rinse, repeat the deployment dance with swh.lister v5.1.0 (when jenkins is done publishing the new version).
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@d1f90097
- Author Owner
Deployed on staging and processed in 10.5s now, thanks again:
listers [2023-03-21 16:53:12,392: INFO/MainProcess] lister@lister-all-585787d8cc-vjmpr ready.
listers [2023-03-21 16:53:22,775: INFO/ForkPoolWorker-1] Max origins per page set to 10 and reached, aborting page processing
listers [2023-03-21 16:53:22,775: INFO/ForkPoolWorker-1] Disabling origins before sending them to the scheduler
listers [2023-03-21 16:53:23,086: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[3f21ef89-0fec-4e4a-b121-20d4c360026f] succeeded in 10.57704611076042s: {'pa
- Author Owner
Staging went fine.
swhscheduler@scheduler0:~$ swh scheduler --config-file $CONFIG \
> add-forge-now --preset staging \
> schedule-first-visits \
> --type-name git \
> --lister-name $LISTER_NAME \
> --lister-instance-name $LISTER_INSTANCE_NAME

100 slots available in celery queue
10 visits to send to celery
- Author Owner
I deployed the fixes in staging. As the origins at risk are the /quic repository URLs (they will be shut down on 31/03), I've scheduled those first, but I'm not sure that will work [1]
We'll see.
[1] I've changed the policy to oneshot afterwards.
swhscheduler@saatchi:~$ CONFIG=/etc/softwareheritage/scheduler/backend.yml
swhscheduler@saatchi:~$ LISTER_NAME=cgit
swhscheduler@saatchi:~$ LISTER_INSTANCE_NAME=source.codeaurora.org/quic
swhscheduler@saatchi:~$ FORGE_URL=https://$LISTER_INSTANCE_NAME/
swhscheduler@saatchi:~$ swh scheduler --config-file $CONFIG \
> add-forge-now --preset production \
> register-lister $LISTER_NAME \
> url=$FORGE_URL
Created 1 tasks

Task 415322054
  Next run: tomorrow (2023-03-22T17:01:14.701022+00:00)
  Interval: 64 days, 0:00:00
  Type: list-cgit
  Policy: recurring
  Args:
  Keyword args:
    url: 'https://source.codeaurora.org/quic/'
- Author Owner
Currently scheduled forge listings:
2023-03-21 18:05:18 softwareheritage-scheduler@belvedere:5432 λ select * from task where type='list-cgit' and id in (415322054, 415322055);
+-[ RECORD 1 ]-----+------------------------------------------------------------------------+
| id               | 415322054                                                              |
| type             | list-cgit                                                              |
| arguments        | {"args": [], "kwargs": {"url": "https://source.codeaurora.org/quic/"}} |
| next_run         | 2023-03-21 17:05:14.442607+00                                          |
| current_interval | 64 days                                                                |
| status           | next_run_scheduled                                                     |
| policy           | oneshot                                                                |
| retries_left     | 3                                                                      |
| priority         | (null)                                                                 |
+-[ RECORD 2 ]-----+------------------------------------------------------------------------+
| id               | 415322055                                                              |
| type             | list-cgit                                                              |
| arguments        | {"args": [], "kwargs": {"url": "https://source.codeaurora.org/"}}      |
| next_run         | 2023-03-22 17:03:06.470383+00                                          |
| current_interval | 64 days                                                                |
| status           | next_run_not_scheduled                                                 |
| policy           | recurring                                                              |
| retries_left     | 3                                                                      |
| priority         | (null)                                                                 |
+------------------+------------------------------------------------------------------------+
Time: 5.498 ms