13:52:08 swh-scheduler@db1:5432=> select now(), visit_type, count(*) from listed_origins where lister_id = ( select id from listers where name='gogs') group by visit_type;+-------------------------------+------------+-------+| now | visit_type | count |+-------------------------------+------------+-------+| 2022-09-21 11:52:14.859315+00 | git | 160 |+-------------------------------+------------+-------+(1 row)Time: 81.638 ms
It appeared to not be doing much [0]. Although, T4533 got raised and [1] went down a bit earlier.
I've stopped the lister in the mean time.
[0]
14:18:12 swh-scheduler@db1:5432=> select now(), visit_type, count(*) from listed_origins where lister_id = (select id from listers where name='gogs') group by visit_type;+-----+------------+-------+| now | visit_type | count |+-----+------------+-------++-----+------------+-------+(0 rows)Time: 78.367 ms14:18:20 swh-scheduler@db1:5432=> select now(), * from listers where name='gogs';+-------------------------------+--------------------------------------+------+---------------+-------------------------------+---------------+-------------------------------+| now | id | name | instance_name | created | current_state | updated |+-------------------------------+--------------------------------------+------+---------------+-------------------------------+---------------+-------------------------------+| 2022-09-13 12:18:29.230675+00 | ec6c8408-42f9-47b8-b045-5a812d4a0eff | gogs | try.gogs.io | 2022-09-13 09:13:00.647726+00 | {} | 2022-09-13 09:13:00.647726+00 |+-------------------------------+--------------------------------------+------+---------------+-------------------------------+---------------+-------------------------------+(1 row)Time: 4.721 ms
try.gogs.io being down must have been a coincidence last week.
I triggered back a listing (debug mode) and it just gets stuck behind the issue (in the task mentioned above).
[2022-09-20 09:08:32,414: INFO/MainProcess] Task swh.lister.gogs.tasks.FullGogsRelister[740fe66e-1b02-43ce-a696-0f6f22272617] received[2022-09-20 09:08:32,542: DEBUG/ForkPoolWorker-1] Loading config file /etc/swh/config.yml[2022-09-20 09:08:32,819: INFO/ForkPoolWorker-1] Using authentication credentials from user swhbot[2022-09-20 09:08:32,826: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search with params {'limit': 50, 'page': 1}[2022-09-20 09:08:34,404: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=2 with params {}[2022-09-20 09:08:34,733: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=3 with params {}[2022-09-20 09:08:35,115: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=4 with params {}[2022-09-20 09:08:35,480: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=5 with params {}[2022-09-20 09:08:35,937: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=6 with params {}[2022-09-20 09:08:36,233: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=7 with params {}[2022-09-20 09:08:36,539: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=8 with params {}[2022-09-20 09:08:36,911: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=9 with params {}[2022-09-20 09:08:37,313: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=10 with params {}[2022-09-20 09:08:37,768: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=11 with params {}[2022-09-20 09:08:38,091: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=12 with params {}[2022-09-20 09:08:38,419: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=13 with params {}[2022-09-20 09:08:38,798: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=14 with params {}[2022-09-20 09:08:39,165: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=15 with params {}[2022-09-20 09:08:39,455: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=16 with params {}[2022-09-20 09:08:39,843: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search?page=17 with params {}[2022-09-20 09:08:40,047: WARNING/ForkPoolWorker-1] Unexpected HTTP status code 500 on https://try.gogs.io/api/v1/repos/search?page=17: b''[2022-09-20 09:08:40,390: ERROR/ForkPoolWorker-1] Task swh.lister.gogs.tasks.FullGogsRelister[740fe66e-1b02-43ce-a696-0f6f22272617] raised unexpected: HTTPError('500 Server Error: Internal Server Error for url: https://try.gogs.io/api/v1/repos/search?page=17')Traceback (most recent call last): File "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", line 451, in trace_task R = retval = fun(*args, **kwargs) File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/integrations/celery.py", line 204, in _inner reraise(*exc_info) File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/_compat.py", line 54, in reraise raise value File "/opt/swh/.local/lib/python3.10/site-packages/sentry_sdk/integrations/celery.py", line 199, in _inner return f(*args, **kwargs) File "/opt/swh/.local/lib/python3.10/site-packages/swh/scheduler/task.py", line 61, in __call__ result = super().__call__(*args, **kwargs) File "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", line 734, in __protected_call__ return self.run(*args, **kwargs) File "/opt/swh/.local/lib/python3.10/site-packages/swh/lister/gogs/tasks.py", line 23, in list_gogs_full return lister.run().dict() File "/opt/swh/.local/lib/python3.10/site-packages/swh/lister/pattern.py", line 127, in run for page in self.get_pages(): File "/opt/swh/.local/lib/python3.10/site-packages/swh/lister/gogs/lister.py", line 166, in get_pages response = self.page_request(next_link, {}) File "/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f return self(f, *args, **kw) File "/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__ do = self.iter(retry_state=retry_state) File "/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 349, in iter return fut.result() File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 439, in result return self.__get_result() File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 391, in __get_result raise self._exception File "/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__ result = fn(*args, **kwargs) File "/opt/swh/.local/lib/python3.10/site-packages/swh/lister/gogs/lister.py", line 136, in page_request response.raise_for_status() File "/opt/swh/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status raise HTTPError(http_error_msg, response=self)requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://try.gogs.io/api/v1/repos/search?page=17
So a fix is in order as suggested by @vlorentz and @anlambert described [1].
So should we not ignore the page and move on to the next one?
That'd make sense.
In this forge task, I also suggested temporarily changing the page size to 1 in order to skip the fatal repos.
Thanks for reminding me this, i don't recall (and it's there alright ;).
I can imagine that even with page of 1, you could end up with a 500 issue nonetheless so might as well ignore the 500 as you suggested above.
So we can have a chance to finalize the listing.
Nevertheless, they are less origins in the scheduler db, maybe some get listed more than once, need to check.
Confirmed after hacking on the lister code, some origins can be listed more than once but listing https://try.gogs.io should return 447 origins not 160.
The workaround from @KShivendu assumes the urls is already created ok... and then when
it hits an issue, it built the new link out of the one received from the response of the
first query... [1]
But on my side, the "stateless" lister starts from the problematic page (17), and then
plainly stops... [2].
Note that there is a misbehavior in terms here for that "stateless" typed lister. If it
is indeed a stateless lister, it should start from scratch each time it runs. If it has
an incremental behavior (it seems so from that described behavior) then it should be
probably typed differently.
[1]
| [2022-09-21 12:24:57,594: DEBUG/ForkPoolWorker-1] Fetching URL https://try.gogs.io/api/v1/repos/search with params {'limit': 50, 'page': 1}...| [2022-09-21 12:25:03,250: WARNING/ForkPoolWorker-1] Unexpected HTTP status code 500 on https://try.gogs.io/api/v1/repos/search?page=17: b'' <- problematic page kumar worked around
[2] Staging lister starts from this erratic page and stops.
But on my side, the "stateless" lister starts from the problematic page (17), and then
plainly stops... [2].
Note that there is a misbehavior in terms here for that "stateless" typed lister. If it
is indeed a stateless lister, it should start from scratch each time it runs. If it has
an incremental behavior (it seems so from that described behavior) then it should be
probably typed differently.
Bingo, it is a stateful lister... so nevermind my remark ^
I either remembered an old stateless implementation or i remember another implementation
altogether.
I'll unstuck the situation by resetting that state.
@KShivendu, you may want to deal with that edgecase though (please, create another
task/diff if you want to address it). If the first page of the saved state is not
working, next time it triggers, the current lister implementation won't list anything
¯_(ツ)_/¯.
[1]
10:41:16 swh-scheduler@db1:5432=> select * from listers where name='gogs';+--------------------------------------+------+---------------+-------------------------------+-------------------------------------------------------------------------------------------------------+-------------------------------+| id | name | instance_name | created | current_state | updated |+--------------------------------------+------+---------------+-------------------------------+-------------------------------------------------------------------------------------------------------+-------------------------------+| ec6c8408-42f9-47b8-b045-5a812d4a0eff | gogs | try.gogs.io | 2022-09-13 09:13:00.647726+00 | {"last_seen_repo_id": 7591, "last_seen_next_link": "https://try.gogs.io/api/v1/repos/search?page=17"} | 2022-09-20 09:08:40.081281+00 |+--------------------------------------+------+---------------+-------------------------------+-------------------------------------------------------------------------------------------------------+-------------------------------+(1 row)Time: 5.468 ms
I'll unstuck the situation by resetting that state.
State reset [1] and trigger back a listing [2] which resulted in the same amount of
origins than @anlambert [3].
[1]
11:00:17 swh-scheduler@db1:5432=> update listers set current_state='{}' where name='gogs';UPDATE 1Time: 162.686 ms
[2]
11:00:24 swh-scheduler@db1:5432=> update task set next_run=now(), status='next_run_not_scheduled' where type='list-gogs-full' and id=33419563;UPDATE 1Time: 168.393 ms