Verified commit 920ed0d5 authored by Antoine R. Dumont

lister.pattern: Restore flushing origin batch in the scheduler

Prior to this commit, the newly introduced URL validity check consumed the entire
stream of origins before any batch was sent. As a result, origin records were no
longer written to the scheduler at regular intervals.

For all listers, this meant origins were flushed only at the end of the listing,
which can take a long time for some (e.g. the packagist lister has currently been
running for more than 12h without writing anything to the scheduler).
parent 56b4fcc7
1 merge request: !492 lister.pattern: Restore flushing origin batch in the scheduler
Pipeline #3690 passed
@@ -337,21 +337,23 @@ class Lister(Generic[StateType, PageType]):
         pass
     def send_origins(self, origins: Iterable[model.ListedOrigin]) -> List[str]:
-        """Record a list of :class:`model.ListedOrigin` in the scheduler.
+        """Record the stream of valid :class:`model.ListedOrigin` in the scheduler.
+        This will filter out invalid urls prior to record origins to the scheduler.
         Returns:
             the list of origin URLs recorded in scheduler database
         """
-        valid_origins = []
-        for origin in origins:
-            if is_valid_origin_url(origin.url):
-                valid_origins.append(origin)
-            else:
-                logger.warning("Skipping invalid origin: %s", origin.url)
         recorded_origins = []
-        for batch_origins in grouper(valid_origins, n=1000):
-            ret = self.scheduler.record_listed_origins(batch_origins)
+        for origins in grouper(origins, n=1000):
+            valid_origins = []
+            for origin in origins:
+                if is_valid_origin_url(origin.url):
+                    valid_origins.append(origin)
+                else:
+                    logger.warning("Skipping invalid origin: %s", origin.url)
+            ret = self.scheduler.record_listed_origins(valid_origins)
             recorded_origins += [origin.url for origin in ret]
         return recorded_origins
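The effect described in the commit message can be reproduced outside the lister with a minimal sketch. Everything here is a stand-in chosen for illustration and not the project's code: `grouper` is a simplified local helper built on `itertools.islice`, `is_valid` is a hypothetical placeholder for `is_valid_origin_url`, and `record` plays the role of `record_listed_origins`. The sketch only traces the order in which origins are listed and batches are flushed.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def grouper(iterable: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield lists of at most n items, pulling lazily from the iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def is_valid(url: str) -> bool:
    # Hypothetical stand-in for the real URL validity check.
    return url.startswith("https://")


def send_eager(urls: Iterable[str], record: Callable[[List[str]], None], n: int) -> None:
    # Old behaviour: filtering builds a full list, so the stream is drained
    # before the first batch is recorded.
    valid = [u for u in urls if is_valid(u)]
    for batch in grouper(valid, n):
        record(batch)


def send_batched(urls: Iterable[str], record: Callable[[List[str]], None], n: int) -> None:
    # New behaviour: batch the raw stream first, then filter inside each batch,
    # so a flush happens every n listed origins.
    for batch in grouper(urls, n):
        record([u for u in batch if is_valid(u)])


def trace(send: Callable[..., None], total: int = 5, n: int = 2) -> List[str]:
    """Return the interleaving of 'list' and 'flush' events for a send strategy."""
    events: List[str] = []

    def listing() -> Iterator[str]:
        for i in range(total):
            events.append(f"list {i}")
            yield f"https://example.org/{i}"

    send(listing(), lambda batch: events.append(f"flush {len(batch)}"), n)
    return events


if __name__ == "__main__":
    print(trace(send_eager))
    print(trace(send_batched))
```

With the eager variant, every `list` event precedes the first `flush`. With the batched variant, a `flush` follows every `n` listed origins, which is the regular write pattern this commit restores.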