Commit 35871896 authored by Antoine Lambert

pattern: Improve handling of max_origins_per_page parameter

Instead of fully consuming the get_origins_from_page generator into
a list and then truncating it, consume the generator one origin at a
time and stop as soon as the maximum number of origins per page is
reached.
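
The difference between the two strategies can be sketched in plain Python. This is a minimal illustration, not code from the lister: `CountingSource`, `truncate_eagerly` and `stop_lazily` are hypothetical names, and the counter stands in for whatever per-origin work a real `get_origins_from_page` might do.

```python
from typing import Iterator, List


class CountingSource:
    """Hypothetical origin source that records how many origins it produced."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.produced = 0

    def origins(self) -> Iterator[str]:
        for i in range(self.total):
            self.produced += 1  # stands in for per-origin work
            yield f"https://example.org/repo{i}.git"


def truncate_eagerly(source: CountingSource, max_origins: int) -> List[str]:
    # Old strategy: drain the whole generator into a list, then slice it.
    origins = list(source.origins())
    return origins[:max_origins]


def stop_lazily(source: CountingSource, max_origins: int) -> List[str]:
    # New strategy: pull one origin at a time and stop at the limit.
    # max_origins is assumed to be a positive integer here.
    origins: List[str] = []
    for origin in source.origins():
        origins.append(origin)
        if len(origins) == max_origins:
            break
    return origins


eager_src, lazy_src = CountingSource(1000), CountingSource(1000)
assert truncate_eagerly(eager_src, 10) == stop_lazily(lazy_src, 10)
print(eager_src.produced, lazy_src.produced)  # 1000 10
```

Both strategies return the same ten origins, but only the lazy one avoids producing the other 990.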

Indeed, some non-trivial listers, such as the cgit one, can perform
costly processing (an HTTP request, for instance) for each origin in
a page. It is therefore better not to consume the whole generator in
one go, to avoid such side effects.
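
To make the cost concrete, here is a sketch assuming a cgit-like lister that issues one HTTP request per repository while generating origins. `FakeHttp` and this standalone `get_origins_from_page` function are hypothetical stand-ins (no network access happens), and the example uses `itertools.islice` as an equivalent lazy cutoff; the commit itself uses an explicit loop with `break`.

```python
from itertools import islice
from typing import Iterator, List


class FakeHttp:
    """Hypothetical stand-in for an HTTP client; it only counts requests."""

    def __init__(self) -> None:
        self.requests = 0

    def get_clone_url(self, repo_page: str) -> str:
        self.requests += 1  # a real lister would issue an HTTP GET here
        return repo_page + ".git"


def get_origins_from_page(http: FakeHttp, repo_pages: List[str]) -> Iterator[str]:
    # One request per repository, like a lister resolving each repo page.
    for repo_page in repo_pages:
        yield http.get_clone_url(repo_page)


page = [f"https://git.example.org/repo{i}" for i in range(200)]

# Lazy cutoff: islice stops pulling from the generator after 5 items.
http = FakeHttp()
first_five = list(islice(get_origins_from_page(http, page), 5))
print(len(first_five), http.requests)  # 5 5

# Eager consumption: every repository is requested before truncation.
http_all = FakeHttp()
everything = list(get_origins_from_page(http_all, page))[:5]
print(len(everything), http_all.requests)  # 5 200
```

Either way the kept origins are identical; the lazy version simply never resumes the generator for the repositories it does not keep, so their requests are never sent.
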
parent 45bbc29a

```diff
@@ -182,17 +182,20 @@ class Lister(Generic[StateType, PageType]):
         try:
             for page in self.get_pages():
                 full_stats.pages += 1
-                origins = list(self.get_origins_from_page(page))
-                if (
-                    self.max_origins_per_page
-                    and len(origins) > self.max_origins_per_page
-                ):
-                    logger.info(
-                        "Max origins per page set, truncated %s page results down to %s",
-                        len(origins),
-                        self.max_origins_per_page,
-                    )
-                    origins = origins[: self.max_origins_per_page]
+                origins = []
+                for origin in self.get_origins_from_page(page):
+                    origins.append(origin)
+                    if (
+                        self.max_origins_per_page
+                        and len(origins) == self.max_origins_per_page
+                    ):
+                        logger.info(
+                            "Max origins per page set to %s and reached, "
+                            "aborting page processing",
+                            self.max_origins_per_page,
+                        )
+                        break
                 if not self.enable_origins:
                     logger.info(
                         "Disabling origins before sending them to the scheduler"
```
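
The patched loop can be exercised outside the `Lister` class with a small standalone function. `collect_origins` is a hypothetical helper written only for illustration; it keeps the same guard, so a falsy `max_origins_per_page` (`None` or `0`) disables the cap, and because the length check runs right after each `append`, the underlying iterator is never advanced past the last kept origin.

```python
import logging
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def collect_origins(
    origins_iter: Iterable[T], max_origins_per_page: Optional[int]
) -> List[T]:
    """Hypothetical mirror of the new loop, outside the Lister class."""
    origins: List[T] = []
    for origin in origins_iter:
        origins.append(origin)
        # A falsy limit (None or 0) disables the cap, as in the patched code.
        if max_origins_per_page and len(origins) == max_origins_per_page:
            logger.info(
                "Max origins per page set to %s and reached, "
                "aborting page processing",
                max_origins_per_page,
            )
            break
    return origins


print(collect_origins(range(10), 3))     # [0, 1, 2]
print(collect_origins(range(10), None))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The iterator is not advanced beyond the last kept element.
it = iter(range(10))
collect_origins(it, 3)
print(next(it))  # 3
```

One trade-off is visible in the diff: since the generator is no longer drained, the total number of origins in the page is unknown, which is why the new log message reports only the configured limit rather than the original page size.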