Verified Commit ac247f08 authored by Antoine R. Dumont

origin_visits: Reuse cache entries if present

This clarifies the current code and avoids re-fetching all origin visits /
origin visit statuses when new visits happen. The optimization only applies when
previous visits are already in the cache.

The second part also serves as a workaround for some problematic origins (the ones with a
high number of visits). It is only a workaround, though: once the cache is emptied
(e.g. when the cache service restarts), the initial error (e.g. a 500 or 502 crash, ...)
will still happen.

Another change is on its way to try and fix that specific problem.

Related to T3905
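
The caching behaviour described in this message can be illustrated with a minimal, self-contained sketch. It is not the project's code: `CACHE`, `ARCHIVE`, `lookup_visits`, `get_visits`, `PER_PAGE` and the example URL are hypothetical stand-ins, and the sketch resumes pagination from the last visit id received, which is an assumption made for the illustration.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the cache service and the archive.
CACHE: Dict[str, List[dict]] = {}
ARCHIVE: Dict[str, List[dict]] = {}
PER_PAGE = 3  # assumed page size, playing the role of archive.MAX_LIMIT
fetched_records = 0  # counts records returned by the archive, for illustration


def lookup_visits(origin_url: str, last_visit: int, per_page: int) -> List[dict]:
    """Return at most per_page visits whose id is greater than last_visit."""
    global fetched_records
    newer = sorted(
        (v for v in ARCHIVE.get(origin_url, []) if v["visit"] > last_visit),
        key=lambda v: v["visit"],
    )
    page = newer[:per_page]
    fetched_records += len(page)
    return page


def get_visits(origin_url: str) -> List[dict]:
    """Reuse cached visits and fetch only the ones recorded after them."""
    cached = CACHE.get(origin_url, [])
    last_visit = cached[-1]["visit"] if cached else 0
    new_visits: List[dict] = []
    while True:
        page = lookup_visits(origin_url, last_visit, PER_PAGE)
        new_visits += page
        if len(page) < PER_PAGE:
            break
        # Assumption: resume after the last visit id actually received.
        last_visit = page[-1]["visit"]
    # The cached part is already ordered, so only the new part is appended.
    visits = cached + new_visits
    CACHE[origin_url] = visits
    return visits


url = "https://example.org/repo"  # hypothetical origin
ARCHIVE[url] = [{"visit": i} for i in range(1, 8)]
first = get_visits(url)   # cold cache: every visit is fetched page by page
ARCHIVE[url] += [{"visit": 8}, {"visit": 9}]
second = get_visits(url)  # warm cache: only the two new visits are fetched
print(len(first), len(second), fetched_records)
```

With a warm cache the second call only asks the archive for visits newer than the last cached one. As the message notes, this no longer helps once the cache has been emptied, because the first call must again page through every visit.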
parent e72b1e7a
1 merge request: !930 origin_visits: Reuse cache entries if present
......
@@ -42,40 +42,42 @@ def get_origin_visits(origin_info: OriginInfo) -> List[OriginVisitInfo]:
     cache_entry_id = "origin_visits_%s" % origin_url
     cache_entry = cache.get(cache_entry_id)

+    last_visit = 0
+    origin_visits = []
+    new_visits = []
+    per_page = archive.MAX_LIMIT
     if cache_entry:
+        origin_visits = cache_entry
         last_visit = cache_entry[-1]["visit"]
         new_visits = list(
-            archive.lookup_origin_visits(origin_url, last_visit=last_visit)
+            archive.lookup_origin_visits(
+                origin_url, last_visit=last_visit, per_page=per_page
+            )
         )
+        last_visit += len(new_visits)
         if not new_visits:
             last_snp = archive.lookup_latest_origin_snapshot(origin_url)
             if not last_snp or last_snp["id"] == cache_entry[-1]["snapshot"]:
                 return cache_entry

-    origin_visits = []
-    per_page = archive.MAX_LIMIT
-    last_visit = None
+    # get new visits that we did not retrieve yet
     while 1:
         visits = list(
             archive.lookup_origin_visits(
                 origin_url, last_visit=last_visit, per_page=per_page
             )
         )
-        origin_visits += visits
+        new_visits += visits
         if len(visits) < per_page:
             break
-        else:
-            if not last_visit:
-                last_visit = per_page
-            else:
-                last_visit += per_page
+        last_visit += per_page

     def _visit_sort_key(visit):
         ts = parse_iso8601_date_to_utc(visit["date"]).timestamp()
         return ts + (float(visit["visit"]) / 10e3)

-    origin_visits = sorted(origin_visits, key=lambda v: _visit_sort_key(v))
+    # cache entry is already sorted with oldest visits
+    origin_visits += sorted(new_visits, key=lambda v: _visit_sort_key(v))

     cache.set(cache_entry_id, origin_visits)
......
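
The `_visit_sort_key` helper in the diff orders visits by date and uses the visit id as a tie-breaker. The sketch below shows that behaviour in isolation. It assumes `datetime.fromisoformat` as a stand-in for `parse_iso8601_date_to_utc`, and the sample visits are made up for the illustration.

```python
from datetime import datetime, timezone


def visit_sort_key(visit: dict) -> float:
    # Stand-in for parse_iso8601_date_to_utc: parse the date and normalise to UTC.
    ts = datetime.fromisoformat(visit["date"]).astimezone(timezone.utc).timestamp()
    # 10e3 is 10000.0 in Python, so each visit id adds 0.0001 s to the key.
    return ts + float(visit["visit"]) / 10e3


# Hypothetical visits: two of them share the same date.
visits = [
    {"visit": 3, "date": "2022-02-01T10:00:00+00:00"},
    {"visit": 1, "date": "2022-01-01T10:00:00+00:00"},
    {"visit": 2, "date": "2022-02-01T10:00:00+00:00"},
]
ordered = sorted(visits, key=visit_sort_key)
print([v["visit"] for v in ordered])  # prints [1, 2, 3]
```

Visits with identical dates come out in visit id order. Because the id is divided by 10000, the tie-breaker stays below one second only for ids under 10000, which may matter for origins with a very high number of visits.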