SystemExit: 1
(17 additional frame(s) were not displayed)
...
  File "http/client.py", line 1375, in getresponse
    response.begin()
  File "http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "gunicorn/workers/base.py", line 204, in handle_abort
    sys.exit(1)
Vincent Sellier changed title from SystemExit: 1 to Counter refresh history exit in error
This seems to be a common error when Thanos does not manage to reply in time to the history query.
The cron job retries the query every second on error, which doesn't help and triggers a lot of errors in Sentry.
The cron job will be updated to use the retry command to ensure the RPC service is running, and to fail the job if an error occurs.
The retry will then happen automatically one hour later in case of an error, which is better for Thanos.
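To illustrate the intended behaviour, here is a minimal Python sketch, not the actual cron job code. It assumes two hypothetical callables: `check`, which reports whether the RPC service is reachable, and `refresh`, which issues the history query. The readiness check is retried a bounded number of times, the query itself is attempted only once, and any failure becomes a non-zero exit code so that the scheduler's next hourly run acts as the retry.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a readiness check a bounded number of times."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the last failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


def run_refresh(
    check: Callable[[], bool],
    refresh: Callable[[], object],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return a process exit code: 0 on success, 1 on any failure."""
    if not wait_until_ready(check, attempts, delay, sleep):
        return 1
    try:
        # Single attempt: no tight retry loop against a slow backend.
        refresh()
    except Exception:
        return 1
    return 0
```

A job entry point could end with `sys.exit(run_refresh(check, refresh))`, so a timeout fails the job once instead of hitting the backend again every second.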
I guess this is why the history counters are broken in the production webapp: there is a jump in the history data between 14 April 2021 and 24 August 2024 ...
Not sure it's related: if the historical data was not fetched from Thanos, the graph should stop in 2021.
As it ends in 2024, at least some data is retrieved; this needs more investigation.
More probably, thanos-query timed out while retrieving the data stored in the Azure bucket, whereas the live data in the local Prometheus is retrieved correctly, which explains the 2024 values.
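A quick way to confirm such a jump in the stored history is to scan the sample dates for holes. The following is a small Python sketch under assumptions: `find_gaps` is a hypothetical helper, and the input is taken to be a plain list of sample dates rather than the webapp's actual data format.

```python
from datetime import date, timedelta


def find_gaps(days, max_gap=timedelta(days=1)):
    """Return (previous, next) pairs of dates separated by more than max_gap."""
    ordered = sorted(set(days))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Illustrative sample dates only, shaped like the jump described above.
samples = [date(2021, 4, 13), date(2021, 4, 14), date(2024, 8, 24), date(2024, 8, 25)]
gaps = find_gaps(samples)
```

With these illustrative samples, the only gap reported is the one from 14 April 2021 to 24 August 2024.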
Nice to have the full counters history back! This made me notice a small issue in the histogram tooltip code, fixed in swh/devel/swh-web!1309 (merged).