Instabilities after the deployment of swh v309
After the latest deployment, several issues occurred in the production environment:
- PostgreSQL connection pool full
- Problems interacting with the Azure object storage
#5337 (closed): read-only instance fails to start on writable check failing
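The failure mode named in #5337 can be illustrated with a small sketch. It is not the actual swh-objstorage code or the fix that was merged. The class `ReadOnlyStore` and the functions `check_config` and `start_instance` below are hypothetical names chosen for illustration. The assumption is that a startup check which always probes for write access will reject a read-only instance, whereas a check that probes for write access only when writes are needed will not.

```python
class ReadOnlyStore:
    """Hypothetical stand-in for a storage backend that only allows reads."""

    def can_read(self) -> bool:
        return True

    def can_write(self) -> bool:
        return False


def check_config(store, *, check_write: bool) -> bool:
    """Return True when the store is usable for the requested access mode."""
    if not store.can_read():
        return False
    # Only probe for write access when the caller actually needs to write.
    if check_write and not store.can_write():
        return False
    return True


def start_instance(store, *, read_only: bool) -> str:
    """Refuse to start when the configuration check fails."""
    if not check_config(store, check_write=not read_only):
        raise RuntimeError("configuration check failed")
    return "started"
```

With this sketch, `start_instance(ReadOnlyStore(), read_only=True)` succeeds, while the same store started with `read_only=False` raises `RuntimeError`, which mirrors a writable check failing on a read-only instance.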
Activity
- Vincent Sellier added activity::Deployment incident labels
- Vincent Sellier assigned to @vsellier
- Vincent Sellier marked this incident as related to #5329 (closed)
- Vincent Sellier added an incident timeline event
- Vincent Sellier edited the event time/date on incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier edited the event time/date on incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier edited the text on incident timeline event
- Vincent Sellier edited the text on incident timeline event
- Vincent Sellier edited the text on incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier added an incident timeline event
- Vincent Sellier edited the event time/date on incident timeline event
- Vincent Sellier added an incident timeline event
- Author Owner
The fix for the Azure access issue: swh/devel/swh-objstorage!194 (merged)
It looks like a regression introduced by swh/devel/swh-objstorage@cd672034
Unfortunately, the exception was not recorded in ELK due to an OpenTelemetry parsing issue: #5335
- Vincent Sellier edited the text on incident timeline event
- Author Owner
Regarding the connection count issue, there was no significant connection peak on PostgreSQL during the interval:
- ~ 175 connections to 5433
- ~ 135 connections to 5434
- ~ 20 connections to 5435
These numbers don't reflect the pgbouncer activity, but no other stats are available.
To get some metrics for pgbouncer, the dedicated Prometheus exporter should be installed on albertina and massmoca (and staging): #5336
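Until exporter metrics exist, per-port counts like the ones above can be approximated from socket listings. The following is a minimal sketch, not the method used to obtain the figures in this comment. It assumes the input lines follow the usual `ss -tn` column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and that 5433, 5434 and 5435 are the local PostgreSQL ports of interest. The function name `count_connections_by_port` is hypothetical.

```python
def count_connections_by_port(ss_lines, ports):
    """Count established connections per local port from `ss -tn` style lines."""
    counts = {port: 0 for port in ports}
    for line in ss_lines:
        fields = line.split()
        # Assumed layout: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # The port is whatever follows the last colon of the local address.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port.isdigit() and int(local_port) in counts:
            counts[int(local_port)] += 1
    return counts


# Illustrative input only; these are not real measurements.
sample = [
    "State Recv-Q Send-Q Local Address:Port Peer Address:Port",
    "ESTAB 0 0 10.0.0.1:5433 10.0.0.2:40000",
    "ESTAB 0 0 10.0.0.1:5433 10.0.0.3:40001",
    "ESTAB 0 0 10.0.0.1:5434 10.0.0.2:40002",
    "ESTAB 0 0 10.0.0.1:22 10.0.0.9:50000",
]
print(count_connections_by_port(sample, [5433, 5434, 5435]))
```

Like the figures above, such counts only describe connections seen on the PostgreSQL ports and say nothing about pooling inside pgbouncer.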
Edited by Vincent Sellier
- Vincent Sellier edited the text on incident timeline event
- Maintainer
Sentry Issue: SWH-LOADER-SVN-J2
- Maintainer
Sentry Issue: SWH-LOADER-SVN-J3
- Maintainer
Sentry Issue: SWH-STORAGE-2X7V
- Maintainer
Sentry Issue: SWH-SCHEDULER-CX
- Maintainer
Sentry Issue: SWH-WEBAPP-5X5
- Antoine R. Dumont changed the description
- Antoine R. Dumont changed title from Instabilities after the deployment of swh v310 to Instabilities after the deployment of swh v309
- Antoine R. Dumont mentioned in issue #5329 (closed)
- Antoine R. Dumont mentioned in commit swh/devel/swh-objstorage@12bc3ba8
- Antoine R. Dumont mentioned in merge request swh/devel/swh-objstorage!195 (closed)
- Antoine R. Dumont added environment: production environment: staging labels
- Antoine R. Dumont changed the description
- Owner
Connection limit on pgbouncer: swh/infra/puppet/puppet-swh-site@fa18c480
swh-objstorage check_config issues: swh/devel/swh-objstorage!194 (merged), released in swh.objstorage 3.1.2 and deployed
- Nicolas Dandrimont closed
- Nicolas Dandrimont changed the incident status to Resolved by closing the incident