Federate prometheus instances through thanos

Thanos is the Swiss Army knife for Prometheus federation, HA and clustering.

It provides a global query view over multiple, potentially redundant, Prometheus data stores: data from each Prometheus instance is pushed to a centralised object store, and query services then read from these stores.
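As a rough illustration of what the global view gives us: the thanos query service speaks the Prometheus-compatible HTTP API, so a client can send PromQL to a single endpoint instead of to each Prometheus instance. The sketch below is only an example under assumptions: the host name and port in `THANOS_QUERY_URL` are hypothetical (not taken from this issue), and the helper names `build_query_url` and `parse_instant_vector` are made up for illustration. The `dedup` parameter asks the query service to merge series coming from redundant Prometheus replicas.

```python
import json
from urllib.parse import urlencode

# Hypothetical address of the thanos query service; not taken from this issue.
THANOS_QUERY_URL = "http://thanos.example.org:10902"


def build_query_url(base_url, promql, dedup=True):
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def parse_instant_vector(body):
    """Return (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


if __name__ == "__main__":
    from urllib.request import urlopen

    # Ask the global view which scrape targets are up, across all instances.
    url = build_query_url(THANOS_QUERY_URL, "up")
    with urlopen(url, timeout=10) as response:
        for labels, value in parse_instant_vector(response.read()):
            print(labels, value)
```

Because the API matches the one Prometheus exposes, the same kind of request should keep working once the grafana datasource is switched from pergamon's prometheus to the thanos query service.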

Plan:

  • Install manual thanos services in mmca (temporary provenance server)
  • Push historical data from mmca to a thanos datastore bucket
  • Push historical data from pergamon to a thanos datastore bucket
  • infra/swh-sysadmin-provisioning!80: Provision thanos query dedicated node (+ inventory update)
  • D8092: Expose a thanos query service to read from those datastores
  • infra/puppet/puppet-swh-site!534: Expose thanos gateway service to access historical data (initially planned on mmca; it will run on the thanos node instead)
  • infra/puppet/puppet-swh-site!534: Update thanos query to read from those gateways as well
  • Fix communication between thanos and pergamon nodes (firewall)
  • Fix communication between thanos and mmca nodes (certs)
  • infra/puppet/puppet-swh-site!532: Drop mmca's prometheus federation from puppet
  • mmca: drop history on Prometheus server (/var/lib/prometheus/metrics2) [3]
  • mmca: Clean up historical data from bucket mmca-metrics-0 [3]
  • Switch grafana datasource from pergamon's prometheus to the thanos query service
  • Instantiate thanos sidecar service in staging cluster (then reference it to thanos node); the separate task "Instantiate prometheus/thanos services in staging environment" is no longer needed since #4540 (closed)
  • Instantiate prometheus/thanos services in archive-staging environment
  • Instantiate prometheus/thanos services in archive-production environment
  • Instantiate prometheus/thanos services in admin environment
  • Instantiate prometheus/thanos services in azure environment
  • Instantiate prometheus/thanos services in gitlab staging environment
  • Instantiate prometheus/thanos services in gitlab production environment
  • Instantiate prometheus/thanos services in rancher environment
  • Federate them through thanos (a puppet run on the thanos node should add their grpc entries)
  • Drop pergamon's prometheus
  • Document

Draft notes can be found in the HedgeDoc document [2].


Migrated from T4385 (view on Phabricator)

Edited by Vincent Sellier