Commit 9bda2dbb authored by vlorentz, committed by Phabricator Migration user

Document statsd metrics and link to dashboards

parent 8d046507
......@@ -237,3 +237,4 @@ Indices and tables
api-reference
archive-changelog
journal
statsd
.. _swh_statsd_metrics:
Statsd metrics and Grafana dashboards
=====================================
This page lists all statsd metrics reported by Software Heritage's components,
and other metrics commonly used to monitor them.
.. _swh_statsd_metrics_archive:
Archive
-------
* ``sql_swh_archive_object_count``
* ``sql_swh_scheduler_delay``
* ``swh_archive_object_total``
.. _swh_statsd_metrics_journal:
Journal
-------
* ``swh_journal_client_handle_message_total``
* ``swh_journal_client_status``
Client progress and status are monitored using the `Kafka estimated time to completion
<https://grafana.softwareheritage.org/d/Jayj4QsGk/kafka-estimated-time-to-completion>`_
dashboard for a loader-specific view, and the `Kafka consumer lags
<https://grafana.softwareheritage.org/d/KvQqUhsWz/kafka-consumers-lag>`_ dashboard to show all
consumers at once.
.. _swh_statsd_metrics_indexers:
Indexers
--------
See :ref:`swh_statsd_metrics_rpc`.
.. _swh_statsd_metrics_loaders:
Loaders
-------
Filtered objects, i.e. objects received by the loader that the archive
already has (currently only reported by the Git loader):
* ``swh_loader_filtered_objects_percent_bucket``
* ``swh_loader_filtered_objects_percent_count``
* ``swh_loader_filtered_objects_percent_sum``
* ``swh_loader_filtered_objects_total_count``
* ``swh_loader_filtered_objects_total_sum``
Git references which are not loaded:
* ``swh_loader_git_ignored_refs_percent_bucket``
* ``swh_loader_git_ignored_refs_percent_count``
* ``swh_loader_git_ignored_refs_percent_sum``
* ``swh_loader_git_known_refs_percent_bucket``
* ``swh_loader_git_known_refs_percent_count``
* ``swh_loader_git_known_refs_percent_sum``
* ``swh_loader_git_total``
Metadata loading:
* ``swh_loader_metadata_fetchers_count`` and ``swh_loader_metadata_fetchers_sum``: their ratio (sum divided by count) is the average number of fetchers used per visit
* ``swh_loader_metadata_objects_count``: total number of metadata objects loaded
* ``swh_loader_metadata_objects_sum``
* ``swh_loader_metadata_parent_origins_count`` and ``swh_loader_metadata_parent_origins_sum``: the ratio is the average number of origins this origin is a fork of
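The "ratio" mentioned above can be sketched as follows. This is a minimal illustration, assuming the ratio is the ``_sum`` value divided by the ``_count`` value. The helper name ``average_per_visit`` and the sample numbers are hypothetical and do not come from any Software Heritage component.

```python
from typing import Optional


def average_per_visit(total_sum: float, total_count: float) -> Optional[float]:
    """Return total_sum / total_count, or None when nothing was counted."""
    if total_count == 0:
        # No observation yet: the average is undefined.
        return None
    return total_sum / total_count


# Hypothetical sample values, for illustration only:
# stand-ins for swh_loader_metadata_fetchers_sum and ..._count.
fetchers_sum = 42.0
fetchers_count = 14.0
average_fetchers = average_per_visit(fetchers_sum, fetchers_count)
```

The same computation would apply to the ``swh_loader_metadata_parent_origins_*`` pair.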
Performance (all labeled with the name of an operation and, for the Git loader,
with whether they are incremental):
* ``swh_loader_operation_duration_seconds_bucket``
* ``swh_loader_operation_duration_seconds_count``
* ``swh_loader_operation_duration_seconds_error_count``
* ``swh_loader_operation_duration_seconds_sum``
Loader status is monitored through the `Ingestion status`_ and `Loader metrics`_
dashboards, which focus on loaded objects and on the loaders themselves, respectively.
.. _Ingestion status: https://grafana.softwareheritage.org/d/Cgi8dR8Wz/ingestion-status
.. _Loader metrics: https://grafana.softwareheritage.org/d/FqGC4zu7z/vlorentz-loader-metrics
.. _swh_statsd_metrics_objstorage:
Object storage
--------------
In addition to :ref:`swh_statsd_metrics_rpc`,
* ``swh_objstorage_in_bytes_total``
* ``swh_objstorage_out_bytes_total``
.. _swh_statsd_metrics_provenance:
Provenance
----------
* ``swh_provenance_archive_direct_duration_seconds_bucket``
* ``swh_provenance_archive_direct_duration_seconds_count``
* ``swh_provenance_archive_direct_duration_seconds_error_count``
* ``swh_provenance_archive_direct_duration_seconds_sum``
* ``swh_provenance_archive_graph_duration_seconds_bucket``
* ``swh_provenance_archive_graph_duration_seconds_count``
* ``swh_provenance_archive_graph_duration_seconds_sum``
* ``swh_provenance_archive_multiplexed_duration_seconds_bucket``
* ``swh_provenance_archive_multiplexed_duration_seconds_count``
* ``swh_provenance_archive_multiplexed_duration_seconds_error_count``
* ``swh_provenance_archive_multiplexed_duration_seconds_sum``
* ``swh_provenance_archive_multiplexed_per_backend_count``
* ``swh_provenance_backend_duration_seconds_bucket``
* ``swh_provenance_backend_duration_seconds_count``
* ``swh_provenance_backend_duration_seconds_error_count``
* ``swh_provenance_backend_duration_seconds_sum``
* ``swh_provenance_backend_operations_total``
* ``swh_provenance_graph_duration_seconds_bucket``
* ``swh_provenance_graph_duration_seconds_count``
* ``swh_provenance_graph_duration_seconds_error_count``
* ``swh_provenance_graph_duration_seconds_sum``
* ``swh_provenance_origin_revision_layer_duration_seconds_bucket``
* ``swh_provenance_origin_revision_layer_duration_seconds_count``
* ``swh_provenance_origin_revision_layer_duration_seconds_error_count``
* ``swh_provenance_origin_revision_layer_duration_seconds_sum``
* ``swh_provenance_storage_postgresql_duration_seconds_bucket``
* ``swh_provenance_storage_postgresql_duration_seconds_count``
* ``swh_provenance_storage_postgresql_duration_seconds_error_count``
* ``swh_provenance_storage_postgresql_duration_seconds_sum``
* ``swh_provenance_storage_rabbitmq_duration_seconds_bucket``
* ``swh_provenance_storage_rabbitmq_duration_seconds_count``
* ``swh_provenance_storage_rabbitmq_duration_seconds_error_count``
* ``swh_provenance_storage_rabbitmq_duration_seconds_sum``
`Index of Provenance dashboards
<https://grafana.softwareheritage.org/dashboards/f/eKOFn6y7k/provenance>`_
.. _swh_statsd_metrics_replayers:
Content and graph replayers
---------------------------
* ``swh_content_replayer_bytes``
* ``swh_content_replayer_duration_seconds_bucket``
* ``swh_content_replayer_duration_seconds_count``
* ``swh_content_replayer_duration_seconds_error_count``
* ``swh_content_replayer_duration_seconds_sum``
* ``swh_content_replayer_operations_total``
* ``swh_content_replayer_retries_total``
* ``swh_graph_replayer_duration_seconds_bucket``
* ``swh_graph_replayer_duration_seconds_count``
* ``swh_graph_replayer_duration_seconds_sum``
* ``swh_graph_replayer_operations_total``
Dashboards:
* `Cassandra <https://grafana.softwareheritage.org/d/HW1-UgO4k/cassandra-replayers>`__
* `S3 <https://grafana.softwareheritage.org/d/d3l2oqXWz/s3-object-copy>`__
.. _swh_statsd_metrics_rpc:
RPC servers
-----------
``indexer_storage``, ``objstorage``, ``storage``, ``search``
each report this set of metrics:
* ``swh_<NAME>_request_duration_seconds_bucket``
* ``swh_<NAME>_request_duration_seconds_count``
* ``swh_<NAME>_request_duration_seconds_error_count``
* ``swh_<NAME>_request_duration_seconds_sum``
``indexer_storage`` and ``search`` also have:
* ``swh_<NAME>_operations_total``
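To make the ``<NAME>`` placeholder concrete, the sketch below expands it for each server listed above. It assumes the placeholder is replaced verbatim by the server name. The function ``rpc_metric_names`` and the constants are hypothetical and exist only for this illustration.

```python
# Server names and suffixes as listed in this section.
RPC_SERVERS = ("indexer_storage", "objstorage", "storage", "search")
DURATION_SUFFIXES = ("bucket", "count", "error_count", "sum")
# Servers that also report swh_<NAME>_operations_total.
WITH_OPERATIONS_TOTAL = ("indexer_storage", "search")


def rpc_metric_names(name: str) -> list:
    """Expand the <NAME> placeholder for one RPC server (assumed verbatim)."""
    names = [
        f"swh_{name}_request_duration_seconds_{suffix}"
        for suffix in DURATION_SUFFIXES
    ]
    if name in WITH_OPERATIONS_TOTAL:
        names.append(f"swh_{name}_operations_total")
    return names


# Map every listed server to the metric names it would report.
all_rpc_metrics = {server: rpc_metric_names(server) for server in RPC_SERVERS}
```

For example, ``rpc_metric_names("storage")`` yields the four ``swh_storage_request_duration_seconds_*`` names, while ``search`` gets a fifth entry for ``swh_search_operations_total``.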
.. _swh_statsd_metrics_scheduler:
Scheduler
---------
* ``swh_scheduler_listener_handled_event_total``
* ``swh_scheduler_origins_enabled``
* ``swh_scheduler_origins_known``
* ``swh_scheduler_origins_last_update``
* ``swh_scheduler_origins_never_visited``
* ``swh_scheduler_origins_with_pending_changes``
* ``swh_scheduler_runner_scheduled_task_total``
* ``swh_task_called_count``
* ``swh_task_duration_seconds_bucket``
* ``swh_task_duration_seconds_count``
* ``swh_task_duration_seconds_error_count``
* ``swh_task_duration_seconds_sum``
* ``swh_task_end_ts``
* ``swh_task_failure_count``
* ``swh_task_start_ts``
* ``swh_task_success_count``
.. _swh_statsd_metrics_search:
Search
------
See :ref:`swh_statsd_metrics_rpc`.
.. _swh_statsd_metrics_scrubber:
Scrubber
--------
Performance:
* ``swh_scrubber_batch_duration_seconds_bucket``
* ``swh_scrubber_batch_duration_seconds_count``
* ``swh_scrubber_batch_duration_seconds_error_count``
* ``swh_scrubber_batch_duration_seconds_sum``
* ``swh_scrubber_objects_hashed_total``
Corruptions found:
* ``swh_scrubber_hash_mismatch_total``
* ``swh_scrubber_missing_object_total``
.. _swh_statsd_metrics_storage:
Storage
-------
In addition to :ref:`swh_statsd_metrics_rpc`,
* ``swh_storage_operations_bytes_total``, which reports the total number of content bytes
going through the RPC server
.. _swh_statsd_metrics_webapp:
Webapp
------
* ``swh_web_accepted_save_requests``
* ``swh_web_save_requests_delay_seconds``
* ``swh_web_submitted_save_requests``
* ``swh_web_submitted_save_requests_from_webhooks``
Dashboard: `Save Code Now
<https://grafana.softwareheritage.org/d/WXRVVc_Mz/save-code-now>`_
.. _swh_statsd_metrics_misc:
Other metrics
-------------
Performance of end-to-end tests:
* ``swh_e2e_duration_seconds``
* ``swh_e2e_status``