
search: Add count_visit_types to interface

It returns the number of origins per visit type.

It also allows other components, such as swh-web, to retrieve the list of available visit types dynamically.
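A minimal sketch of how such a method could be built on top of the aggregation used in the curl request. It assumes an elasticsearch-py 7.x style client whose `search(index=..., body=...)` returns the parsed JSON response; `SearchClient`, `build_visit_types_query` and the `index` parameter are hypothetical names, not the actual swh-search code.

```python
from typing import Any, Dict, Protocol


class SearchClient(Protocol):
    """Hypothetical: anything exposing an Elasticsearch-style search() call."""

    def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]: ...


def build_visit_types_query(max_types: int = 1000) -> Dict[str, Any]:
    """Count non-blocklisted origins per visit type (same body as the curl test)."""
    return {
        "aggs": {
            "not_blocklisted": {
                # Exclude blocklisted origins before aggregating.
                "filter": {"bool": {"must_not": [{"term": {"blocklisted": True}}]}},
                "aggs": {
                    # One bucket per distinct visit type, at most max_types buckets.
                    "visit_types": {
                        "terms": {"field": "visit_types", "size": max_types}
                    }
                },
            }
        },
        "size": 0,  # only the aggregation is needed, not the matching documents
    }


def count_visit_types(client: SearchClient, index: str) -> Dict[str, int]:
    """Hypothetical sketch: return {visit_type: number of origins}."""
    response = client.search(index=index, body=build_visit_types_query())
    buckets = response["aggregations"]["not_blocklisted"]["visit_types"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Because the keys of the returned dict are the visit types themselves, a caller such as swh-web could list the available types with `list(count_visit_types(client, index))` instead of hard-coding them.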

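The underlying Elasticsearch query has been tested on the production cluster and it is efficient: over the 162,289,904 non-blocklisted origins it completes in 940 ms server-side (about 1.2 s wall-clock through an SSH tunnel):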

(swh) ✔ ~/swh/swh-environment/swh-search [count-visit-types L|⚑ 3] 
18:27 $ ssh -L 9200:192.168.100.86:9200 search-esnode4.internal.softwareheritage.org
anlambert@carnavalet:~/tmp$ time curl -X POST http://localhost:9200/origin-production/_search?pretty -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "not_blocklisted": {
      "filter": {
        "bool": {
          "must_not": [
            {"term": {"blocklisted": true}}
          ]
        }
      },
      "aggs": {
        "visit_types": {
          "terms": {"field": "visit_types", "size": 1000}
        }
      }
    }
  },
  "size": 0
}'
{
  "took" : 940,
  "timed_out" : false,
  "_shards" : {
    "total" : 90,
    "successful" : 90,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "not_blocklisted" : {
      "doc_count" : 162289904,
      "visit_types" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "git",
            "doc_count" : 154006431
          },
          {
            "key" : "npm",
            "doc_count" : 1660597
          },
          {
            "key" : "svn",
            "doc_count" : 679040
          },
          {
            "key" : "hg",
            "doc_count" : 415270
          },
          {
            "key" : "pypi",
            "doc_count" : 398714
          },
          {
            "key" : "deb",
            "doc_count" : 72303
          },
          {
            "key" : "cran",
            "doc_count" : 18019
          },
          {
            "key" : "ftp",
            "doc_count" : 1205
          },
          {
            "key" : "deposit",
            "doc_count" : 1114
          },
          {
            "key" : "tar",
            "doc_count" : 390
          },
          {
            "key" : "nixguix",
            "doc_count" : 2
          }
        ]
      }
    }
  }
}

real    0m1,168s
user    0m0,012s
sys     0m0,005s
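The bucket counts can be cross-checked against the filter's `doc_count`. The sketch below assumes standard terms-aggregation semantics on a multi-valued field: a document is counted once in every bucket matching one of its values, and in no bucket if it has no value. It also treats a non-zero `sum_other_doc_count` as a sign that the `"size": 1000` bucket limit truncated the result. `check_visit_type_counts` is a hypothetical helper, not part of swh-search.

```python
from typing import Dict


def check_visit_type_counts(
    counts: Dict[str, int], doc_count: int, sum_other_doc_count: int
) -> int:
    """Hypothetical check: validate the aggregation and return a lower bound on
    the number of filtered origins that carry no visit type at all."""
    if sum_other_doc_count != 0:
        # Some visit types did not fit in the requested number of buckets.
        raise ValueError("visit_types aggregation is truncated; raise its size")
    # doc_count - sum(buckets) = (origins with no visit type)
    #                           - (extra hits from origins with several types),
    # so the difference is a lower bound on origins with no visit type.
    return max(0, doc_count - sum(counts.values()))


# Figures copied from the production response above.
production_counts = {
    "git": 154006431, "npm": 1660597, "svn": 679040, "hg": 415270,
    "pypi": 398714, "deb": 72303, "cran": 18019, "ftp": 1205,
    "deposit": 1114, "tar": 390, "nixguix": 2,
}
print(sum(production_counts.values()))  # 157253085
print(check_visit_type_counts(production_counts, 162289904, 0))  # 5036819
```

With the production figures, the buckets sum to 157,253,085 against 162,289,904 non-blocklisted origins, so under the assumption above at least 5,036,819 of those origins have no visit type in the index.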

Related to #3441 (closed).


Migrated from D6137.
