search: Add count_visit_types to interface
It returns the number of origins per visit type.
It also lets other components, such as swh-web, retrieve the list of available visit types dynamically.
The underlying Elasticsearch query was tested on the production cluster: it aggregates about 162 million origins in roughly one second.
(swh) ✔ ~/swh/swh-environment/swh-search [count-visit-types L|⚑ 3]
18:27 $ ssh -L 9200:192.168.100.86:9200 search-esnode4.internal.softwareheritage.org
anlambert@search-esnode4:~$
anlambert@carnavalet:~/tmp$ time curl -X POST http://localhost:9200/origin-production/_search?pretty -H 'Content-Type: application/json' -d '
{
  "aggs" : {
    "not_blocklisted" : {
      "filter": {
        "bool": {
          "must_not": [
            {"term": {"blocklisted": true}}
          ]
        }
      },
      "aggs": {
        "visit_types": {
          "terms" : { "field" : "visit_types", "size": 1000 }
        }
      }
    }
  },
  "size" : 0
}'
{
  "took" : 940,
  "timed_out" : false,
  "_shards" : {
    "total" : 90,
    "successful" : 90,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "not_blocklisted" : {
      "doc_count" : 162289904,
      "visit_types" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "git",
            "doc_count" : 154006431
          },
          {
            "key" : "npm",
            "doc_count" : 1660597
          },
          {
            "key" : "svn",
            "doc_count" : 679040
          },
          {
            "key" : "hg",
            "doc_count" : 415270
          },
          {
            "key" : "pypi",
            "doc_count" : 398714
          },
          {
            "key" : "deb",
            "doc_count" : 72303
          },
          {
            "key" : "cran",
            "doc_count" : 18019
          },
          {
            "key" : "ftp",
            "doc_count" : 1205
          },
          {
            "key" : "deposit",
            "doc_count" : 1114
          },
          {
            "key" : "tar",
            "doc_count" : 390
          },
          {
            "key" : "nixguix",
            "doc_count" : 2
          }
        ]
      }
    }
  }
}
real 0m1,168s
user 0m0,012s
sys 0m0,005s
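For illustration, the aggregation response above can be reduced to a mapping from visit type to origin count along the following lines. This is only a minimal sketch, not the code from this diff: the names VISIT_TYPES_QUERY and parse_visit_type_counts are hypothetical, and the sample response is a trimmed copy of the output above.

```python
from collections import Counter
from typing import Any, Dict

# Same aggregation body as the curl request above, as a Python dict.
# The constant name is hypothetical and used only for this sketch.
VISIT_TYPES_QUERY: Dict[str, Any] = {
    "aggs": {
        "not_blocklisted": {
            "filter": {"bool": {"must_not": [{"term": {"blocklisted": True}}]}},
            "aggs": {
                "visit_types": {"terms": {"field": "visit_types", "size": 1000}}
            },
        }
    },
    "size": 0,
}


def parse_visit_type_counts(response: Dict[str, Any]) -> Counter:
    """Hypothetical helper: map each visit type to its origin count."""
    buckets = response["aggregations"]["not_blocklisted"]["visit_types"]["buckets"]
    return Counter({bucket["key"]: bucket["doc_count"] for bucket in buckets})


# Trimmed copy of the response shown above (first three buckets only).
sample_response = {
    "aggregations": {
        "not_blocklisted": {
            "doc_count": 162289904,
            "visit_types": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "git", "doc_count": 154006431},
                    {"key": "npm", "doc_count": 1660597},
                    {"key": "svn", "doc_count": 679040},
                ],
            },
        }
    }
}

counts = parse_visit_type_counts(sample_response)
available_visit_types = sorted(counts)
```

In this sketch, the keys of the resulting mapping give the available visit types, which is the information a consumer such as swh-web would need.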
Related to #3441 (closed).
Migrated from D6137 (view on Phabricator)