Reindex old data on banco to put it into swh_workers indexes

Historical data on banco has been stored in generic logstash-${date} indexes. These indexes now contain deleted documents and are not compressed as well as they could be, wasting precious storage space.
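
To gauge the waste before starting, the deleted-document counts and on-disk sizes of the existing indexes can be listed with the _cat API (the column selection below is just one convenient view):

curl 'http://192.168.101.58:9200/_cat/indices/logstash-*?v&h=index,docs.count,docs.deleted,store.size'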

This is the proposed reindexing process:

1. Change index template to improve reindexing speed
----------------------------------------------------

curl -i -H'Content-Type: application/json' -XPUT http://192.168.101.58:9200/_template/template_swh_workers -d '
{
    "template" : "swh_workers-*",
    "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 0,
        "refresh_interval" : -1,
        "codec" : "best_compression"
    }
}'
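
Index templates only apply to indexes created after the template is stored, so this has to happen before the reindex creates the destination index; dropping replicas and disabling automatic refresh avoids duplicating indexing work during the bulk copy. The stored template can be double-checked with:

curl 'http://192.168.101.58:9200/_template/template_swh_workers?pretty'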

2. Reindex
----------

time curl -i -H'Content-Type: application/json' -XPOST http://192.168.101.58:9200/_reindex -d '
{
    "source": { "index": "logstash-2017.03.08" },
    "dest":   { "index": "swh_workers-2017.03.08" }
}'
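
For large indexes the synchronous call above can take a long time; assuming a recent enough Elasticsearch (5.x), the reindex can also be launched in the background and polled through the tasks API (the task id comes back in the response of the first call):

curl -i -H'Content-Type: application/json' -XPOST 'http://192.168.101.58:9200/_reindex?wait_for_completion=false' -d '
{
    "source": { "index": "logstash-2017.03.08" },
    "dest":   { "index": "swh_workers-2017.03.08" }
}'

curl 'http://192.168.101.58:9200/_tasks/<task id from the previous response>?pretty'

Once the copy finishes, the document counts of source and destination should match before the old index is touched.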

3. Add back replicas to index shards
------------------------------------

curl -i -H'Content-Type: application/json' -XPUT http://192.168.101.58:9200/swh_workers-2017.03.08/_settings -d '
{
    "index" : { "number_of_replicas" : 1 }
}'
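
Note that the refresh_interval of -1 was baked into the new index at creation time via the template, and changing the template back (step 4) does not affect already-created indexes, so automatic refresh presumably needs to be restored on the index itself as well, either in the call above or separately:

curl -i -H'Content-Type: application/json' -XPUT http://192.168.101.58:9200/swh_workers-2017.03.08/_settings -d '
{
    "index" : { "refresh_interval" : "30s" }
}'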

4. Change index template back to sane defaults
----------------------------------------------

curl -i -H'Content-Type: application/json' -XPUT http://192.168.101.58:9200/_template/template_swh_workers -d '
{
    "template" : "swh_workers-*",
    "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1,
        "refresh_interval" : "30s",
        "codec" : "best_compression"
    }
}'
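
Once the replica shards have been allocated, the cluster should report green health again:

curl 'http://192.168.101.58:9200/_cluster/health?pretty'

After verifying the contents of the new index, the old one can presumably be deleted to actually reclaim the space:

curl -i -XDELETE http://192.168.101.58:9200/logstash-2017.03.08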

Migrated from T1000