So the swh-search-journal-client@objects service (on search1.internal.softwareheritage.org) was
stopped so that elasticsearch had a chance to clean up whatever it needed to do.
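For the record, the stop presumably amounted to something like this (unit name taken from the log above; the commands are only echoed here, as a dry-run sketch rather than executed):

```shell
# Dry-run sketch: stop the journal client so elasticsearch can catch up
# on its merges/deletions. Unit name comes from the log above.
UNIT="swh-search-journal-client@objects"
echo "systemctl stop $UNIT"
echo "systemctl status $UNIT --no-pager"
```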
It worked, the load subsided.
And the docs.deleted count is actually decreasing (and disk space got freed).
The initial goal was reached (free disk space ;)
root@search-esnode1:~# date; curl http://$ES_SERVER/_cat/indices\?v
Tue 23 Feb 2021 08:57:09 AM UTC
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1  152756605     40806220    273.1gb        137.4gb
It's still in progress.
Further discussion ensued on #sysadm about the force merge action:
09:39 <+olasd> Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents.
...
09:40 <+olasd> force merge should only be called against an index after you're *completely done* writing to it
09:41 <+olasd> if you're going to update the documents ever again, the merged segments won't be automatically cleaned, until most documents have been updated
09:41 <+olasd> because they're gigantic
...
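One way to verify that on our side is to list the per-segment sizes of the index with the `_cat/segments` API. A dry-run sketch ($ES_SERVER assumed to point at the cluster, as elsewhere in this log; the command is echoed, not executed):

```shell
# Dry-run sketch: list segment sizes and deleted-doc counts for "origin",
# to spot the oversized (>5GB) segments left behind by the force merge.
ES_SERVER="${ES_SERVER:-search-esnode1.internal.softwareheritage.org:9200}"  # assumed host:port
CMD="curl -s 'http://$ES_SERVER/_cat/segments/origin?v&h=segment,size,docs.count,docs.deleted'"
echo "$CMD"
```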
So next action is actually to discontinue the usage of that index which is tainted by
our use of the force_merge action.
I'll create a new one as soon as disk space usage allows it (more than
50% free disk space).
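To watch for that 50%-free threshold from the cluster's own point of view (rather than icinga's), `_cat/allocation` reports per-node disk usage. A dry-run sketch, same assumed $ES_SERVER:

```shell
# Dry-run sketch: per-node disk.used / disk.avail / disk.percent, to decide
# when there is enough headroom to create and fill the new index.
ES_SERVER="${ES_SERVER:-search-esnode1.internal.softwareheritage.org:9200}"  # assumed host:port
ALLOC_CMD="curl -s 'http://$ES_SERVER/_cat/allocation?v'"
echo "$ALLOC_CMD"
```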
[1]
06:32 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 41310 MB (20% inode=99%);
06:46 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 41590 MB (21% inode=99%);
06:49 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 40615 MB (20% inode=99%);
06:53 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 41737 MB (21% inode=99%);
07:05 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 41305 MB (20% inode=99%);
08:06 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 41527 MB (21% inode=99%);
08:11 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 41135 MB (20% inode=99%);
08:12 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode2.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 41403 MB (20% inode=99%);
08:28 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 41620 MB (21% inode=99%);
08:33 <swhbot`> icinga PROBLEM: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is WARNING: DISK WARNING - free space: /srv/elasticsearch/nodes 41248 MB (20% inode=99%);
08:44 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode1.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 42165 MB (21% inode=99%);
08:48 <swhbot`> icinga RECOVERY: service disk /srv/elasticsearch/nodes on search-esnode2.internal.softwareheritage.org is OK: DISK OK - free space: /srv/elasticsearch/nodes 56738 MB (28% inode=99%);
puppet agent got disabled on search1.internal.softwareheritage.org so puppet does not
restart the search-journal-client@objects service (which writes to the index) [1]
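The disable itself is a one-liner; something along these lines (the message wording is illustrative, and the commands are echoed as a dry run):

```shell
# Dry-run sketch: disable puppet with a reason so other admins see why.
REASON="keep swh-search-journal-client@objects stopped during the ES index switch"  # hypothetical wording
echo "puppet agent --disable \"$REASON\""
echo "puppet agent --enable  # once the new index is in place"
```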
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin            yFaqPPCnRFCnc5AA6Ah8lw  90   1  152756759     41962187      261gb        130.7gb
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1    1021300            0      1.1gb        917.9mb

elasticsearch-data  193G   88G  106G  46% /srv/elasticsearch/nodes
[1] I'll install an alias "origin" on the new "origin-production" index, keeping the
name "origin" since the index name is currently hard-coded in the swh-search code.
[2] We don't have that much space so cleaning up the wrong copy:
Install alias "origin" on "origin-production" index:
As expected, we can't add an alias with the same name as an existing index:
ardumont@search-esnode1:~% curl -s -XPOST $ES_SERVER/origin-production/_alias/origin
{"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [origin], an index exists with the same name as the alias","index_uuid":"yFaqPPCnRFCnc5AA6Ah8lw","index":"origin"}],"type":"invalid_alias_name_exce
So first close and delete the old index to reclaim space:
# Close the index
ardumont@search-esnode1:~% curl -s -XPOST $ES_SERVER/origin/_close
{"acknowledged":true,"shards_acknowledged":true,"indices":{"origin":{"closed":true}}}
# Delete the index
ardumont@search-esnode1:~% curl -s -XDELETE $ES_SERVER/origin
{"acknowledged":true}
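With the old index closed and deleted, re-running the alias creation attempted above should now succeed. A dry-run sketch of that remaining step (same endpoint as before, $ES_SERVER assumed; the command is echoed, not executed):

```shell
# Dry-run sketch: point the hard-coded name "origin" at origin-production,
# now that no index of that name exists anymore.
ES_SERVER="${ES_SERVER:-search-esnode1.internal.softwareheritage.org:9200}"  # assumed host:port
ALIAS_CMD="curl -s -XPOST http://$ES_SERVER/origin-production/_alias/origin"
echo "$ALIAS_CMD"
```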