The current Elasticsearch logging setup is configured by hand on a single machine, banco.
We should make that an actual cluster with at least a replica.
Current data usage is 470 GiB of logs spanning six months of history (February 2017 - now), which means we probably want to size it around 1 TiB of storage space to account for growth.
1. Set up the new machines

The machines have 4x 2TB HDDs.

Partitions:
- RAID 1 / 40GB
- RAID 0 /srv/elasticsearch (all the rest)

Using a RAID 0 volume will improve I/O performance. We don't need redundancy on /srv/elasticsearch since Elasticsearch itself will provide it.

Name the hosts with easy to remember names in the DNS, like "esnode0.internal.softwareheritage.org".
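A rough sketch of that layout (device names and partition numbers are assumptions and must be adapted to the actual hardware):

    # RAID 1 over a small partition on each disk for the system (~40GB)
    mdadm --create /dev/md0 --level=1 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # RAID 0 over the remaining space, mounted on /srv/elasticsearch
    mdadm --create /dev/md1 --level=0 --raid-devices=4 \
        /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
    mkfs.ext4 /dev/md1
    mkdir -p /srv/elasticsearch
    mount /dev/md1 /srv/elasticsearch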
2. Install Elasticsearch on the new nodes
[This stage could be puppetized]

The new hosts have 32GB of RAM.

Change /etc/elasticsearch/jvm.options:

    # Use a maximum of 50% of available memory for the jvm heap
    # Keep it <32GB in order to benefit from jvm pointer compression
    # Around 50% of memory should be kept free for disk caching purposes
    -Xms15g
    -Xmx15g

    # The default openjdk 1.8 garbage collector allows itself to stop all
    # operations for indefinite amounts of time, causing cluster issues after
    # a while.
    # Use G1GC instead, it has more realtime performance characteristics and
    # will help to keep the cluster green and responsive
    -XX:+UseG1GC

Comment out or remove the -XX CMS options.

Change /etc/elasticsearch/elasticsearch.yml:

    # Store indexes in /srv/elasticsearch
    path.data: /srv/elasticsearch

    # Since the es cluster is mostly used as an archive and few
    # search requests are being made daily, increase available
    # memory for indexing (the default is 10% of heap size)
    indices.memory.index_buffer_size: 50%
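As a quick sanity check once a node is up, the node info API reports the effective heap size (a sketch; the exact field layout may vary slightly between Elasticsearch versions):

    # heap_max_in_bytes should correspond to the 15g configured above
    curl -s 'http://localhost:9200/_nodes/jvm?pretty' | grep heap_max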
3. Install Logstash on one of the new nodes
We should be using a dedicated hardware node, but the budget is tight and the performance requirements are low. Having Logstash running on one of the Elasticsearch nodes shouldn't cause many issues with the current workload as of 2018-03.

Logstash 6.x is the new version. We need to use it for new indexes in order to avoid compatibility issues between Logstash/Journalbeat 5.x and Elasticsearch 6.x.

Configure it to send data to the Elasticsearch instance on Banco (a sketch of the pipeline follows).
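A minimal Logstash pipeline along these lines should be enough; the file path, listening port and index pattern below are assumptions, not taken from the existing setup:

    # e.g. /etc/logstash/conf.d/swh-logging.conf (hypothetical path)
    input {
      beats {
        # Journalbeat ships data over the beats protocol
        port => 5044
      }
    }

    output {
      elasticsearch {
        # Point at the existing instance on banco for now; this gets
        # switched to the new nodes in step 7.
        hosts => ["banco.internal.softwareheritage.org:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }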
4. Configure Journalbeat on Banco to send data to the new logstash instance
Edit banco:/etc/journalbeat
Change the hosts: parameter so it points at the new Logstash instance (sketch below).
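Sketch of the relevant Journalbeat output section (beats-style YAML); the exact file location under /etc/journalbeat and the host running Logstash are assumptions:

    # Only the output section changes
    output.logstash:
      # Send journal entries to the new Logstash instance
      # (here assumed to run on esnode0)
      hosts: ["esnode0.internal.softwareheritage.org:5044"]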
5. Configure the new Elasticsearch nodes in a cluster with Banco
Change /etc/elasticsearch/elasticsearch.yml:

    cluster.name: swh-logging-prod
    node.name: esnode${node}
    network.host: 192.168.100.${node_last_ip_digit}
    discovery.zen.ping.unicast.hosts: [
      "banco.internal.softwareheritage.org",
      "esnode0.internal.softwareheritage.org",
      "esnode1.internal.softwareheritage.org",
      "esnode2.internal.softwareheritage.org"
    ]

    # We have three nodes, avoid split-brains
    discovery.zen.minimum_master_nodes: 2

This parameter is extremely important. Without it, we can get split-brains.

Wait for the cluster to turn green.
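The standard cluster health API can be used to watch for the green status:

    # "status" should eventually report "green" once shards are allocated
    curl -s 'http://localhost:9200/_cluster/health?pretty'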
6. Configure the cluster
    # Limit the number of threads (and thus the I/O) dedicated to Lucene
    # segment merging (because we use slow HDDs)
    curl -XPUT -H 'Content-Type: application/json' \
        'http://localhost:9200/_all/_settings?preserve_existing=true' \
        -d '{ "index.merge.scheduler.max_thread_count": "1" }'
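The setting can be checked afterwards with the index settings API:

    # Every index should now report a max_thread_count of 1
    curl -s 'http://localhost:9200/_all/_settings?pretty' | grep max_thread_count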
7. Configure the new logstash instance to send data to the new nodes
Some load-balancer / error-detection system could be useful if the cluster gets big. For three nodes, hardcoding all of them should be fine: the Logstash Elasticsearch output accepts a list of hosts, which should be enough for a small cluster (see the sketch below).
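Sketch of the updated output section, using the node names from the plan above (the index pattern is an assumption):

    output {
      elasticsearch {
        # The output plugin distributes requests across the listed hosts
        hosts => [
          "esnode0.internal.softwareheritage.org:9200",
          "esnode1.internal.softwareheritage.org:9200",
          "esnode2.internal.softwareheritage.org:9200"
        ]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }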
8. Remove Banco from the cluster
Turn off Elasticsearch on Banco (after draining its shards; see the sketch below).
Get back the former Elasticsearch disk space.
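One way to drain Banco before stopping it is cluster-level allocation filtering (a sketch; the node name pattern is an assumption and must match Banco's node.name):

    # Ask the cluster to move all shards away from banco
    curl -XPUT -H 'Content-Type: application/json' \
        'http://localhost:9200/_cluster/settings' \
        -d '{ "transient": { "cluster.routing.allocation.exclude._name": "banco*" } }'

    # Once banco holds no shards any more, Elasticsearch can be stopped there
    curl -s 'http://localhost:9200/_cat/allocation?v'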
Just a reminder that those machines will not be dedicated to Elasticsearch: they are also intended to be used to set up a Kafka cluster for the journal / mirroring system.
The only adaptations I would make to your plan are:
- make an LVM physical volume on top of the RAID 0 so we can manage partitions on the fly (see the sketch after this list)
- don't allocate all the space on the RAID 0 to the Elasticsearch cluster (start with 25% or so?)
- don't allocate all the RAM to Elasticsearch (again, probably start with 25% of the nodes' RAM for the JVM heap)
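A minimal sketch of that layout, assuming the RAID 0 array appears as /dev/md1 (volume group and logical volume names are placeholders):

    # Put LVM on top of the RAID 0 array
    pvcreate /dev/md1
    vgcreate vg-data /dev/md1

    # Start with ~25% of the space for Elasticsearch; extend later as needed
    lvcreate -l 25%VG -n elasticsearch vg-data
    mkfs.ext4 /dev/vg-data/elasticsearch
    mount /dev/vg-data/elasticsearch /srv/elasticsearch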
This ticket is about an Elasticsearch cluster; any other service is out of scope here.
Besides, the 1U machines we just received were budgeted for Elasticsearch.
Since nothing is known about the resource requirements of an eventual Kafka service, the hardware was tailored to Elasticsearch's needs only.
Given the excessive prices of Dell SSD options, the new servers are also using old-school HDDs, which severely impacts I/O performance under load.
Elasticsearch requires as much memory as possible in order to compensate for the slow disks and not degrade operations too much.
Why should another service be allowed to run on the same machines?
I suggest opening a separate ticket for Kafka and budgeting the required hardware.
Some of the test nodes exhibited memory leak symptoms. It seems they were related to the use of mmap() to access files.
Adding "index.store.type: niofs" in elasticsearch.yml seemed to fix this particular problem.
Two new cluster nodes have been added to the swh-logging-prod cluster: esnode1.internal.softwareheritage.org and esnode2.internal.softwareheritage.org.
Due to the Kafka requirement, only three disks in RAID0 are used per new node.
The cluster now has three brand new nodes; all existing data has been copied and the original Banco node removed.
Issue technically fixed but left open in order to track related Elasticsearch requests.