The current Elasticsearch logging setup is configured by hand on a single machine, banco.
We should make that an actual cluster with at least a replica.
Current data usage is 470 GiB of logs spanning six months of history (February 2017 - now), which means we probably want to size it around 1 TiB of storage space to account for growth.
1. Set up the new machines

The machines have 4x 2TB HDDs.

Partitions:
- RAID 1 / 40GB
- RAID 0 /srv/elasticsearch (all the rest)

Using a RAID 0 volume will improve I/O performance. We don't need redundancy on /srv/elasticsearch since Elasticsearch itself will provide it.

Name the hosts with easy to remember names in the DNS, like "esnode0.internal.softwareheritage.org".
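A rough sketch of that layout (device names and partition numbers are assumptions and must be adapted to the actual hardware):

    # RAID 1 over a small partition on each disk for the system (~40GB)
    mdadm --create /dev/md0 --level=1 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # RAID 0 over the remaining space, mounted on /srv/elasticsearch
    mdadm --create /dev/md1 --level=0 --raid-devices=4 \
        /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
    mkfs.ext4 /dev/md1
    mkdir -p /srv/elasticsearch
    mount /dev/md1 /srv/elasticsearch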
2. Install Elasticsearch on the new nodes
[This stage could be puppetized]

The new hosts have 32GB of RAM.

Change /etc/elasticsearch/jvm.options:

    # Use a maximum of 50% of available memory for the jvm heap
    # Keep it <32GB in order to benefit from jvm pointer compression
    # Around 50% of memory should be kept free for disk caching purposes
    -Xms15g
    -Xmx15g

    # The default openjdk 1.8 garbage collector allows itself to stop all
    # operations for indefinite amounts of time, causing cluster issues after
    # a while.
    # Use G1GC instead, it has more realtime performance characteristics and
    # will help to keep the cluster green and responsive
    -XX:+UseG1GC

Comment out or remove the -XX CMS options.

Change /etc/elasticsearch/elasticsearch.yml:

    # Store indexes in /srv/elasticsearch
    path.data: /srv/elasticsearch

    # Since the es cluster is mostly used as an archive and few
    # search requests are being made daily, increase available
    # memory for indexing (the default is 10% of heap size)
    indices.memory.index_buffer_size: 50%
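As a quick sanity check once a node is up, the node info API reports the effective heap size (a sketch; the exact field layout may vary slightly between Elasticsearch versions):

    # heap_max_in_bytes should correspond to the 15g configured above
    curl -s 'http://localhost:9200/_nodes/jvm?pretty' | grep heap_max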
3. Install Logstash on one of the new nodes
We should be using a dedicated hardware node, but the budget is tight and the performance requirements are low. Having Logstash running on one of the Elasticsearch nodes shouldn't cause many issues with the current workload as of 2018-03.

Logstash 6.x is the new version. We need to use it for new indexes in order to avoid compatibility issues between Logstash/Journalbeat 5.x and Elasticsearch 6.x.

Configure it to send data to the Elasticsearch instance on Banco (a sketch of the pipeline follows).
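A minimal Logstash pipeline along these lines should be enough; the file path, listening port and index pattern below are assumptions, not taken from the existing setup:

    # e.g. /etc/logstash/conf.d/swh-logging.conf (hypothetical path)
    input {
      beats {
        # Journalbeat ships data over the beats protocol
        port => 5044
      }
    }

    output {
      elasticsearch {
        # Point at the existing instance on banco for now; this gets
        # switched to the new nodes in step 7.
        hosts => ["banco.internal.softwareheritage.org:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }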
4. Configure Journalbeat on Banco to send data to the new logstash instance
Edit banco:/etc/journalbeat
Change the hosts: parameter so it points at the new Logstash instance (sketch below).
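Sketch of the relevant Journalbeat output section (beats-style YAML); the exact file location under /etc/journalbeat and the host running Logstash are assumptions:

    # Only the output section changes
    output.logstash:
      # Send journal entries to the new Logstash instance
      # (here assumed to run on esnode0)
      hosts: ["esnode0.internal.softwareheritage.org:5044"]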
5. Configure the new Elasticsearch nodes in a cluster with Banco
Change /etc/elasticsearch/elasticsearch.yml:

    cluster.name: swh-logging-prod
    node.name: esnode${node}
    network.host: 192.168.100.${node_last_ip_digit}
    discovery.zen.ping.unicast.hosts: [
      "banco.internal.softwareheritage.org",
      "esnode0.internal.softwareheritage.org",
      "esnode1.internal.softwareheritage.org",
      "esnode2.internal.softwareheritage.org"
    ]

    # We have three nodes, avoid split-brains
    discovery.zen.minimum_master_nodes: 2

This parameter is extremely important. Without it, we can get split-brains.

Wait for the cluster to turn green.
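The standard cluster health API can be used to watch for the green status:

    # "status" should eventually report "green" once shards are allocated
    curl -s 'http://localhost:9200/_cluster/health?pretty'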
6. Configure the cluster
    # Limit the number of threads (and thus the I/O) dedicated to Lucene
    # segment merging (because we use slow HDDs)
    curl -XPUT -H 'Content-Type: application/json' \
        'http://localhost:9200/_all/_settings?preserve_existing=true' \
        -d '{ "index.merge.scheduler.max_thread_count": "1" }'
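The setting can be checked afterwards with the index settings API:

    # Every index should now report a max_thread_count of 1
    curl -s 'http://localhost:9200/_all/_settings?pretty' | grep max_thread_count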
7. Configure the new logstash instance to send data to the new nodes
Some load-balancer / error-detection system could be useful if the cluster gets big. For three nodes, hardcoding all of them should be fine: the Logstash Elasticsearch output accepts a list of hosts, which should be enough for a small cluster (see the sketch below).
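Sketch of the updated output section, using the node names from the plan above (the index pattern is an assumption):

    output {
      elasticsearch {
        # The output plugin distributes requests across the listed hosts
        hosts => [
          "esnode0.internal.softwareheritage.org:9200",
          "esnode1.internal.softwareheritage.org:9200",
          "esnode2.internal.softwareheritage.org:9200"
        ]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }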
8. Remove Banco from the cluster
Turn off Elasticsearch on Banco (after draining its shards; see the sketch below).
Get back the former Elasticsearch disk space.
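One way to drain Banco before stopping it is cluster-level allocation filtering (a sketch; the node name pattern is an assumption and must match Banco's node.name):

    # Ask the cluster to move all shards away from banco
    curl -XPUT -H 'Content-Type: application/json' \
        'http://localhost:9200/_cluster/settings' \
        -d '{ "transient": { "cluster.routing.allocation.exclude._name": "banco*" } }'

    # Once banco holds no shards any more, Elasticsearch can be stopped there
    curl -s 'http://localhost:9200/_cat/allocation?v'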
Just a reminder that those machines will not be dedicated to Elasticsearch: they are also intended to be used to set up a Kafka cluster for the journal / mirroring system.
The only adaptations I would make to your plan are:
- make an LVM physical volume on top of the RAID 0 so we can manage partitions on the fly (see the sketch after this list)
- don't allocate all the space on the RAID 0 to the Elasticsearch cluster (start with 25% or so?)
- don't allocate all the RAM to Elasticsearch (again, probably start with 25% of the nodes' RAM for the JVM heap)
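A minimal sketch of that layout, assuming the RAID 0 array appears as /dev/md1 (volume group and logical volume names are placeholders):

    # Put LVM on top of the RAID 0 array
    pvcreate /dev/md1
    vgcreate vg-data /dev/md1

    # Start with ~25% of the space for Elasticsearch; extend later as needed
    lvcreate -l 25%VG -n elasticsearch vg-data
    mkfs.ext4 /dev/vg-data/elasticsearch
    mount /dev/vg-data/elasticsearch /srv/elasticsearch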
This ticket is about an Elasticsearch cluster; any other service is out of scope here.
Besides, the 1U machines we just received were budgeted for Elasticsearch.
Since nothing is known about the resource requirements of an eventual Kafka service, the hardware was tailored to Elasticsearch's needs only.
Given the excessive prices of Dell SSD options, the new servers are also using old-school HDDs, which severely impacts I/O performance under load.
Elasticsearch requires as much memory as possible in order to compensate for the slow disks and not degrade operations too much.
Why should another service be allowed to run on the same machines?
I suggest opening a separate ticket for Kafka and budgeting the required hardware.
Some of the test nodes exhibited memory leak symptoms. It seems they were related to the use of mmap() to access files.
Adding "index.store.type: niofs" in elasticsearch.yml seemed to fix this particular problem.
Two new cluster nodes have been added to the swh-logging-prod cluster: esnode1.internal.softwareheritage.org and esnode2.internal.softwareheritage.org.
Due to the Kafka requirement, only three disks in RAID0 are used per new node.
The cluster now has three brand new nodes; all existing data has been copied and the original Banco node removed.
Issue technically fixed but left open in order to track related Elasticsearch requests.