Tag Archives: Elasticsearch

Recovering from full Elasticsearch nodes

Recently I ran out of space on a 5 node Elasticsearch cluster. Events were not being indexed, and Logstash had amassed a 10GB disk-backed queue. It was not pretty

I discovered that the fifth node was configured incorrectly and was storing the ES data on one of the smaller disk partitions. I stopped the Elasticsearch service on this node while I formulated a plan.

Unfortunately, I didn’t have the time (or confidence) to move the entire /var directory to the large partition (which happened to be serving the /home folder: mounted as /dev/mapper/centos-home), so I instead created a new folder at /home/elasticsearch (so it would be on the large partition), and “symlinked”/var/elasticsearch to the new home folder on the larger partition ln -s /home/elasticsearch/elasticsearch /var/lib/elasticsearch

After creating the Symlink, I started the Elasticsearch service, and watched the logs. After some time, I noticed that there were still no primary shards assigned to this new nodes (despite it being the only node with disk space utilization below the threshold), so I dug in a bit more

This is where I learned about /_cluster/allocation/explain which provides details about why certain shards may have an allocation problem. Ah ha! After 5 failed attempts to unassigned shards to my new node, Elasticsearch just needed a little kick to re run the allocation process: I opened up the Kibana console, and ran POST /_cluster/reroute?retry_failed=true to force the algorithm to re-evaluate the location of shards

Within about 90 seconds, the Elasticsearch cluster began rerouting all of the unassigned shards, and my logstash disk-queue began to shrink as the events poured into the freshly allocated shards on my new node.

Problem solved.

Stay tuned for next week when I pay off the technical debt incurred by placing my Elasticsearch shards on a symlink 😬