
Recovering from full Elasticsearch nodes

Recently I ran out of disk space on a 5-node Elasticsearch cluster. Events were not being indexed, and Logstash had amassed a 10GB disk-backed queue. It was not pretty.

I discovered that the fifth node was configured incorrectly and was storing the ES data on one of the smaller disk partitions. I stopped the Elasticsearch service on this node while I formulated a plan.
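If you're retracing my steps, a quick way to confirm where a node keeps its data and how full that partition is looks something like this (a sketch; paths assume the default CentOS RPM layout, and path.data may still be commented out if the node uses the default):

grep path.data /etc/elasticsearch/elasticsearch.yml
df -h /var/lib/elasticsearch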

Unfortunately, I didn’t have the time (or confidence) to move the entire /var directory to the large partition (which happened to be serving the /home folder, mounted as /dev/mapper/centos-home). Instead, I created a new folder under /home (so it would be on the large partition) and symlinked /var/lib/elasticsearch to the new folder on the larger partition:

ln -s /home/elasticsearch/elasticsearch /var/lib/elasticsearch
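In case it helps, here is roughly the full sequence, including the ownership fix the symlink approach needs. This is a sketch, not a transcript: the .old suffix is a name I'm inventing here for illustration, and your data path may differ.

# stop the node before touching its data path
sudo systemctl stop elasticsearch
# move the old (misplaced) data aside, since ln -s needs the path free
sudo mv /var/lib/elasticsearch /var/lib/elasticsearch.old
# create the new data directory on the large partition and link to it
sudo mkdir -p /home/elasticsearch/elasticsearch
sudo ln -s /home/elasticsearch/elasticsearch /var/lib/elasticsearch
# Elasticsearch will not start unless it can write to its data path
sudo chown -R elasticsearch:elasticsearch /home/elasticsearch/elasticsearch
sudo systemctl start elasticsearch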

After creating the symlink, I started the Elasticsearch service and watched the logs. After some time, I noticed that there were still no primary shards assigned to this new node (despite it being the only node with disk utilization below the allocation threshold), so I dug in a bit more.
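If you want to see the picture I was staring at, the _cat APIs show per-node disk use and which shards are stuck (assuming an unauthenticated node listening on localhost:9200):

curl -s 'localhost:9200/_cat/allocation?v'
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED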

This is where I learned about /_cluster/allocation/explain, which provides details about why certain shards may have an allocation problem. Aha! After 5 failed attempts to allocate the unassigned shards to my new node, Elasticsearch just needed a little kick to re-run the allocation process: I opened up the Kibana console and ran POST /_cluster/reroute?retry_failed=true to force the algorithm to re-evaluate the placement of shards.
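The same two calls from the shell, if Kibana isn't handy (same localhost:9200 assumption as above):

curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
curl -s -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'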

Within about 90 seconds, the Elasticsearch cluster began rerouting all of the unassigned shards, and my Logstash disk queue began to shrink as events poured into the freshly allocated shards on my new node.
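To watch the recovery in flight, _cat/recovery with active_only shows just the shards currently moving (same localhost assumption):

curl -s 'localhost:9200/_cat/recovery?v&active_only=true'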

Problem solved.

Stay tuned for next week when I pay off the technical debt incurred by placing my Elasticsearch shards on a symlink 😬

Display HTTPS X509 Cert from Linux CLI

Recently, while attempting a git pull, I was confronted with the following error:

Peer's certificate issuer has been marked as not trusted by the user.

The operation worked in a browser on my dev machine, and closer inspection revealed that the cert used to serve the GitLab service was valid, yet for some reason the remote CentOS Linux server couldn’t pull from the remote.

I found this post on StackOverflow detailing how to retrieve the X509 cert used to secure an HTTPS connection:

echo | openssl s_client -showcerts -servername MyGitServer.org -connect MyGitServer.org:443 2>/dev/null | openssl x509 -inform pem -noout -text
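A handy variation if you only want to see who issued the cert and whether it has expired (same invocation, different x509 output flags):

echo | openssl s_client -servername MyGitServer.org -connect MyGitServer.org:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates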

This was my ticket to discovering why Git on my CentOS server didn’t like the certificate: the CentOS host was resolving the host name to the wrong address, and was therefore being served a cert that wasn’t valid for the service.
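If you suspect the same kind of DNS mismatch, comparing what the server resolves against what your dev machine resolves narrows it down quickly (MyGitServer.org is the same placeholder host name as above; dig comes from the bind-utils package on CentOS):

getent hosts MyGitServer.org
dig +short MyGitServer.org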

And now a Haiku:

http://i.imgur.com/eAwdKEC.png