It’s official: ElasticSearch is the number one enterprise search engine. On top of its skyrocketing popularity, ElasticSearch seems to gain new converts daily. The “ELK” stack combines ElasticSearch’s search and analytics capabilities with Logstash’s log collection and processing and Kibana’s data visualization. In 2013, Database Zone began measuring ElasticSearch’s stirring popularity, tracking the search engine’s mentions in job offers, LinkedIn profiles, Google Trends, and StackOverflow.
At the beginning of its tracking period, Database Zone assigned ElasticSearch a popularity score of 7.4. By July of 2017, they recorded the engine at 115.98 – a staggering 15-fold increase in its ranking.
With its growing popularity and high rate of adoption, if you’re not yet using ElasticSearch, chances are you will be soon. Your interest in this article is an excellent start to comprehending and mastering its capabilities.
So, how do you navigate and make sense of this powerful tool? In this post, we’ll cover precisely that – the metrics critical to ElasticSearch you should search for and learn.
Node performance encompasses a few different areas, which we’ll examine here.

- CPU – When CPU usage spikes, it is wise to examine Java Virtual Machine (JVM) metrics.
- Memory usage – For those who have never closely examined a memory chart, a display reading no free memory might be alarming. Many people assume zero available memory indicates the server does not have sufficient RAM, but in reality, it usually means the operating system is putting all of your RAM to work, largely as file-system cache. Charts illustrating a “full memory” are quite common and often healthy.
- Disk I/O – Search engines are heavy users of storage devices, and disk I/O can serve as a leading indicator when issues in performance, capacity, or processing speed arise.
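To make the “full memory” point concrete, here is a minimal sketch of the distinction between *free* and *effectively available* RAM on a Linux host. The field names mirror `/proc/meminfo`; the numbers are invented for illustration.

```python
# Sketch: "no free memory" is usually fine -- most of it is reclaimable
# page cache the OS will hand back to applications on demand.
# Sample values (in kB) are illustrative, not from a real host.
meminfo_kb = {
    "MemTotal": 16384000,
    "MemFree": 210000,      # looks alarmingly low on a chart...
    "Buffers": 310000,
    "Cached": 9500000,      # ...but this cache is reclaimable
}

def effectively_available_kb(info):
    """RAM the kernel can reclaim for applications on demand."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]

free_pct = 100 * meminfo_kb["MemFree"] / meminfo_kb["MemTotal"]
avail_pct = 100 * effectively_available_kb(meminfo_kb) / meminfo_kb["MemTotal"]
print(f"free: {free_pct:.1f}%, effectively available: {avail_pct:.1f}%")
```

A memory chart that only plots `MemFree` will look exhausted even when most of the RAM could be reclaimed instantly, which is why the “full memory” reading is rarely cause for alarm on its own.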
Cluster health status, not unlike OS metrics for servers, is a fundamental metric for ElasticSearch. Cluster health status offers data on running nodes, as well as the status of shards being distributed to the nodes.
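As a minimal sketch, the cluster health metric can be read from ElasticSearch’s `_cluster/health` API. The field names below match that API; the values and the yellow/red alerting rule are illustrative assumptions.

```python
import json

# Illustrative _cluster/health response (field names follow the real API;
# the values here are invented).
raw = """{
  "cluster_name": "my-cluster",
  "status": "yellow",
  "number_of_nodes": 3,
  "active_shards": 20,
  "unassigned_shards": 4,
  "active_shards_percent_as_number": 83.3
}"""

def summarize_health(body):
    """Reduce a cluster-health payload to what a dashboard needs."""
    health = json.loads(body)
    severity = {"green": 0, "yellow": 1, "red": 2}[health["status"]]
    return {
        "status": health["status"],
        "alert": severity > 0,                 # flag anything non-green
        "unassigned": health["unassigned_shards"],
    }

print(summarize_health(raw))
```

In practice you would fetch this payload on a schedule and alert on non-green status or a rising count of unassigned shards.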
As ElasticSearch is a Java application, it is critical to obtain optimal settings for JVM, as well as consistently monitor the garbage collector and operating system memory usage.
Specifically, be sure to watch garbage collection patterns and trends to ensure the cluster is starting and functioning as it should and that garbage collection is cycling properly.
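A simple way to operationalize heap monitoring is to flag nodes whose JVM heap stays high, since sustained heap pressure drives heavier garbage-collection cycles. The sketch below assumes a payload shaped like the JVM section of ElasticSearch’s node stats; the node names, values, and the 85% threshold are illustrative assumptions, not an official rule.

```python
# Sketch: flag nodes under JVM heap pressure from a node-stats-shaped
# payload. Values and threshold are invented for illustration.
node_jvm_stats = {
    "node-1": {"heap_used_percent": 62, "old_gc_collection_time_ms": 1200},
    "node-2": {"heap_used_percent": 91, "old_gc_collection_time_ms": 48000},
}

HEAP_ALERT_PCT = 85  # a common rule of thumb, tune for your cluster

def heap_pressure(stats, threshold=HEAP_ALERT_PCT):
    """Return nodes whose heap usage meets or exceeds the threshold."""
    return [node for node, jvm in stats.items()
            if jvm["heap_used_percent"] >= threshold]

print(heap_pressure(node_jvm_stats))  # nodes worth investigating
```

Pairing heap percentage with cumulative old-generation GC time, as above, helps distinguish a briefly busy node from one stuck in constant collection.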
Naturally, search performance is a key metric to watch. Analytical queries are a common use for ElasticSearch, and a variety of factors can impact the queries’ performance. These factors include improper configuration of the ElasticSearch cluster and inadequate queries, as well as some of the issues mentioned above like JVM memory, garbage collection and disk IO.
As query latency has such a significant impact on users, it is important to monitor it closely and to set up alerts on it.
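ElasticSearch exposes query latency indirectly, as cumulative counters (`query_total` and `query_time_in_millis` under `indices.search` in the stats APIs). A minimal sketch of deriving average latency from two samples of those counters, with invented values:

```python
# Sketch: average query latency between two polls of the cumulative
# search counters. Sample values are illustrative.
sample_t0 = {"query_total": 10_000, "query_time_in_millis": 45_000}
sample_t1 = {"query_total": 10_500, "query_time_in_millis": 49_000}

def avg_query_latency_ms(before, after):
    """Mean per-query latency over the interval between two samples."""
    queries = after["query_total"] - before["query_total"]
    if queries == 0:
        return 0.0  # no queries ran in the interval
    millis = after["query_time_in_millis"] - before["query_time_in_millis"]
    return millis / queries

latency = avg_query_latency_ms(sample_t0, sample_t1)
print(f"{latency:.1f} ms per query")
```

An alert would then fire when this derived latency crosses whatever threshold your users consider acceptable.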
Other factors that are important for search performance include field data cache size and evictions, especially if aggregation queries are in place. While field data is expensive to build – pulling data from disk into memory is what drives the cost – it is an integral component of search performance.
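Fielddata cache size and evictions appear in the `fielddata` section of the node stats API. The sketch below assumes a payload of that shape, with invented node names and values; any sustained eviction count is the signal to watch.

```python
# Sketch: surface nodes whose fielddata cache is evicting entries,
# i.e. rebuilding those expensive structures from disk repeatedly.
# Node names and values are illustrative.
fielddata_stats = {
    "node-1": {"memory_size_in_bytes": 512 * 1024**2, "evictions": 0},
    "node-2": {"memory_size_in_bytes": 900 * 1024**2, "evictions": 172},
}

def nodes_with_evictions(stats):
    """Map node name -> eviction count for nodes evicting fielddata."""
    return {node: s["evictions"]
            for node, s in stats.items() if s["evictions"] > 0}

print(nodes_with_evictions(fielddata_stats))
```

A nonzero, growing eviction count usually means the cache is undersized for the aggregation workload, so the expensive build cost described above is being paid over and over.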
In ElasticSearch, the indexing process has a lot of moving parts, which also means a variety of metrics exist to track its performance.
In the article, 10 ElasticSearch metrics to watch, Stefan Thies states, “When running indexing benchmarks, a fixed number of records is typically used to calculate the indexing rate.” Specifically, refresh times and merge times are good indicators to watch when analyzing indexing performance as “they affect overall cluster performance.”
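The fixed-record-count benchmark Thies describes reduces to a simple calculation: documents indexed divided by elapsed time. A minimal sketch, with illustrative numbers:

```python
# Sketch: the indexing-rate calculation used in fixed-record benchmarks.
# Counts and timings below are invented for illustration.
docs_indexed = 1_000_000
elapsed_seconds = 250.0

indexing_rate = docs_indexed / elapsed_seconds  # docs per second
print(f"{indexing_rate:.0f} docs/sec")
```

Because the record count is held fixed, comparing this rate across runs makes regressions visible – for example, after a mapping change or when refresh and merge times start climbing.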
Want to learn more?
If you liked this post and want to delve further into some of the key metrics to watch in ElasticSearch, check out Stefan Thies’ blog post for a more in-depth look at what should be on your dashboard.
The expertise to guide you in metrics, the systems to keep you up and running
At Weblink Technologies, we speak ElasticSearch and we know metrics. We also offer clients peace of mind by applying safeguards to ensure that users cannot bring down the stack – a critical gap to be aware of if you’ve deployed your own stack.