Elasticsearch 7.0.0 Beta 1, the latest version of Elasticsearch, was released on 14th February 2019. This version introduces significant changes and improvements over its predecessors, of which three are worth particular attention. In this article we will study these features in detail and talk about how the new functionality can be incorporated into your Elasticsearch projects.

Two of these new features can be classified under the “queries-related features” category. The third, typeless mappings, turns out to be the real game changer. But let’s start with the query enhancements.

Elasticsearch 7.0 Queries-Related Features

Scoring

Elasticsearch users with in-depth knowledge of the query DSL will have encountered custom document scoring. This is normally accomplished with the Function Score query, but Elasticsearch 7.0 introduces Function Score 2.0. Its key difference is a component-based approach: many features that were previously available only inside the Function Score query can now be plugged directly into your main query, without wrapping the entire query in a Function Score. For example, script scoring has its own query, script_score, which lets users specify a custom function for computing relevance scores.

GET /_search
{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "message": "elasticsearch"
        }
      },
      "script": {
        "source": "doc['likes'].value / 10"
      }
    }
  }
}

As you may know, when Elasticsearch looks for documents in response to a query, a lot of matched documents may show up. For example, assume that we have three documents in our index. Each document consists of only one field, “description”, and the description for each document is different.

There are various methods to calculate the final (custom) score. For instance, the script does not have to use the initial score at all, and the flow stays the same: Elasticsearch first uses its built-in approach to calculate scores and select only the subset of documents relevant to the specific query, then uses the script you provide to recalculate scores on that subset. This procedure is a good way to get familiar with scripting in Elasticsearch, so that custom expressions can be written with ease.
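
As a minimal sketch of the other option, the request below does use the initial score: it multiplies the built-in relevance score, exposed to the script as _score, by the likes field. The message and likes fields are taken from the earlier example and the arithmetic is purely illustrative. It assumes every matched document has a non-negative numeric likes value, since the final score must not be negative.

# Illustrative only: scale the built-in score (_score) by the likes field
GET /_search
{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "message": "elasticsearch"
        }
      },
      "script": {
        "source": "_score * doc['likes'].value / 10"
      }
    }
  }
}

Dropping _score from the script gives the earlier request, in which the match query only selects the documents and the script alone decides their order.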

Faster Retrieval

The next query-related feature of Elasticsearch 7.0.0 is faster retrieval of data for specific types of queries. You may be aware that the calculation of relevance scores is a time-consuming process, and there are situations where the number of potential matches is very high. This is especially noticeable in queries that include commonly used words (such as “the”, “in”, “is”). There is a high probability that the search engine will find abundant documents with small relevance scores, because they match nothing in the query except the common words. But relevance scores still need to be computed, even for these useless results, and this takes valuable time.

Most of the time, though, users are not interested in retrieving every single match from the Elasticsearch index. Most do not browse through the second or third page of results! For these situations, the new Elasticsearch feature comes in handy. It lets the search engine skip calculating relevance scores for matches that cannot score high enough to appear among the top results. As Elasticsearch mentions in its documentation, this feature can make the search process much faster.

The user can also alter the threshold up to which the search engine tracks matches exactly. This defaults to ten thousand but can easily be changed according to each user’s needs. Consider this scenario: if your query matches fewer than ten thousand documents, its behavior will not change. But if the number of matches exceeds the set maximum, search time can drop significantly compared to standard searches, which calculate scores for every result regardless of volume.

You will also notice a field called hits.total in the response object. It represents the total number of matches found by the search engine for the query. It is now an object with the following structure:

{
  "value": 10000,
  "relation": "gte" // or "eq"
}

The value field is the total number of hits if the relation is “eq”, and the bound (10000 here) if the relation is “gte”. The relation field shows whether there are matches that were not counted, including those for which relevance scores haven’t been computed. So “gte” means that the actual number of matching documents is at least the number in the value field and may be larger, while “eq” means that it is exactly equal to the number in the value field.

A search request parameter called “track_total_hits” controls this behavior. It can be set to true, false, or an integer. When it is set to true, the response returns the exact number of hits in the value field of the hits.total object, so the relation field of this object will be “eq”. However, if you set track_total_hits to false, the total number of hits is not tracked and remains unknown. Lastly, if you set track_total_hits to an integer, that number serves as the boundary up to which hits are counted exactly.

In layman’s terms, if the relation field of hits.total is “eq”, we have the exact number of hits. If the relation is “gte”, there are at least as many hits as the value shows, and possibly more. In this case, the value field of hits.total will reflect the number you specified in track_total_hits.
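
To make this concrete, here is a small sketch of a request that sets the boundary explicitly. The threshold of 100 and the match query on the message field are illustrative choices, not recommendations.

# Count hits exactly only up to 100 (illustrative threshold)
GET /_search
{
  "track_total_hits": 100,
  "query": {
    "match": {
      "message": "elasticsearch"
    }
  }
}

If more than 100 documents match, the hits.total object in the response would be expected to look like this:

{
  "value": 100,
  "relation": "gte"
}

With 100 or fewer matches, value would hold the exact count and relation would be "eq".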

Typeless Mappings

Last but not least among the changes in Elasticsearch 7.0.0 are typeless mappings. In previous versions, a single index could hold several different mappings, each corresponding to a separate type of document.

For example, consider the index “football”, in which there are two types of documents: _type: football_club and _type: football_player. Each type may have its own fields, with some of them overlapping. A name field, for instance, may be found in both football_club and football_player. And there may be circumstances where these identically named fields would need different data types: one may be a number and the other a Boolean. This poses a problem and degrades search performance. In the newest Elasticsearch, mapping types are being removed from the index, which improves performance.

The process of removing mapping types started with the launch of Elasticsearch 5.6.0 and will be completed with the release of Elasticsearch 8.x. In Elasticsearch 7.0.0, the _default_ mapping is removed, include_type_name defaults to false, and the ability to specify types in API requests is deprecated.
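
As a rough sketch of what typeless requests look like in 7.0.0, the example below creates an index whose mapping has no type name and then indexes and fetches a document through the _doc endpoint. The index name clubs, the name field, and the document itself are hypothetical and used only for illustration.

# Create an index with a typeless mapping (no type name under "mappings")
PUT /clubs
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" }
    }
  }
}

# Index a document via the _doc endpoint instead of a type name
PUT /clubs/_doc/1
{
  "name": "Example FC"
}

# Fetch it back the same way
GET /clubs/_doc/1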

There are two basic implications of the changes found in Elasticsearch 7.0.0. Firstly, users must get used to creating API requests without mentioning types. Secondly, you need to design the data stored in Elasticsearch in a different way altogether. The most straightforward approach is to follow the principle “one type – one index”. This is a simple solution, which should eliminate the problem of bad search performance owing to sparse data in indices with several mapping types. But if you have only a small number of documents of a particular type, it is not efficient to dedicate an entire index to them: you would be spending shards, which are a limited resource in the cluster. A more reasonable approach in that case is to implement a custom type field instead of the built-in _type. By doing so, you can still keep several kinds of documents in one index. With either approach, you may need to migrate existing data to the new architecture using the Reindex API.
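
The custom type field approach can be sketched as follows. The script follows the pattern shown in the Elasticsearch guide on removing mapping types; the destination index football_v2, the field name type, and the sample search are assumptions made for illustration, and the source is the multi-type football index from the example above.

# Destination index: one typeless mapping with a custom "type" field
PUT /football_v2
{
  "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}

# Copy documents, moving the old _type into the custom field
# and prefixing the ID so IDs from different types cannot collide
POST /_reindex
{
  "source": { "index": "football" },
  "dest": { "index": "football_v2" },
  "script": {
    "source": "ctx._source.type = ctx._type; ctx._id = ctx._type + '-' + ctx._id; ctx._type = '_doc';"
  }
}

# Search only former football_club documents by filtering on the custom field
GET /football_v2/_search
{
  "query": {
    "bool": {
      "must": { "match": { "name": "united" } },
      "filter": { "term": { "type": "football_club" } }
    }
  }
}

Note that this sketch maps name as text for every document, so fields that had conflicting data types across the old mapping types would first need to be renamed or converted.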

Conclusion

Elasticsearch 7.0.0 has several key improvements which allow users to create custom relevance scores, retrieve data faster and remove mapping types. These improvements come at a cost, however, and it’s important to become acquainted with the changed behavior of Elasticsearch in affected queries. It’s also vital to understand how these changes influence the architecture of data storage, and the possible need to move to a newer design. Taking all of this into account, though, will allow users to take full advantage of the new release.

Thinking about starting an Elasticsearch project?

Please do get in touch if you need help with Elasticsearch-based projects. Our services focus on improving technical and business outcomes through expert Elasticsearch enterprise project consulting and high-value, ROI-based Elasticsearch consulting packages.

About Weblink:

Created in 2012, Weblink provides organizations of all sizes with top-level, specialized Elasticsearch software solutions. Based in Chicago, Illinois, Weblink assists customers worldwide. At the forefront of the analytics industry, Weblink specializes in custom Elasticsearch solutions, product search APIs, Elasticsearch cluster management and enterprise-level Elasticsearch solutions. Since inception, it has been company culture to build partnerships with clients while providing powerful, reliable Elasticsearch solutions which enhance productivity and operational efficiency and, of course, save clients money. Weblink has built a strong reputation for taking on brand-driven, complex Elasticsearch projects. Through its solutions, Weblink has helped clients integrate innovation into their infrastructure, implement strategic thinking, develop their brands and, overall, increase revenues. In the past 7 years Weblink has worked with major clients including Nike, Boeing and Walgreens, as well as many mid-sized companies including Carsoup.com and Spokin.com.

Why Weblink: Highly Specialized in Elasticsearch & Boutique in Approach!

Weblink works in small teams that focus exclusively on Elasticsearch. We offer high-value, ROI-based consulting packages and we are structured to provide flexibility for short-term and long-term needs. If you are looking for a “high value”, “high efficiency”, “high trust” approach for your organization, look no further, because we will go the extra mile for you. The main reason to work with a boutique Elasticsearch consulting firm is that you will be working directly with specialists who focus closely and meticulously on understanding your needs.