What is Elastic Stack and Elasticsearch?
Elastic Stack is a group of products that can reliably and securely take data from any source, in any format, and then search, analyze, and visualize it in real time. Elasticsearch is a distributed, RESTful search and analytics engine that addresses a wide range of use cases. As the heart of the Elastic Stack, it centrally stores your data and provides fast search, good relevancy, and powerful analytics that scale.
Core Products of Elastic Stack
The core products that define an Elastic stack are listed below:
- Elasticsearch – Search and analytics engine.
- Logstash – Data processing pipeline.
- Kibana – Dashboard to visualize data.
Each of the three has its own role, and by combining them you get both analysis and analytics of your data.
Why is it needed?
By one commonly cited estimate, Facebook generates 4 petabytes of data every day, i.e. about 4 million GB. We live in a world of data, so we need a system that can analyze it. There are two terms to understand:
- Analysis – In the analysis part, you get results from past data or the existing data that you have.
- Analytics – In the analytics part, you predict user requirements, build graph-based visualizations for better business clarity, and uncover patterns in the data.
These two are among the most important tools for any business, and both are driven by your data. With their help, you can gain clear business insights and grow your business. The question is how, because analyzing this much data in a short amount of time is not an easy task.
Challenges and Solutions:
- In very large companies, data arrives from different places in different formats, such as JSON or XML. We need a mechanism to bring all of that data into one place and one format. For that, we use Logstash.
- Once we have the data, we need to arrange it systematically so that it is easy to evaluate, and we need to search through it very quickly in order to analyze it. For that, we have Elasticsearch. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License.
- Finally, we need a visualization platform where we can present our data analytics. This is where Kibana comes into the picture. That is how the whole Elastic Stack works together to provide better business insights.
Setting up Elasticsearch, Logstash, and Kibana
First, download the three open-source packages from their respective links: [elasticsearch], [logstash], and [kibana]. Unzip the files and put all three in the project folder. Begin by setting up Kibana and Elasticsearch on the local system. To run Kibana, start the kibana script (kibana.bat on Windows) from the bin folder of Kibana.
Similarly, to run Elasticsearch, start the elasticsearch script (elasticsearch.bat on Windows) from the bin folder of Elasticsearch.
Now, in two separate terminals, we can see both services running. To check that they are up, open localhost:5601 for Kibana and localhost:9200 for Elasticsearch (these are the default ports).
With that, the Elastic Stack is set up. Go to localhost:5601 and open Dev Tools; its console is where you can write Elasticsearch queries. The rest of this article focuses on Elasticsearch, so let’s see how exactly it works.
Working of Elasticsearch
Before any operation, we have to index our data. Once it is indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of it. Elasticsearch stores data as JSON documents and uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. For a better understanding, we’ll divide Elasticsearch into several topics.
- Managing Documents
- Search Methodology
- Aggregation and Filters
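Returning to the inverted index described above, here is a minimal sketch of the idea in plain JavaScript. It is an illustration only: the simple tokenizer and the function names tokenize, buildInvertedIndex, and search are assumptions made for this example, and the real implementation inside Elasticsearch is far more sophisticated.

```javascript
// Toy inverted index: maps each unique word to the set of document IDs containing it.
// tokenize, buildInvertedIndex and search are hypothetical names for this sketch.

function tokenize(text) {
  // Lowercase the text and split on anything that is not a letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const word of tokenize(text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  }
  return index;
}

function search(index, word) {
  // Look up a single word; returns the matching document IDs in sorted order.
  const ids = index.get(word.toLowerCase());
  return ids ? [...ids].sort() : [];
}

const docs = {
  1: "Elasticsearch stores JSON documents",
  2: "Kibana visualizes documents stored in Elasticsearch",
};
const index = buildInvertedIndex(docs);
console.log(search(index, "documents")); // [ '1', '2' ]
console.log(search(index, "kibana"));    // [ '2' ]
```

Because the lookup goes from word to documents, finding every document that contains a word is a single map access rather than a scan over all documents.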
Architecture of Elasticsearch:
- Cluster: In Elasticsearch, data is stored on nodes, and a single machine can run any number of nodes. Every node belongs to a cluster, so a cluster is a set of nodes.
- Documents: You store your data as documents, which are JSON objects. In the world of relational databases, a document can be compared to a row in a table. So how is this data organized in the cluster? The answer is indices.
- Index: Elasticsearch indices are logical partitions of documents and can be compared to a database in the world of relational databases.
- Types: In older versions of Elasticsearch, each index has one or more mapping types that are used to divide documents into logical groups. A type can be compared to a table in the world of relational databases. Mapping types were phased out starting with Elasticsearch 6.x and have been removed in recent versions.
Every document is stored in an index. You can think of an index as a collection of documents that have similar characteristics. For instance, departments would go in one index (A) and employees in another (B), so the documents in each index are logically related.
a) Sharding is just a way to divide an index into smaller pieces.
b) Each piece is known as a shard.
c) Sharding is done at the index level.
A shard behaves much like an index of its own, and sharding exists for scalability: with it, you can store billions of documents within a single index. There are also replicas, but for now this is enough to start with and understand Elasticsearch. So let’s move on to building a search engine.
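As a brief aside on sharding before moving on: because sharding is done at the index level, the number of shards is normally given in the settings when an index is created. The sketch below only builds the parameters for such a request. The helper name shardedIndexParams and the shard and replica counts are assumptions chosen for illustration, and the index name gov is the one used in the steps that follow.

```javascript
// Hypothetical helper: builds create-index parameters with explicit shard settings.
function shardedIndexParams(index, shards, replicas) {
  return {
    index,
    body: {
      settings: {
        number_of_shards: shards,     // primary shards, chosen when the index is created
        number_of_replicas: replicas, // copies kept of each primary shard
      },
    },
  };
}

const params = shardedIndexParams("gov", 3, 1);
console.log(JSON.stringify(params, null, 2));
// With a connected legacy `elasticsearch` client (an assumption here),
// this object could be passed as: client.indices.create(params)
```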
1. Managing Documents
Before that, install the Elasticsearch client package for Node.js using npm.
npm install elasticsearch
Step 1: Link your application to Elasticsearch by creating a client that points at the running Elasticsearch instance.
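A minimal sketch of this step, assuming the legacy elasticsearch npm client installed above and an Elasticsearch node on its default local address, localhost:9200. The helper name connect is hypothetical; the client module is passed in as an argument so that the function can be tried out without a running cluster.

```javascript
// Hypothetical helper: creates a client from the legacy `elasticsearch` module.
function connect(elasticsearch, host = "localhost:9200") {
  // `host` defaults to the standard local Elasticsearch address.
  return new elasticsearch.Client({ host });
}

// Usage, assuming the package is installed and a node is listening:
//   const client = connect(require("elasticsearch"));
//   client.ping({ requestTimeout: 3000 })
//     .then(() => console.log("Elasticsearch is reachable"))
//     .catch(() => console.error("Elasticsearch is not reachable"));
```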
Step 2: Create an index. For example, we create an index named gov.
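One way this step might look, again assuming the legacy elasticsearch npm client. The function name createGovIndex is hypothetical, and client is assumed to be an already connected client instance.

```javascript
// `client` is assumed to be a connected legacy `elasticsearch` client instance.
function createGovIndex(client) {
  // Returns the client's promise, which resolves once the index has been created.
  return client.indices.create({ index: "gov" });
}

// Usage (requires a running cluster):
//   createGovIndex(client)
//     .then((resp) => console.log("index created", resp))
//     .catch((err) => console.error("could not create index", err));
```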
Step 3: Now we add documents to the index gov, under a type called constituencies. You can think of gov as the database and constituencies as the table.
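A sketch of indexing a single document, under the same assumptions as before (legacy elasticsearch npm client, and an Elasticsearch version that still supports mapping types). The helper name addConstituency and the fields of the sample document are made up for illustration.

```javascript
// `client` is assumed to be a connected legacy `elasticsearch` client instance.
function addConstituency(client, id, doc) {
  return client.index({
    index: "gov",
    type: "constituencies", // mapping types exist only in older Elasticsearch versions
    id,
    body: doc,
  });
}

// Example document; the field names and values are invented for this sketch.
const sampleDoc = { name: "Example Constituency", electorate: 50000 };

// Usage (requires a running cluster):
//   addConstituency(client, "1", sampleDoc).then((resp) => console.log(resp));
```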
Mapping is the process of defining a document and its fields, just like defining a table schema in an RDBMS.
Step 4: Now we define mappings for the type constituencies in the index gov.
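A possible shape for this step, assuming the legacy elasticsearch npm client and an Elasticsearch version in which both mapping types and the text field type are available. The helper name putConstituencyMapping and the two fields are hypothetical and stand in for whatever fields your documents actually have.

```javascript
// `client` is assumed to be a connected legacy `elasticsearch` client instance.
function putConstituencyMapping(client) {
  return client.indices.putMapping({
    index: "gov",
    type: "constituencies",
    body: {
      properties: {
        name: { type: "text" },          // analyzed, so it can be searched as full text
        electorate: { type: "integer" }, // numeric, usable in ranges and aggregations
      },
    },
  });
}

// Usage (requires a running cluster):
//   putConstituencyMapping(client).then((resp) => console.log(resp));
```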
Text analysis is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s optimized for search. Elasticsearch performs text analysis when indexing or searching the text fields that we have defined in the mappings. This is the key factor for a search engine.
By default, Elasticsearch uses the standard analyzer for all text analysis. The standard analyzer gives you out-of-the-box support for most natural languages and use cases. If you choose to use the standard analyzer as-is, no further configuration is needed. You can also create your own custom analyzer.
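To give a feel for what analysis does, here is a rough approximation in plain JavaScript. It is not the standard analyzer itself, whose tokenization rules are more elaborate; the function name analyze and the sample strings are assumptions for this sketch. It splits text into tokens and lowercases them, then shows why applying the same analysis to the indexed text and to the query lets differently cased input still match.

```javascript
// Rough stand-in for an analyzer: tokenize, then lowercase each token.
function analyze(text) {
  return text
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or a number
    .filter(Boolean)          // drop empty strings left at the edges
    .map((token) => token.toLowerCase());
}

console.log(analyze("The QUICK brown-fox, jumped!"));
// [ 'the', 'quick', 'brown', 'fox', 'jumped' ]

// The same analysis is applied to the query, so "Quick FOX" matches the text above.
const indexedTokens = new Set(analyze("The QUICK brown-fox, jumped!"));
const queryTokens = analyze("Quick FOX");
console.log(queryTokens.every((t) => indexedTokens.has(t))); // true
```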
2. Search Methodology
There are different types of queries that you can run against Elasticsearch, and the results you get depend on the query you choose. Here is a basic example: the simplest query, match_all, which matches all documents.
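As a sketch, the request body for this simplest query is shown below, together with one way it might be sent using the legacy elasticsearch npm client. The names matchAllBody and searchAll are hypothetical, and client is assumed to be a connected client instance.

```javascript
// Request body for the simplest query: match every document.
const matchAllBody = { query: { match_all: {} } };

// `client` is assumed to be a connected legacy `elasticsearch` client instance.
function searchAll(client, index) {
  return client.search({ index, body: matchAllBody });
}

// Usage (requires a running cluster):
//   searchAll(client, "gov").then((resp) => console.log(resp.hits.hits));
```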
- Compound queries: Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behavior, or to switch from query to filter context.
The bool query is the default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined.
- Full-text queries: The full-text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing. Because your input is analyzed, you can still get results even when it does not exactly match the stored text.
- Joining queries: Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.
a) nested query
b) has_child and has_parent queries
- Specialized queries: This group contains queries that do not fit into the other groups, such as the more_like_this query, which finds documents that are similar in nature, and the pinned query. There are many more; please check the Elasticsearch documentation.
- Term-level queries: You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.
Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field. That is the big difference between the two: a term-level query looks for an exact match of the input, whereas a full-text query analyzes the input first and then searches.
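The query types above can be combined. The sketch below builds the body of a compound bool query that uses a full-text match clause and a term-level term clause side by side. The helper name buildBoolQuery and the field names description, status, and price are invented for illustration and are not part of any real index in this article.

```javascript
// Hypothetical builder for a bool query body; all field names are made up.
function buildBoolQuery(text, status) {
  return {
    query: {
      bool: {
        // Full-text clause: the input is analyzed and contributes to the score.
        must: [{ match: { description: text } }],
        // Term-level clause: the input must equal the stored term exactly; no scoring.
        filter: [{ term: { status } }],
        // Documents matching this range are excluded.
        must_not: [{ range: { price: { gt: 1000 } } }],
      },
    },
  };
}

console.log(JSON.stringify(buildBoolQuery("wireless headphones", "published"), null, 2));
```

The resulting object can be used as the body of a search request.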
3. Aggregation and Filters
In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No – no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
- Does this timestamp fall into the range of 2015 to 2016?
- Is the status field set to “published”?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance. Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation. Aggregations work much like they do in an RDBMS: you can compute values such as averages and sums, and gain further insights into your data by combining them with complex queries.
Elastic Stack is a very important technology to learn, and you can apply it in many of your projects. The ELK Stack is most commonly used as a log analytics tool, and its popularity lies in the fact that it provides a reliable and relatively scalable way to aggregate data from multiple sources. There is still much more to cover, but with this you can get started with Elasticsearch.
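As a sketch of how a filter and aggregations fit together in one request, the builder below restricts documents to a date range in filter context and then computes an average and a sum over a numeric field. The helper name buildSummaryQuery, the field names timestamp and price, and the aggregation names are assumptions made for this example.

```javascript
// Hypothetical builder: filter by date range, then aggregate a numeric field.
function buildSummaryQuery(field, from, to) {
  return {
    size: 0, // return only the aggregation results, not the matching documents
    query: {
      bool: {
        // Filter context: a yes/no match, no scores are calculated.
        filter: [{ range: { timestamp: { gte: from, lte: to } } }],
      },
    },
    aggs: {
      average_value: { avg: { field } }, // comparable to AVG() in SQL
      total_value: { sum: { field } },   // comparable to SUM() in SQL
    },
  };
}

console.log(JSON.stringify(buildSummaryQuery("price", "2015-01-01", "2016-12-31"), null, 2));
```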