Search engines are an integral part of our daily lives.
Most of us are familiar with Google. How do I bake a cake? Where does my favorite actor live? Who wrote this book? What are the latest trends in fashion? Questions like these are answered by our friendly Google.
Google is one of the many search engines available today that 'dig' around the Internet and present us with the most relevant and valuable information.
Let us now understand how these search engines work.
Basically, all search engines go through three stages:

- Crawling
- Indexing
- Ranking and Retrieval

Crawling
This stage involves scanning sites and obtaining information about everything contained there: the page title, keywords, layout, and the pages it links to, at a bare minimum.
This task is performed by special software robots called "spiders" or "crawlers".
These robots usually start with the most heavily used servers and the most popular web pages. The link structure largely determines the route the crawlers follow: newly discovered links are followed to reach many interconnected documents, and previously visited sites are revisited to check for changes. It is a never-ending process.
Sometimes the crawlers give up if the actual content is hidden too many clicks away from the homepage.
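The crawling process described above can be sketched as a breadth-first traversal of the link structure. The snippet below is a minimal, illustrative simulation: the page names and the `LINKS` graph are made up, and a toy dictionary stands in for fetching real pages over HTTP. The `max_depth` cutoff models the crawler "giving up" on content hidden too many clicks away.

```python
from collections import deque

# Toy "web": each page maps to the pages it links to.
# A real crawler would fetch pages over HTTP and parse them for links.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post1", "post2"],
    "post1": ["blog"],
    "post2": ["home", "deep"],
    "deep": [],
}

def crawl(seed, max_depth=2):
    """Breadth-first crawl from a seed page, stopping max_depth clicks away."""
    visited = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # give up on anything hidden deeper than max_depth clicks
        for link in LINKS.get(page, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

With `max_depth=2`, the page `"deep"` (three clicks from `"home"`) is never reached, which is exactly the "giving up" behavior mentioned above.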
Indexing

Once all the data has been gathered, selected pieces of it are stored in huge storage facilities. Think of it this way: suppose we own a number of books. Going through all of them is the crawling, and making a list of them, along with their authors and other related information, is the indexing.
This example provides a small-scale view.
If we extend this idea to the books contained in all the libraries in the world, that pretty much conveys the magnitude of what a search engine undertakes.
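The "list of books" analogy corresponds to what is commonly called an inverted index: a mapping from each word to the documents that contain it. Here is a minimal sketch; the sample documents and their ids are invented for illustration.

```python
def build_index(documents):
    """Map each word to the set of document ids containing it (an inverted index)."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

# Invented sample "books": id -> text
books = {
    "b1": "the art of baking cakes",
    "b2": "modern art history",
    "b3": "history of search engines",
}
index = build_index(books)
```

Looking up a search term in `index` now takes a single dictionary access instead of re-reading every book, which is why search engines build the index up front.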
Ranking and Retrieval
Search engines are answer machines. Whenever we perform an online search, the search engine scours its database for the most relevant results and ranks them based on the popularity of the websites. Relevance and popularity are the most important factors a search engine considers in order to deliver satisfactory results.
Ranking algorithms differ between search engines. An engine might assign a weight to each entry depending on whether the keyword appears in the title, meta tags, or sub-headings.
The most basic algorithm uses the frequency of the keyword being searched. This, however, led to "keyword stuffing", where pages are filled with mostly nonsense as long as they include the keyword.
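The weighting idea can be sketched as follows. The field weights below are assumed values chosen for illustration, not any engine's real parameters: a keyword occurrence in the title counts more than one in a heading, which counts more than one in the body.

```python
# Assumed, illustrative weights: titles matter most, body text least.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def score(page, query):
    """Weighted keyword frequency: count query occurrences in each field, scaled by weight."""
    q = query.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        total += weight * page.get(field, "").lower().split().count(q)
    return total

# Invented sample pages
pages = [
    {"title": "cake recipes", "heading": "baking a cake", "body": "mix flour and bake"},
    {"title": "flour guide", "heading": "types of flour", "body": "cake flour is fine cake cake"},
]
ranked = sorted(pages, key=lambda p: score(p, "cake"), reverse=True)
```

Note that the first page ranks higher for "cake" even though the second page contains the word more often: weighting fields makes crude keyword stuffing in the body less effective than a relevant title.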
This gave way to ranking based on linking: more popular sites are linked to more often.
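The best-known form of this link-based idea is PageRank. Below is a heavily simplified sketch of the iterative version, assuming a tiny made-up link graph with no dangling pages: each page repeatedly distributes its current rank among the pages it links to, so pages that attract many links accumulate higher scores.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page shares damping * rank equally among its outlinks."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline, plus shares from pages linking to it.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# Invented link graph: "b" is linked to by both "a" and "c"
graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
ranks = pagerank(graph)
```

In this graph, page "b" ends up with the highest rank because two pages link to it, which is the sense in which "more popular sites are linked more".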
At present, search engines are trying to support natural-language queries. Being able to understand what we say, expressed freely, will truly revolutionize this technology. One popular natural-language query site today is AskJeeves.com, though it prefers simple queries. Time will give rise to better search engines that accept complex queries.
This article is contributed by Nihar Ranjan Sarkar.