Precision and Recall in Information Retrieval
Information retrieval systems can be evaluated with two metrics: precision and recall. When a user searches for information on a topic, every item in the database falls into one of four categories with respect to that search:
- Relevant and Retrieved
- Relevant and Not Retrieved
- Non-Relevant and Retrieved
- Non-Relevant and Not Retrieved
Relevant items are the documents that help the user answer the question. Non-relevant items are those that provide no useful information. Each item is either retrieved or not retrieved by the user's query. Precision is defined as the ratio of the number of relevant retrieved documents (retrieved items that are actually useful to the user and match the search need) to the total number of documents retrieved by the query.
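The four categories and the precision ratio can be sketched with plain Python sets. This is only an illustration: the function names `partition` and `precision`, the document IDs, and the choice to return 0.0 when nothing is retrieved are assumptions made for the example, not part of any particular system.

```python
def partition(collection, relevant, retrieved):
    """Split a collection into the four relevance/retrieval categories."""
    collection, relevant, retrieved = set(collection), set(relevant), set(retrieved)
    return {
        "relevant_retrieved": relevant & retrieved,
        "relevant_not_retrieved": relevant - retrieved,
        "nonrelevant_retrieved": retrieved - relevant,
        "nonrelevant_not_retrieved": collection - relevant - retrieved,
    }


def precision(relevant, retrieved):
    """Relevant retrieved items divided by all retrieved items."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention for an empty result list
    return len(relevant & retrieved) / len(retrieved)


# Hypothetical data: 8 documents, 4 of them relevant, 3 retrieved by the query.
collection = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

categories = partition(collection, relevant, retrieved)
for name, items in categories.items():
    print(name, sorted(items))
print("precision:", precision(relevant, retrieved))
```

In this made-up data, two of the three retrieved documents are relevant, so precision is 2/3.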
Recall is defined as the ratio of the number of relevant retrieved documents (retrieved items that are relevant to the user and match the search need) to the total number of relevant documents in the database. Precision measures one aspect of the overhead a user incurs for a particular search. If a search has 85 percent precision, then 15 percent (100 - 85) of the user's effort is overhead spent reviewing non-relevant items.
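As a rough sketch of the recall ratio and the overhead figure, the snippet below reproduces the 85 percent precision case. The helper names `recall` and `overhead`, the integer document IDs, and the assumption that 85 further relevant documents were missed are all invented for illustration.

```python
def recall(relevant, retrieved):
    """Relevant retrieved items divided by all relevant items in the database."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not relevant:
        return 0.0  # assumed convention when no relevant items exist
    return len(relevant & retrieved) / len(relevant)


def overhead(relevant, retrieved):
    """Share of retrieved items that are non-relevant (1 - precision)."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention for an empty result list
    return len(retrieved - relevant) / len(retrieved)


# Hypothetical search: 100 documents retrieved, 85 of them relevant.
retrieved = set(range(100))
# Assumption for the example: 85 more relevant documents were not retrieved.
relevant = set(range(85)) | set(range(1000, 1085))

print("recall:", recall(relevant, retrieved))
print("overhead:", overhead(relevant, retrieved))
```

Here the user reviews 15 non-relevant items out of 100 retrieved, which is the 15 percent overhead, while recall is 85 out of 170 relevant documents.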
Recall measures the extent to which a system processing a particular query retrieves the relevant items the user is interested in seeing. Recall is a very useful concept, but its denominator, the total number of relevant items in the database, is generally unknown in operational systems, so recall cannot be calculated directly. If the total set of relevant items in the database is known to the system, recall can be calculated.
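One way to reflect this limitation in code is to report recall only when the full set of relevant items is available. The sketch below is an assumption about how that might look; `recall_if_known` and the `judged_relevant` parameter are hypothetical names, and returning `None` for the unknown case is just one possible design.

```python
from typing import Iterable, Optional, Set


def recall_if_known(
    retrieved: Iterable[str],
    judged_relevant: Optional[Set[str]],
) -> Optional[float]:
    """Return recall, or None when the set of relevant items is not known."""
    if judged_relevant is None or not judged_relevant:
        # Without a known, non-empty denominator, recall cannot be computed.
        return None
    hits = set(retrieved) & judged_relevant
    return len(hits) / len(judged_relevant)


# Operational setting: the complete set of relevant items is unknown.
print(recall_if_known(["a", "b", "x"], None))

# Test-collection setting: all relevant items are assumed to be known.
print(recall_if_known(["a", "b", "x"], {"a", "b", "c", "d"}))
```

Returning `None` rather than 0.0 keeps "recall is unknown" distinct from "recall is zero".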