Discounted Cumulative Gain
Discounted Cumulative Gain (DCG) is a metric for measuring ranking quality. It is mostly used in information retrieval, for example to measure the effectiveness of a search engine algorithm by scoring the articles it displays according to their relevance to the search keyword and their position in the results.
Let’s consider a search engine that outputs 5 documents, named D1, D2, D3, D4 and D5, in that order. We need to define a relevance scale (0-3) where:
- 0 : not relevant
- 1-2 : somewhat relevant
- 3 : completely relevant
Suppose these documents have relevance scores:
- D1 : 3
- D2 : 2
- D3 : 0
- D4 : 0
- D5 : 1
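The Cumulative Gain (CG) is the sum of these relevance scores, regardless of their positions, and can be calculated as:
$$\mathrm{CG}_5 = \sum_{i=1}^{5} \mathrm{rel}_i = 3 + 2 + 0 + 0 + 1 = 6$$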
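The discounted cumulative gain at rank position p divides each relevance score by the logarithm of its position, so relevant documents that appear lower in the ranking contribute less. It can be calculated by the formula:
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{\mathrm{rel}_i}{\log_2(i + 1)}$$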
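Therefore, the discounted cumulative gain of the above example is:
$$\mathrm{DCG}_5 = \frac{3}{\log_2 2} + \frac{2}{\log_2 3} + \frac{0}{\log_2 4} + \frac{0}{\log_2 5} + \frac{1}{\log_2 6} \approx 3 + 1.262 + 0 + 0 + 0.387 = 4.649$$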
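The hand calculation for the example ranking can be checked with a few lines of Python. This is a minimal sketch that assumes the log2(i + 1) discount with positions starting at 1; the function name `dcg` and the variable `scores` are illustrative choices, not part of any library.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(i + 1) discount.

    `relevances` holds the relevance scores in ranked order,
    so the first element is the document shown at position 1.
    """
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


# Relevance of D1..D5 in the order the search engine returned them.
scores = [3, 2, 0, 0, 1]

print("CG  :", sum(scores))            # plain sum, position is ignored
print("DCG :", round(dcg(scores), 4))  # position-discounted sum
```

Only D1, D2 and D5 contribute to the total, and D5's score of 1 is reduced to about 0.387 because it sits at position 5.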
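Now we need to arrange these documents in descending order of relevance, which gives the scores [3, 2, 1, 0, 0], and calculate the DCG of that ordering to get the Ideal Discounted Cumulative Gain (IDCG):
$$\mathrm{IDCG}_5 = \frac{3}{\log_2 2} + \frac{2}{\log_2 3} + \frac{1}{\log_2 4} + \frac{0}{\log_2 5} + \frac{0}{\log_2 6} \approx 3 + 1.262 + 0.5 + 0 + 0 = 4.762$$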
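Now, we calculate our Normalized DCG (nDCG) using the following formula:
$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$
Evaluating the formulas directly on the example scores [3, 2, 0, 0, 1] gives an nDCG of about 4.649 / 4.762 ≈ 0.976.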
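The normalization step can be sketched in plain Python as follows. The helper names `dcg` and `ndcg` are hypothetical, the ideal ordering is built only from the documents that were actually returned, and returning 0.0 when no document is relevant is an assumed convention rather than part of the definition.

```python
import math


def dcg(relevances):
    """DCG with a log2(i + 1) discount; positions start at 1."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same scores sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Assumed convention: a ranking with no relevant documents scores 0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


scores = [3, 2, 0, 0, 1]  # D1..D5 in ranked order

print("IDCG :", round(dcg(sorted(scores, reverse=True)), 4))
print("nDCG :", round(ndcg(scores), 4))
```

Library implementations may treat documents with tied scores differently, so their results can differ slightly from this direct evaluation of the formula.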
Code : Python program for Normalized Discounted Cumulative Gain
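Output:
DCG score : 4.670624189796882
IDCG score : 4.761859507142915
nDCG score : 0.980840401274087
nDCG score (from function) : 0.980840401274087
The DCG printed here is slightly higher than a direct evaluation of the formula on [3, 2, 0, 0, 1], which gives about 4.649 (nDCG about 0.976). The printed figures are consistent with an implementation that averages the gain over documents with tied scores, as scikit-learn's dcg_score does by default; that reading is an assumption about how these numbers were produced.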
Limitations of Normalized Discounted Cumulative Gain (NDCG):
- NDCG does not penalize irrelevant documents in the output. For example, [3] and [3, 0, 0] have the same NDCG, although the second output contains two irrelevant documents.
- NDCG does not account for relevant documents that are missing from the output, because no standard is defined for the number of output documents. For example, two outputs [3, 3, 3] and [3, 3, 3, 3] for a similar input are considered equally good: DCG3 is calculated for the first output and DCG4 for the second, and both normalize to 1.
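Both limitations can be reproduced with a small script. This is a sketch under the same assumptions as the formulas above (log2(i + 1) discount, ideal ordering built from the returned documents only); `dcg` and `ndcg` are illustrative helper names.

```python
import math


def dcg(relevances):
    """DCG with a log2(i + 1) discount; positions start at 1."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the best-first ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Limitation 1: trailing irrelevant documents are not penalized.
print(ndcg([3]), ndcg([3, 0, 0]))

# Limitation 2: a shorter output that misses a relevant document
# looks just as good as the longer one after normalization.
print(ndcg([3, 3, 3]), ndcg([3, 3, 3, 3]))

# The unnormalized DCG values do differ, but they are computed
# over different numbers of positions.
print(round(dcg([3, 3, 3]), 4), round(dcg([3, 3, 3, 3]), 4))
```

Every ranking in this script is already sorted best-first, so each one equals its own ideal ordering and its nDCG is 1 by construction.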
References:
- DCG Wikipedia article
- Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.