Data Mining and Recommender Systems
Data mining uses statistical methods and algorithms such as classification, clustering, and regression to extract the insights hidden in large sets of data. It helps us predict outcomes from the history of past events: for example, how much a person is likely to spend in a month given his or her previous transactions, or which items customers frequently buy together, such as bread, butter, and jam. Market trends can also be analyzed, such as the demand for umbrellas during the rainy season and for ice cream during the summer. The main objective is to analyze the patterns present in the data set and obtain useful information for the target in question.
What could be the yield of the crops in the present year? What are the chances of a person having a particular disease, given all the symptoms? What is the expected sale of groceries in a particular month? What is the expected number of customers purchasing clothes from a particular supermarket? What is the profit or loss percentage expected in the coming year? All these questions can be answered, provided we train a suitable model, identify the patterns present in the data sets, and, most importantly, have enough data to arrive at accurate results.
In particular, data mining draws upon ideas such as sampling, estimation, and hypothesis testing from statistics, and search algorithms, modeling techniques, and learning theories from computing, pattern recognition, and machine learning.
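As a concrete illustration of the "bought together" pattern mentioned earlier, the following is a minimal sketch that counts how often pairs of items appear in the same basket. The transactions, the function name `frequent_pairs`, and the `min_count` threshold are all invented for illustration; real data mining tools use more scalable algorithms for the same idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam", "milk"},
    {"butter", "jam", "bread"},
    {"milk", "umbrella"},
]


def frequent_pairs(baskets, min_count):
    """Return the item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so that each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


pairs = frequent_pairs(transactions, min_count=3)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

Counting every pair is only practical for small baskets and small catalogs, which is why this is a sketch rather than a recommended implementation.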
A recommender system mainly deals with the likes and dislikes of users. Its major objective is to recommend items that a particular user is likely to enjoy or need, based on his or her previous purchases. It is like having a personal team that understands our likes and dislikes and helps us decide on a particular item without personal bias, drawing on the large amount of data that accumulates in repositories day by day. The aim of recommender systems is to supply easily accessible, high-quality recommendations for the user community. In effect, the goal is to give each user an affordable and efficient personal advisor.
Which movie or web series should I watch next? Which book should I read next? Which items should I buy to go with the items I bought earlier? Which magazines should I be reading? Will they match the genre I like? Should I go to a particular place? Will I like it? All these questions can be answered with the help of a recommender system.
Here we compute the similarity between the user or item for which the recommendation is to be made and all the other users or items in the data set. We find the pattern of likes and dislikes with the highest similarity, and then use that pattern to decide whether an item, place, movie, or book should be suggested.
- User-based recommendation: Here we calculate Pearson’s similarity measure to determine closely related users, i.e., users whose likes and dislikes follow the same pattern. In the numerator, the mean rating of each user is subtracted from that user’s ratings, and the resulting deviations of the two users are multiplied item by item and summed. In the denominator, the deviations of each user are squared and summed, and the square roots of the two sums are multiplied. Dividing the numerator by the denominator gives the similarity measure, which lies between -1 and 1.
- Item-based recommendation: The first step is to obtain the mean-adjusted rating matrix. Some users tend to give very high ratings most of the time and others very low ratings, so to reduce this inconsistency we subtract each user’s mean rating from that user’s ratings. The mean-adjusted matrix is then used to predict the rating that a new user would give to an item. The next step is to calculate the similarity between items, for which we can use cosine similarity. In the numerator, the ratings that different users gave to the two items are multiplied user by user and summed. In the denominator, the ratings of each item are squared and summed, and the square roots of the two sums are multiplied. Dividing the numerator by the denominator gives the similarity measure.
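The two similarity computations described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are invented, the function names are hypothetical, the Pearson mean is taken over the items both users have rated (one of several common variants), and a zero denominator is treated as zero similarity.

```python
from math import sqrt

# Hypothetical ratings invented for illustration: user -> {item: rating, 1 to 5}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 3},
    "carol": {"A": 1, "B": 5, "C": 2},
}


def pearson_similarity(ratings, u, v):
    """Pearson similarity of users u and v over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def mean_adjusted(ratings):
    """Subtract each user's mean rating from that user's ratings."""
    adjusted = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        adjusted[user] = {item: r - mean for item, r in items.items()}
    return adjusted


def item_cosine_similarity(adjusted, i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    users = [u for u, items in adjusted.items() if i in items and j in items]
    num = sum(adjusted[u][i] * adjusted[u][j] for u in users)
    den = sqrt(sum(adjusted[u][i] ** 2 for u in users)) * sqrt(
        sum(adjusted[u][j] ** 2 for u in users)
    )
    return num / den if den else 0.0


adjusted = mean_adjusted(ratings)
print(pearson_similarity(ratings, "alice", "bob"))
print(item_cosine_similarity(adjusted, "A", "B"))
```

With these toy numbers, alice and bob rate items in the same pattern and come out as highly similar, while items A and B receive opposite deviations from most users and come out as strongly dissimilar.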
Both methods yield a similarity measure, on the basis of which we predict whether an item is relevant to a particular user and should be suggested.
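The text does not say how the similarity measure is turned into a prediction, so the following sketch assumes one common choice: a similarity-weighted average of the neighbours' deviations from their own mean ratings, added to the target user's mean. The function names, the neighbour values, and the recommendation threshold of 3.5 are all hypothetical.

```python
def predict_rating(target_mean, neighbours):
    """Predict a rating from (similarity, neighbour_rating, neighbour_mean) triples.

    Assumption: the prediction is the target user's mean rating plus the
    similarity-weighted average deviation of the neighbours' ratings.
    """
    num = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    den = sum(abs(sim) for sim, _, _ in neighbours)
    if den == 0:
        return target_mean  # no usable neighbours: fall back to the user's mean
    return target_mean + num / den


def should_recommend(predicted, threshold=3.5):
    """Suggest the item only if the predicted rating reaches the threshold."""
    return predicted >= threshold


# Hypothetical neighbours: (similarity to target, rating of the item, own mean rating).
neighbours = [(0.9, 5, 4.0), (0.5, 2, 3.0)]
predicted = predict_rating(target_mean=3.0, neighbours=neighbours)
print(predicted, should_recommend(predicted))
```

The threshold is a design choice: a higher value suggests fewer items with more confidence, and a lower value suggests more items with less.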
Effective evaluation requires ways of selecting the best technique for the specifics of the application domain, identifying the success factors behind different techniques, and comparing several techniques against an optimality criterion. Recommender systems have traditionally been evaluated using offline experiments that attempt to estimate the prediction error of recommendations using an existing dataset of transactions.
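An offline experiment of this kind typically hides some known ratings, predicts them, and measures how far the predictions are from the hidden values. The text does not name an error metric, so the sketch below assumes two common ones, root mean squared error (RMSE) and mean absolute error (MAE). The held-out ratings and predictions are invented for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between held-out ratings and predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between held-out ratings and predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out ratings and the ratings a model predicted for them.
held_out = [4, 3, 5, 2]
predictions = [3.5, 3.0, 4.0, 3.0]

print(rmse(held_out, predictions))
print(mae(held_out, predictions))
```

RMSE penalizes large errors more heavily than MAE, so comparing the two gives a rough sense of whether a model's errors are evenly spread or dominated by a few bad predictions.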