Introduction to Concept Drift

Last Updated : 02 Sep, 2020

Consider a setting that differs slightly from the one we usually assume. In batch learning, a model is trained once on a fixed set of data. When that data is later modified, or new data keeps arriving, such a model can quickly become ineffective or even counterproductive. This problem is known as concept drift.

A formal definition:
Concept drift is the phenomenon in which the statistical properties of the class variable of the data (in other words, the target we want to predict) change over time. When a model is trained, it learns a function that maps the independent variables, or predictors, to the target variable; that is, it predicts the target from the predictors. In a static and perfect environment where neither the predictors nor the target changes or evolves, the model should perform as it did on day one. But if the predictors change over time, the model's performance may degrade: it was trained on old data, and the new data it has to predict from no longer resembles what it saw during training.
An example of such a situation is dynamic data (for instance, streaming data), where not only do the statistical properties of the target variable change but so does its meaning. When this change happens, the mapping learned by the model is no longer suitable for the new environment.
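
The definition above can also be written in probabilistic notation. The block below is a sketch of one common formulation and is not taken from this article; the symbols X (predictors), y (target), and the time points t_0 and t_1 are assumptions introduced for illustration.

```latex
% Concept drift between two time points t_0 and t_1:
% the joint distribution of predictors X and target y differs.
\exists X : \quad P_{t_0}(X, y) \neq P_{t_1}(X, y)

% The joint distribution factorizes into two parts that can each change:
P_t(X, y) = P_t(X) \, P_t(y \mid X)
```

Under this reading, a change in P_t(y | X) corresponds to the learned mapping no longer being suitable, while a change in P_t(X) alone corresponds to the predictors evolving over time.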

In machine learning and predictive analytics, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes and may end up being of little or no use.
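
One simple way to notice this loss of accuracy is to track how often recent predictions were correct. The following is a minimal sketch, not a standard detector: the class name AccuracyMonitor, the window size, the baseline accuracy, and the tolerance are all hypothetical choices, and the prediction stream is simulated so that the model is right 9 times out of 10 at first and only every other time after step 100.

```python
from collections import deque


class AccuracyMonitor:
    """Flags possible drift when accuracy over a sliding window drops too far.

    All parameters are illustrative assumptions, not recommended values.
    """

    def __init__(self, window_size=50, baseline=0.9, tolerance=0.15):
        self.window = deque(maxlen=window_size)  # most recent outcomes only
        self.baseline = baseline                 # accuracy expected without drift
        self.tolerance = tolerance               # allowed drop before raising a flag

    def accuracy(self):
        """Fraction of correct predictions currently in the window."""
        return sum(self.window) / len(self.window)

    def update(self, prediction, actual):
        """Record one outcome and return True if drift is suspected."""
        self.window.append(prediction == actual)
        if len(self.window) < self.window.maxlen:
            return False  # window not full yet, too little evidence
        return self.accuracy() < self.baseline - self.tolerance


# Simulated stream: the true label is always 1; the model's error rate
# jumps from 10% to 50% at step 100 (a hypothetical drift point).
monitor = AccuracyMonitor()
first_alarm = None
for step in range(200):
    if step < 100:
        correct = step % 10 != 0
    else:
        correct = step % 2 == 0
    actual = 1
    prediction = 1 if correct else 0
    if monitor.update(prediction, actual) and first_alarm is None:
        first_alarm = step

print("Drift first suspected at step:", first_alarm)
```

The alarm is raised some steps after the change itself, because the window has to fill with enough recent mistakes first. A smaller window reacts faster but is noisier. In practice, this approach also assumes that the true labels become available soon after each prediction.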

Consider a sensor positioned on a volcano to record its temperature over time. Suppose that we collect data over several days during which it only rained. Learning from this data would give us the following model (figure below): above a certain temperature threshold we consider the volcano active, and otherwise it is at rest.

Figure 1: Data during Rain

However, a few days later a heatwave arrives and the temperature distribution changes as shown below (Figure 2). We can easily see that the model established earlier is no longer valid and has to be adapted.

Figure 2: Data After Rain
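
The volcano example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the data behind the figures: the temperature readings are made up, the heatwave is modelled as a uniform shift of 15 degrees, and the "model" is simply a threshold placed midway between the average rest temperature and the average active temperature.

```python
from statistics import mean


def fit_threshold(samples):
    """Place the threshold midway between the mean 'rest' and mean 'active' reading."""
    rest = [temp for temp, label in samples if label == "rest"]
    active = [temp for temp, label in samples if label == "active"]
    return (mean(rest) + mean(active)) / 2


def predict(temp, threshold):
    """Classify one reading with the fixed threshold rule."""
    return "active" if temp > threshold else "rest"


def accuracy(samples, threshold):
    """Fraction of readings whose predicted state matches the label."""
    hits = sum(predict(temp, threshold) == label for temp, label in samples)
    return hits / len(samples)


# Hypothetical labelled readings (degrees Celsius) collected on rainy days.
rainy = [(18, "rest"), (20, "rest"), (21, "rest"), (22, "rest"),
         (38, "active"), (40, "active"), (41, "active"), (42, "active")]

# Hypothetical heatwave: every reading is shifted up by 15 degrees.
heatwave = [(temp + 15, label) for temp, label in rainy]

old_threshold = fit_threshold(rainy)     # model learned before the drift
new_threshold = fit_threshold(heatwave)  # model re-learned on recent data

acc_rainy = accuracy(rainy, old_threshold)
acc_heatwave_old = accuracy(heatwave, old_threshold)
acc_heatwave_new = accuracy(heatwave, new_threshold)

print("Old threshold:", old_threshold)
print("Accuracy on rainy data:", acc_rainy)
print("Accuracy on heatwave data, old threshold:", acc_heatwave_old)
print("Accuracy on heatwave data, new threshold:", acc_heatwave_new)
```

With these made-up numbers, the old threshold labels every heatwave reading as active, including the ones taken while the volcano is at rest. Refitting the threshold on the recent data is the simplest form of adapting the model.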

We can also see concept drift in shopping during Diwali in India. On normal days shopping follows its usual pattern, but around Diwali it rises very suddenly. Below are a few statistics that are taken from here.