Redundancy and Correlation in Data Mining

Prerequisites: Chi-square test, covariance and correlation

What is Data Redundancy?

During data integration in data mining, data from several data stores is combined, which can introduce redundancy. An attribute (a column or feature of the data set) is called redundant if it can be derived from another attribute or set of attributes. Inconsistent naming of attributes or dimensions can also cause redundancy in the resulting data set, because the same information may appear under different names.
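
The idea of an attribute being "derivable" from others can be illustrated with a small check. The following is only a minimal sketch, not a standard library routine: the function name is_derivable and the sample columns (customer_id, birth_year, age_in_2024) are hypothetical and chosen purely for illustration. It tests whether each combination of source values always maps to a single target value in the data at hand.

```python
def is_derivable(rows, source_cols, target_col):
    """Return True if, in the given rows, every combination of values in
    source_cols is paired with exactly one value of target_col."""
    seen = {}  # maps a tuple of source values to the target value seen first
    for row in rows:
        key = tuple(row[c] for c in source_cols)
        value = row[target_col]
        if key in seen and seen[key] != value:
            # Same source values, different target value: not derivable.
            return False
        seen.setdefault(key, value)
    return True


# Hypothetical integrated data set (illustrative values only).
rows = [
    {"customer_id": 1, "birth_year": 1990, "age_in_2024": 34},
    {"customer_id": 2, "birth_year": 1985, "age_in_2024": 39},
    {"customer_id": 3, "birth_year": 1990, "age_in_2024": 34},
]

print(is_derivable(rows, ["birth_year"], "age_in_2024"))  # True
print(is_derivable(rows, ["birth_year"], "customer_id"))  # False
```

Passing this check on a sample only suggests redundancy. A small data set can satisfy it by coincidence, so the result should be confirmed against domain knowledge before an attribute is dropped.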

Example –
We have a data set with three attributes: person_name, is_male, is_female.