
Attribute Subset Selection in Data Mining

Last Updated : 12 Nov, 2023

Attribute subset selection is a technique used for data reduction in the data mining process. Data reduction shrinks the data set so that it can be analyzed more efficiently.

Need for Attribute Subset Selection

A data set may have a large number of attributes, some of which are irrelevant or redundant. The goal of attribute subset selection is to find a minimum set of attributes such that dropping the irrelevant ones affects the utility of the data as little as possible while reducing the cost of analysis. Mining a reduced data set also makes the discovered patterns easier to understand.

Process of Attribute Subset Selection

A brute-force approach, in which every subset of a data set with n attributes is analyzed, can be very expensive, since there are 2^n possible subsets. A more practical way is to use tests of statistical significance to recognize the best (or worst) attributes; such tests assume that the attributes are independent of one another. This is a greedy approach: a significance level is fixed (5% is a common choice), the model is fitted repeatedly, and attributes whose p-value exceeds the significance level are discarded. The procedure is repeated until every attribute remaining in the data set has a p-value less than or equal to the significance level, which yields a reduced data set with no irrelevant attributes.
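As a concrete illustration, here is a minimal sketch of scoring attributes by statistical significance using an OLS fit from statsmodels. The toy data, the attribute names, and the 5% cutoff are illustrative assumptions, not part of the article:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy data: two informative attributes and one pure-noise attribute.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
y = 3 * X["a"] - 2 * X["b"] + rng.normal(size=200)

# Fit a linear model and inspect the per-attribute p-values.
model = sm.OLS(y, sm.add_constant(X)).fit()
pvals = model.pvalues.drop("const")
print(pvals)                          # 'noise' should score well above 0.05

# Keep only attributes significant at the 5% level.
print(list(pvals[pvals <= 0.05].index))
```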

Methods of Attribute Subset Selection

All of the following are greedy approaches to attribute subset selection:

1. Stepwise Forward Selection
2. Stepwise Backward Elimination
3. Combination of Forward Selection and Backward Elimination
4. Decision Tree Induction

Stepwise Forward Selection: This procedure starts with an empty set of attributes as the minimal set. In each iteration, the most relevant remaining attribute (the one with the lowest p-value) is chosen and added to the reduced set; a sketch of the loop follows the figure below.

[Figure: Stepwise forward selection]
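A minimal sketch of the forward-selection loop, assuming a pandas DataFrame X of candidate attributes, a target y, and p-values taken from a statsmodels OLS fit (all illustrative choices, not mandated by the article):

```python
import pandas as pd
import statsmodels.api as sm

def forward_selection(X: pd.DataFrame, y, alpha: float = 0.05):
    selected = []                        # start from an empty attribute set
    remaining = list(X.columns)
    while remaining:
        # p-value of each candidate when added to the current set
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
                       .pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:          # nothing significant is left to add
            break
        selected.append(best)            # add the most relevant attribute
        remaining.remove(best)
    return selected
```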

Stepwise Backward Elimination

This procedure starts with the full set of attributes as the initial reduced set. In each iteration, the least relevant attribute (the one with the highest p-value above the significance level) is eliminated from the set; a sketch follows the figure below.

[Figure: Stepwise backward elimination]
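The mirror-image loop, again as a sketch under the same assumptions (pandas DataFrame X, target y, OLS p-values):

```python
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X: pd.DataFrame, y, alpha: float = 0.05):
    selected = list(X.columns)           # start with the full attribute set
    while selected:
        model = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()           # least relevant attribute
        if pvals[worst] <= alpha:        # everything left is significant
            break
        selected.remove(worst)
    return selected
```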

Combination of Forward Selection and Backward Elimination

Stepwise forward selection and backward elimination can be combined so that each round adds the most relevant attribute and removes the least relevant one. This combined technique is the one most commonly used for attribute selection; a sketch follows the figure below.

[Figure: Combination of forward selection and backward elimination]
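One way to combine the two procedures, sketched under the same assumptions as above; the entry and removal thresholds alpha_in and alpha_out are illustrative parameters:

```python
import pandas as pd
import statsmodels.api as sm

def stepwise_selection(X: pd.DataFrame, y,
                       alpha_in: float = 0.05, alpha_out: float = 0.05):
    selected = []
    while True:
        changed = False
        # Forward step: try to add the most relevant remaining attribute.
        remaining = [c for c in X.columns if c not in selected]
        if remaining:
            pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
                           .pvalues[c]
                     for c in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] <= alpha_in:
                selected.append(best)
                changed = True
        # Backward step: drop any attribute that became insignificant.
        if selected:
            model = sm.OLS(y, sm.add_constant(X[selected])).fit()
            pvals = model.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > alpha_out:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```

Note that production stepwise procedures usually guard against the add/drop loop cycling, for example by using a stricter entry threshold than removal threshold; this sketch omits that safeguard.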

Decision Tree Induction

This approach uses a decision tree for attribute selection. It constructs a flowchart-like structure in which each internal node denotes a test on an attribute, each branch corresponds to an outcome of the test, and each leaf node holds a class prediction. Any attribute that does not appear in the tree is considered irrelevant and is discarded, as in the sketch below.
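A minimal sketch using scikit-learn's DecisionTreeClassifier: attributes the fitted tree never splits on (importance of zero) are treated as irrelevant. The names X and y are illustrative assumptions:

```python
from sklearn.tree import DecisionTreeClassifier

def tree_selection(X, y):
    """Keep only attributes that appear in at least one split of the tree."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    return [col for col, imp in zip(X.columns, tree.feature_importances_)
            if imp > 0]
```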

