Python – Removing Constant Features From the Dataset
Last Updated: 02 Sep, 2020
Features that hold a single value for every observation in the dataset are known as constant features. Because such a feature never varies, it carries no information about the target, so it is redundant and can safely be removed from the dataset. Removing redundant features and keeping only the necessary ones falls under the filter methods of feature selection.
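To make the definition concrete, here is a minimal sketch that spots a constant column by counting distinct values with pandas. The frame and its column names (`site`, `topic`, `views`) are hypothetical and used only for illustration; they are not the article's dataset.

```python
import pandas as pd

# Hypothetical toy frame, for illustration only
df = pd.DataFrame({
    "site": ["A", "A", "A", "A"],    # same value in every row
    "topic": ["x", "y", "x", "z"],
    "views": [10, 25, 7, 40],
})

# Number of distinct values per column (missing values counted as a value)
distinct_counts = {col: df[col].nunique(dropna=False) for col in df.columns}

# A column is constant when it has exactly one distinct value
constant_cols = [col for col, n in distinct_counts.items() if n == 1]

print(distinct_counts)
print(constant_cols)
```

Counting distinct values works directly on string columns, so it can serve as a quick check before any encoding is done.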
Now let's see how we can remove constant features in Python.
Consider the following self-created dataset:
Portal | Article's_category | Views
GeeksforGeeks | Python | 545
GeeksforGeeks | Data Science | 1505
GeeksforGeeks | Data Science | 1157
GeeksforGeeks | Data Science | 2541
GeeksforGeeks | Mathematics | 5726
GeeksforGeeks | Python | 3125
GeeksforGeeks | Data Science | 3131
GeeksforGeeks | Mathematics | 6525
GeeksforGeeks | Mathematics | 15000
Code: Create DataFrame of the above data
import pandas as pd
# Category labels must be spelled identically (no stray spaces),
# otherwise 'Python ' and 'Python' would be encoded as different categories
data = pd.DataFrame({"Portal": ['GeeksforGeeks', 'GeeksforGeeks', 'GeeksforGeeks', 'GeeksforGeeks', 'GeeksforGeeks',
                                'GeeksforGeeks', 'GeeksforGeeks', 'GeeksforGeeks', 'GeeksforGeeks'],
                     "Article's_category": ['Python', 'Data Science', 'Data Science', 'Data Science', 'Mathematics',
                                            'Python', 'Data Science', 'Mathematics', 'Mathematics'],
                     "Views": [545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000]})
Code: Convert the categorical data to numerical data
from sklearn.preprocessing import OrdinalEncoder
# Encode each string category as a number so that a variance can be computed
ord_enc = OrdinalEncoder()
data[["Portal", "Article's_category"]] = ord_enc.fit_transform(data[["Portal", "Article's_category"]])
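As a small aside, the sketch below shows how `OrdinalEncoder` assigns its codes, which explains the numbers that appear in the transformed output later in the article. The `labels` list is a shortened, hypothetical sample of the category column, not the full dataset.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical short sample of category labels (one feature, four rows)
labels = [["Python"], ["Data Science"], ["Mathematics"], ["Python"]]

enc = OrdinalEncoder()
codes = enc.fit_transform(labels)

# categories_ holds the sorted unique labels for each feature;
# a label's position in that array is its numeric code
print(enc.categories_[0].tolist())
print(codes.ravel().tolist())
```

The codes follow the sorted order of the labels, so they carry no real ranking. That is harmless here, because the only question is whether a column varies at all.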
Code: Fit the data to VarianceThreshold.
from sklearn.feature_selection import VarianceThreshold
# threshold=0 drops only the features whose variance is zero, i.e. constant features
var_threshold = VarianceThreshold(threshold=0)
var_threshold.fit(data)
print(var_threshold.variances_)
Output: Variance of different features:
[0.00000000e+00 6.17283951e-01 1.76746269e+07]
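The three values listed above can be reproduced by hand as population variances (dividing by n, not n - 1). The sketch below uses NumPy only and assumes the encoded values Data Science = 0, Mathematics = 1, Python = 2 for the category column; the exact numbers reported by `variances_` can vary between scikit-learn versions.

```python
import numpy as np

# Columns as they look after encoding (assumed codes, see the lead-in)
portal = np.zeros(9)
category = np.array([2, 0, 0, 0, 1, 2, 0, 1, 1], dtype=float)
views = np.array([545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000], dtype=float)

# np.var uses ddof=0 by default, i.e. the population variance
variances = [float(np.var(col)) for col in (portal, category, views)]
print(variances)

# Only columns with variance above the threshold of 0 would be kept
keep_mask = [v > 0 for v in variances]
print(keep_mask)
```

Only the first column has zero variance, which is why it is the one dropped in the next step.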
Code: Transform the data
print(var_threshold.transform(data))
print('*' * 10, "Separator", '*' * 10)
print("Earlier shape of data:", data.shape)
print("Shape after transformation:", var_threshold.transform(data).shape)
Output:
[[2.000e+00 5.450e+02]
[0.000e+00 1.505e+03]
[0.000e+00 1.157e+03]
[0.000e+00 2.541e+03]
[1.000e+00 5.726e+03]
[2.000e+00 3.125e+03]
[0.000e+00 3.131e+03]
[1.000e+00 6.525e+03]
[1.000e+00 1.500e+04]]
********** Separator **********
Earlier shape of data: (9, 3)
Shape after transformation: (9, 2)
As you can observe, we initially had 9 observations with 3 features.
After the transformation, we have 9 observations with 2 features. The removed feature is 'Portal', the only column whose variance is zero.
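Since `transform` returns a plain array without column names, it can help to ask the fitted selector which columns it kept. The following is a minimal sketch using `get_support()`; it rebuilds the article's dataset so that it runs on its own, and the names `selector`, `kept`, `removed` and `reduced` are chosen here for illustration.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({
    "Portal": ["GeeksforGeeks"] * 9,
    "Article's_category": ["Python", "Data Science", "Data Science",
                           "Data Science", "Mathematics", "Python",
                           "Data Science", "Mathematics", "Mathematics"],
    "Views": [545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000],
})
cat_cols = ["Portal", "Article's_category"]
data[cat_cols] = OrdinalEncoder().fit_transform(data[cat_cols])

selector = VarianceThreshold(threshold=0)
selector.fit(data)

# Boolean mask over the input columns: True means the feature is kept
mask = selector.get_support()
kept = data.columns[mask].tolist()
removed = data.columns[~mask].tolist()
print("Kept:", kept)
print("Removed:", removed)

# Keep the result as a DataFrame so the column names survive
reduced = data.loc[:, mask]
print(reduced.shape)
```

Indexing the original DataFrame with the mask, instead of calling `transform`, preserves the column labels for later steps.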