Feature selection, also known as attribute selection, is the process of selecting the most relevant features from a dataset before applying a machine learning algorithm, in order to improve the model's performance. A large number of irrelevant features increases training time and raises the risk of overfitting.
Chi-square Test for Feature Selection:
The chi-square test is used for categorical features in a dataset. We calculate the chi-square statistic between each feature and the target, and select the desired number of features with the best chi-square scores. The test determines whether the association between two categorical variables observed in the sample reflects a real association in the population.
The chi-square score is given by:

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

where
Observed frequency (Oᵢ) = number of observations of a class
Expected frequency (Eᵢ) = number of observations of a class expected if there were no relationship between the feature and the target.
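To make the formula concrete, here is a small worked example that computes the chi-square score by hand for one categorical feature against a binary target. The 2×2 contingency table of counts is made up purely for illustration:

```python
import numpy as np

# Hypothetical 2x2 contingency table: rows = feature categories,
# columns = target classes (counts are invented for illustration).
observed = np.array([[20, 30],
                     [30, 20]])

row_totals = observed.sum(axis=1, keepdims=True)   # [[50], [50]]
col_totals = observed.sum(axis=0, keepdims=True)   # [[50, 50]]
total = observed.sum()                             # 100

# Expected counts under independence:
# (row total * column total) / grand total  -> every cell is 25 here
expected = row_totals * col_totals / total

# Chi-square score: sum of (O - E)^2 / E over all cells
chi_square = ((observed - expected) ** 2 / expected).sum()
print(chi_square)  # 4.0
```

The larger the score, the more the observed counts deviate from what independence would predict, i.e. the stronger the apparent association between the feature and the target.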
Python Implementation of Chi-Square feature selection:
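A minimal sketch of chi-square feature selection using scikit-learn's `SelectKBest` with the `chi2` scoring function, applied to the 4-feature Iris dataset (the choice of Iris and of `k=2` here is an assumption consistent with the output shown below):

```python
# Chi-square feature selection with scikit-learn's SelectKBest.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load the Iris dataset: 150 samples, 4 features, 3 target classes.
X, y = load_iris(return_X_y=True)
print("Original feature number:", X.shape[1])

# Keep the 2 features with the highest chi-square scores
# with respect to the target.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)
print("Reduced feature number:", X_new.shape[1])
```

`fit_transform` scores every feature against the target and returns only the `k` best-scoring columns.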
Output:
Original feature number: 4
Reduced feature number: 2