Open In App

Imbalanced-Learn module in Python

Imbalanced-Learn is a Python module that helps in balancing the datasets which are highly skewed or biased towards some classes. Thus, it helps in resampling the classes which are otherwise oversampled or undesampled. If there is a greater imbalance ratio, the output is biased to the class which has a higher number of examples. The following dependencies need to be installed to use imbalanced-learn:

To install imbalanced-learn just type in :



pip install imbalanced-learn

The resampling of data is done in 2 parts: 

Estimator: It implements a fit method which is derived from scikit-learn. The data and targets are both in the form of a 2D array



estimator = obj.fit(data, targets)

Resampler: The fit_resample method resample the data and targets into a dictionary with a key-value pair of data_resampled and targets_resampled.

data_resampled, targets_resampled = obj.fit_resample(data, targets)

The Imbalanced Learn module has different algorithms for oversampling and undersampling:

We will use the built-in dataset called the make_classification dataset which return 

Click dataset to get the dataset used.




# import required modules
from sklearn.datasets import make_classification
  
# define dataset
x, y = make_classification(n_samples=10000
                           weights=[0.99], 
                           flip_y=0)
print('x:\n', X)
print('y:\n', y)

Output:

Below are some programs in which depict how to apply oversampling and undersampling to the dataset:

Oversampling

Syntax:

from imblearn.over_sampling import RandomOverSampler

Parameters(optional): sampling_strategy=’auto’, return_indices=False, random_state=None, ratio=None

Implementation:
oversample = RandomOverSampler(sampling_strategy=’minority’)
X_oversample,Y_oversample=oversample.fit_resample(X,Y)

Return Type:a matrix with the shape of n_samples*n_features

Example:




# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
  
# define dataset
x, y = make_classification(n_samples=10000
                           weights=[0.99], 
                           flip_y=0)
  
oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)
  
# print the features and the labels
print('x_over:\n', x_over)
print('y_over:\n', y_over)

Output:

Syntax:

from imblearn.over_sampling import SMOTE, ADASYN

Parameters(optional):*, sampling_strategy=’auto’, random_state=None, n_neighbors=5, n_jobs=None

Implementation:
smote = SMOTE(ratio=’minority’)
X_smote,Y_smote=smote.fit_resample(X,Y)

Return Type:a matrix with the shape of n_samples*n_features

Example:




# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
  
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
smote = SMOTE()
x_smote, y_smote = smote.fit_resample(x, y)
  
# print the features and the labels
print('x_smote:\n', x_smote)
print('y_smote:\n', y_smote)

Output:

Undersampling

Syntax:

from imblearn.under_sampling import EditedNearestNeighbours

Parameters(optional): sampling_strategy=’auto’, return_indices=False, random_state=None, n_neighbors=3, kind_sel=’all’, n_jobs=1, ratio=None

Implementation:
en = EditedNearestNeighbours()
X_en,Y_en=en.fit_resample(X, y)

Return Type:a matrix with the shape of n_samples*n_features

Example:




# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours
  
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)
  
# print the features and the labels
print('x_en:\n', x_en)
print('y_en:\n', y_en)

Output:

Syntax:

from imblearn.under_sampling import RandomUnderSampler
Parameters(optional): sampling_strategy=’auto’, return_indices=False, random_state=None, replacement=False, ratio=None

Implementation:
undersample = RandomUnderSampler()
X_under, y_under = undersample.fit_resample(X, y)

Return Type: a matrix with the shape of n_samples*n_features

Example:




# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
  
# define dataset
x, y = make_classification(n_samples=10000
                           weights=[0.99], 
                           flip_y=0)
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)
  
# print the features and the labels
print('x_under:\n', x_under)
print('y_under:\n', y_under)

Output:


Article Tags :