Scaling is a common preprocessing technique in machine learning that maps the independent features in a data set onto a fixed range. Applied to a sequence such as a Pandas Series, min-max scaling produces a new sequence in which every value lies within the chosen range. For example, if the range is (0, 1), every value in that column will fall between 0 and 1.
For instance, the sequence [1, 2, 3] scales to [0, 0.5, 1].
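To make the arithmetic concrete, here is a minimal sketch of min-max scaling written directly in pandas. The helper name `min_max_scale` and its `low`/`high` parameters are illustrative assumptions, not part of pandas or Scikit-learn.

```python
import pandas as pd


def min_max_scale(values: pd.Series, low: float = 0.0, high: float = 1.0) -> pd.Series:
    """Hypothetical helper: linearly map `values` onto the range [low, high]."""
    span = values.max() - values.min()
    if span == 0:
        # All values are identical, so there is no spread to rescale.
        raise ValueError("cannot min-max scale a constant sequence")
    # Shift so the minimum becomes 0, stretch to the target width, then offset by `low`.
    return low + (values - values.min()) * (high - low) / span


scaled = min_max_scale(pd.Series([1, 2, 3]))
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```

The smallest value maps to the lower bound, the largest to the upper bound, and everything else keeps its relative position in between.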
- In machine learning, scaling can improve the convergence speed of various algorithms.
- Data sets often contain features whose values vary widely, which makes it difficult for many machine learning models to perform well on that data. In that case, scaling helps by keeping the values within a fixed range.
Note: We will be using Scikit-learn in this article to scale the pandas dataframe.
- Import the pandas and sklearn libraries in Python.
- Call the DataFrame constructor to return a new DataFrame.
- Create an instance of sklearn.preprocessing.MinMaxScaler.
- Call fit_transform(df[[column_name]]) on that MinMaxScaler instance. It returns the min-max scaled values of the specified column as a NumPy array; assign the result back to the column to update the DataFrame df created earlier.
Example 1:
A very basic example of how MinMaxScaler works.
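The steps listed above could be sketched as follows. The column names and values are made up for illustration, and writing the result back with `df["A"] = ...` is just one reasonable way to update the DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data (assumed, not from the original article).
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 40]})

scaler = MinMaxScaler()

# fit_transform expects 2-D input, hence the double brackets around the column name.
scaled = scaler.fit_transform(df[["A"]])

# The result is a NumPy array of shape (3, 1); take its only column and write it back.
df["A"] = scaled[:, 0]

print(df)
```

Only column A is rescaled to the range 0 to 1 here; column B keeps its original values.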
Example 2: You can also scale more than one column of a pandas DataFrame at a time; just select all of those columns in the DataFrame you pass to fit_transform().
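A minimal sketch of scaling several columns in one call might look like this. The data and the `cols` variable are assumptions chosen for illustration; each selected column is scaled independently using its own minimum and maximum.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data (assumed): two numeric columns and one label column.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 10.0],
    "label": ["x", "y", "z"],
})

cols = ["A", "B"]  # the columns to scale; "label" is left untouched
scaler = MinMaxScaler()

# Each column is scaled with its own min and max, then written back in place.
df[cols] = scaler.fit_transform(df[cols])

print(df)
```

Non-numeric columns such as `label` should be left out of the selection, since the scaler works on numeric data only.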
Example 3: By default, MinMaxScaler() scales to the range (0, 1), but you can change this to any range you need through its feature_range parameter.
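As a rough sketch, a custom range can be requested by passing `feature_range` when creating the scaler. The target range (-1, 1) and the data below are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data (assumed).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})

# feature_range sets the (min, max) of the output; (-1, 1) is an arbitrary choice here.
scaler = MinMaxScaler(feature_range=(-1, 1))

# Take the only column of the 2-D result and write it back.
df["A"] = scaler.fit_transform(df[["A"]])[:, 0]

print(df["A"].tolist())
```

The smallest value in the column maps to the lower bound of `feature_range` and the largest to the upper bound.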