Python – Scaling numbers column by column with Pandas
Scaling numbers in machine learning is a common preprocessing technique that brings the independent features in the data into a fixed range. When applied to a sequence such as a Pandas Series, scaling produces a new sequence in which every value of the column falls within that range. For example, if the range is (0, 1), every value in that column will lie between 0 and 1.
Example:
if the sequence is [1, 2, 3]
then the scaled sequence is [0, 0.5, 1]
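The mapping in this example follows the usual min-max formula, (x - min) / (max - min). A minimal sketch of that formula on a Pandas Series is shown here; the helper name min_max_scale is hypothetical and is not part of pandas or Scikit-learn.

```python
import pandas as pd


def min_max_scale(values: pd.Series) -> pd.Series:
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lowest = values.min()
    highest = values.max()
    # Assumes highest > lowest; a constant column would divide by zero here.
    return (values - lowest) / (highest - lowest)


scaled = min_max_scale(pd.Series([1, 2, 3]))
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```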
Application:
- In machine learning, scaling can improve the convergence speed of various algorithms.
- Machine learning data sets often contain features whose values vary widely, and many models perform poorly on such data. Scaling helps by keeping the values within a common range.
Note: We will be using Scikit-learn in this article to scale the pandas dataframe.
Steps:
- Import the pandas and sklearn libraries in Python.
- Call the DataFrame constructor to return a new DataFrame.
- Create an instance of sklearn.preprocessing.MinMaxScaler.
- Call fit_transform(df[[column_name]]) on that MinMaxScaler instance. It returns a NumPy array holding the min-max scaled values of the specified column, which can then be assigned to a column of the DataFrame df from the second step.
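The steps above can be sketched on a small, made-up DataFrame. With Scikit-learn's default output settings, fit_transform hands back a NumPy array of shape (rows, columns) rather than a DataFrame, so the sketch assigns the result to a new column explicitly. The toy prices and the ScaledPrice column name are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data, used only to show the shape of the result.
df = pd.DataFrame({"Price": [100, 300, 1000]})

scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["Price"]])  # double brackets keep the input 2-D

print(type(scaled).__name__, scaled.shape)  # ndarray (3, 1)

# Take the single scaled column out of the 2-D array and store it in the DataFrame.
df["ScaledPrice"] = scaled[:, 0]
print(df)
```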
Example 1 : A very basic example of how MinMaxScaler scales a single DataFrame column.
Python3
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data: item numbers and their prices
pd_data = pd.DataFrame({
    "Item": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Price": [100, 300, 250, 120, 910, 345, 124, 1000, 289, 500]
})

# Scale the Price column to the default range (0, 1)
scaler = MinMaxScaler()
pd_data[["ScaledPrice"]] = scaler.fit_transform(pd_data[["Price"]])
print(pd_data)
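Output : The printed DataFrame gains a ScaledPrice column in which the Price values are mapped to the range [0, 1], with the lowest price (100) becoming 0 and the highest (1000) becoming 1.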
Example 2 : You can also scale more than one DataFrame column at a time. Just pass a DataFrame containing all of the columns to scale to the MinMaxScaler.fit_transform() function.
Python3
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data: item numbers with their prices and weights
pd_data = pd.DataFrame({
    "Item": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Price": [100, 300, 250, 120, 910, 345, 124, 1000, 289, 500],
    "Weight": [200, 203, 350, 100, 560, 456, 700, 250, 800, 389]
})

# Scale Price and Weight together; each column is scaled on its own
scaler = MinMaxScaler()
pd_data[["ScaledPrice", "ScaledWeight"]] = scaler.fit_transform(
    pd_data[["Price", "Weight"]])
print(pd_data)
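Output : The printed DataFrame gains ScaledPrice and ScaledWeight columns, each mapped to the range [0, 1] using that column's own minimum and maximum.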
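When several columns are passed at once, each one is scaled independently. The following sketch, which reuses the Price and Weight data from Example 2, inspects the per-column minimum and maximum that the fitted scaler stores in data_min_ and data_max_, and compares the scaler's result with the formula applied by hand. The variable names cols and manual are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

pd_data = pd.DataFrame({
    "Price": [100, 300, 250, 120, 910, 345, 124, 1000, 289, 500],
    "Weight": [200, 203, 350, 100, 560, 456, 700, 250, 800, 389],
})
cols = pd_data[["Price", "Weight"]]

scaler = MinMaxScaler()
scaled = scaler.fit_transform(cols)

# After fitting, the scaler holds one minimum and one maximum per column.
print(scaler.data_min_)  # minimums of Price and Weight (100 and 100)
print(scaler.data_max_)  # maximums of Price and Weight (1000 and 800)

# The same scaling done by hand, column by column.
manual = (cols - cols.min()) / (cols.max() - cols.min())
print(np.allclose(scaled, manual.to_numpy()))  # True
```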
Example 3 : By default, MinMaxScaler() scales values to the range (0, 1), but you can change this to any range you need with the feature_range parameter.
Python3
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data: item numbers and their prices
pd_data = pd.DataFrame({
    "Item": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Price": [100, 300, 250, 120, 910, 345, 124, 1000, 289, 500]
})

# Scale the Price column to the custom range (20, 500)
scaler = MinMaxScaler(feature_range=(20, 500))
pd_data[["ScaledPrice"]] = scaler.fit_transform(pd_data[["Price"]])
print(pd_data)
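Output : The printed DataFrame gains a ScaledPrice column in which the Price values are mapped to the range [20, 500], with the lowest price (100) becoming 20 and the highest (1000) becoming 500.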
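A custom range amounts to first scaling to [0, 1] and then stretching and shifting the result: scaled = unit * (high - low) + low. The sketch below checks this against MinMaxScaler for the range used in Example 3. The names low, high, unit and manual are illustrative and not part of the Scikit-learn API.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

prices = pd.DataFrame(
    {"Price": [100, 300, 250, 120, 910, 345, 124, 1000, 289, 500]}
)
low, high = 20, 500

scaler = MinMaxScaler(feature_range=(low, high))
scaled = scaler.fit_transform(prices[["Price"]])

# Step 1: scale to [0, 1]. Step 2: stretch to the width of the target range
# and shift so that the minimum lands on `low`.
price = prices["Price"]
unit = (price - price.min()) / (price.max() - price.min())
manual = unit * (high - low) + low

print(np.allclose(scaled[:, 0], manual.to_numpy()))  # True
```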
Last Updated :
25 Feb, 2021