Python – Scaling numbers column by column with Pandas
Scaling is a common pre-processing technique in machine learning that standardizes the independent features in the data to a fixed range. When applied to a Python sequence, such as a Pandas Series, scaling produces a new sequence in which every value of the column falls within that range. For example, if the range is (0, 1), all the data in that column will lie between 0 and 1.
If the sequence is [1, 2, 3], then the sequence scaled to the range (0, 1) is [0, 0.5, 1].
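To make the arithmetic behind that small example concrete, here is a minimal sketch of min-max scaling written directly in pandas, without Scikit-learn. The variable names are only illustrative, and it assumes the column is not constant (otherwise the range would be zero).

```python
import pandas as pd

# Illustrative data matching the [1, 2, 3] sequence mentioned above.
values = pd.Series([1, 2, 3])

# Min-max scaling: subtract the minimum, then divide by the range (max - min).
# Assumes max != min; a constant column would cause a division by zero.
scaled = (values - values.min()) / (values.max() - values.min())

print(scaled.tolist())  # [0.0, 0.5, 1.0]
```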
- In machine learning, scaling can improve the convergence speed of various algorithms.
- Often in machine learning, you will come across data sets with huge variation between values, and many machine learning models struggle to perform well on such data. In that case, scaling helps to keep the data within a fixed range.
Note: We will be using Scikit-learn in this article to scale the Pandas DataFrame.
- Import the pandas and sklearn libraries in Python.
- Call the DataFrame constructor to return a new DataFrame.
- Create an instance of sklearn.preprocessing.MinMaxScaler.
- Call the scaler's fit_transform(df[[column_name]]) method to min-max scale the specified column. It returns a NumPy array of the scaled values, which you can assign back to df[[column_name]] in the DataFrame created in the second step.
Example 1:
A very basic example of how MinMaxScaler works.
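The steps listed earlier can be sketched roughly as follows. This is not the article's original listing; the column name "Marks" and its values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data; the column name "Marks" is an assumption for this sketch.
df = pd.DataFrame({"Marks": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Create the scaler; with no arguments it targets the range (0, 1).
scaler = MinMaxScaler()

# fit_transform returns a NumPy array, so assign it back to the column.
df[["Marks"]] = scaler.fit_transform(df[["Marks"]])

# The smallest value maps to 0, the largest to 1, and the rest fall in between.
print(df)
```

Note the double brackets in df[["Marks"]]: the scaler expects two-dimensional input, so a one-column DataFrame is passed rather than a Series.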
Example 2: You can also scale more than one column of a Pandas DataFrame at a time. You just have to select those columns by name and pass them to the MinMaxScaler.fit_transform() function.
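A minimal sketch of scaling several columns in one call might look like this. The DataFrame, its column names ("Height", "Weight", "Name"), and its values are all hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with two numeric columns and one text column.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Height": [150.0, 160.0, 170.0, 180.0],
    "Weight": [50.0, 60.0, 80.0, 90.0],
})

# Columns to scale; "Name" is left out because it is not numeric.
columns_to_scale = ["Height", "Weight"]

# Each selected column is scaled independently using its own min and max.
scaler = MinMaxScaler()
df[columns_to_scale] = scaler.fit_transform(df[columns_to_scale])

print(df)
```

Because each column is scaled with its own minimum and maximum, both "Height" and "Weight" end up spanning 0 to 1 even though their original units differ.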
Example 3: By default, the MinMaxScaler class scales values to the range (0, 1), but you can change this to any range you need through its feature_range parameter.
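As a rough illustration of a custom range, the sketch below scales a column to (-1, 1) instead of the default. The column name "Score", its values, and the chosen target range are assumptions made for this example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data; "Score" is a made-up column name.
df = pd.DataFrame({"Score": [0.0, 25.0, 50.0, 75.0, 100.0]})

# feature_range takes a (min, max) tuple describing the target range.
scaler = MinMaxScaler(feature_range=(-1, 1))

df[["Score"]] = scaler.fit_transform(df[["Score"]])

# The smallest value now maps to -1 and the largest to 1.
print(df)
```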