Problems on min-max normalization
The measurement unit used can affect data analysis. For instance, changing the unit of a weight attribute from kilograms to pounds changes its values. Expressing an attribute in smaller units leads to a larger range for that attribute, which tends to give it greater weight in the analysis. To avoid dependence on the choice of measurement units, the data should be normalized. Normalization scales the values of an attribute so that they fall in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It is commonly used in classification algorithms.
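As a rough illustration of the unit dependence described above, the sketch below expresses the same weights in kilograms and in pounds and compares their ranges. The sample values and the names `weights_kg`, `KG_TO_LB`, `value_range`, and `min_max` are assumptions made for this example. It also rescales both lists to [0, 1] using min-max normalization, which is introduced later in this article.

```python
# Hypothetical sample weights in kilograms (illustrative values only).
weights_kg = [50.0, 60.0, 80.0]

KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram

# The same measurements expressed in the smaller unit (pounds).
weights_lb = [w * KG_TO_LB for w in weights_kg]


def value_range(values):
    """Return the spread (max - min) of a list of numbers."""
    return max(values) - min(values)


def min_max(values):
    """Rescale values linearly to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


range_kg = value_range(weights_kg)
range_lb = value_range(weights_lb)
print("range in kg:", range_kg)
print("range in lb:", range_lb)

# After rescaling, the two versions agree up to floating-point rounding.
norm_kg = min_max(weights_kg)
norm_lb = min_max(weights_lb)
print(norm_kg)
print(norm_lb)
```

The range in pounds is larger than the range in kilograms, yet the rescaled values agree, so the choice of unit no longer matters.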
Use of normalization :
Normalization is required when attributes are on different scales. Without it, an important attribute with values on a smaller scale can lose influence because other attributes have values on a larger scale. Normalization can also reduce the training time of some learning algorithms, so the model often trains more efficiently after the dataset is normalized.
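To make the scale problem concrete, here is a minimal sketch using Euclidean distance, a measure used by some distance-based classifiers. The records (age, income), their values, and the helper names `euclidean` and `min_max_columns` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical records: (age in years, income in currency units).
records = [(25, 30000.0), (60, 31000.0), (26, 90000.0)]


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_columns(rows):
    """Rescale every column of rows to [0, 1] independently."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        scaled_columns.append([(v - lo) / (hi - lo) for v in col])
    return [tuple(row) for row in zip(*scaled_columns)]


# On raw data the income column dominates: a 35-year age gap
# (record 0 vs 1) barely registers next to an income gap (record 0 vs 2).
raw_d01 = euclidean(records[0], records[1])
raw_d02 = euclidean(records[0], records[2])
print(raw_d01, raw_d02)

# After rescaling, both attributes contribute on a comparable scale.
scaled = min_max_columns(records)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])
print(scaled_d01, scaled_d02)
```

On the raw values the second distance is many times the first. After rescaling the two distances are nearly equal, because the large age gap now counts about as much as the large income gap.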
Min-Max Normalization :
In this technique of data normalization, a linear transformation is performed on the original data. The minimum and maximum values of the attribute are found, and each value is replaced according to the formula below. Min-max normalization preserves the relationships among the original data values. It will encounter an out-of-bounds error if a future input case for normalization falls outside the original data range for A. Min-max normalization maps a value v of A to v' in the range [new_min(A), new_max(A)] by computing:

$$v' = \frac{v - \min(A)}{\max(A) - \min(A)} \times \big(\text{new\_max}(A) - \text{new\_min}(A)\big) + \text{new\_min}(A)$$

Where A is the attribute data and the terms are as follows.
min(A) - the minimum value of A.
max(A) - the maximum value of A.
v' - the new (normalized) value of each entry of A.
v - the old (original) value of each entry of A.
new_max(A), new_min(A) - the maximum and minimum of the required range (i.e. the boundary values of the target range), respectively.
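The min-max mapping described above can be sketched in a few lines of Python. The function names `fit_min_max` and `min_max_scale`, the sample data, and the target range [-1, 1] are assumptions for illustration. In this sketch a value outside the original data range does not raise an exception; it simply maps outside [new_min(A), new_max(A)], which is how the out-of-bounds situation shows up here.

```python
def fit_min_max(values):
    """Return (min(A), max(A)) observed in the original data."""
    return min(values), max(values)


def min_max_scale(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v linearly from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        # All original values are identical; the formula would divide by zero.
        raise ValueError("max(A) equals min(A); cannot normalize")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


# Hypothetical attribute values used to determine min(A) and max(A).
data = [10.0, 20.0, 40.0]
min_a, max_a = fit_min_max(data)

# Values from the original data land inside the target range [-1, 1].
scaled = [min_max_scale(v, min_a, max_a, -1.0, 1.0) for v in data]
print(scaled)

# A later value above max(A) lands outside the target range.
outside = min_max_scale(55.0, min_a, max_a, -1.0, 1.0)
print(outside)  # 2.0
```

Keeping min(A) and max(A) from the original data, rather than recomputing them for each new input, is what makes the out-of-range case visible. A caller could clip or reject such values, depending on the application.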
Here, we will discuss an example as follows.
Normalize the following group of data –
1000, 2000, 3000, 9000 using min-max normalization with min = 0 and max = 1.
Here,
new_max(A) = 1, as given in the question (max = 1)
new_min(A) = 0, as given in the question (min = 0)
max(A) = 9000, as the maximum among 1000, 2000, 3000, 9000 is 9000
min(A) = 1000, as the minimum among 1000, 2000, 3000, 9000 is 1000
Case-1: normalizing 1000 –
v = 1000. Putting all values in the formula, we get

$$v' = \frac{1000 - 1000}{9000 - 1000} \times (1 - 0) + 0 = 0$$
Case-2: normalizing 2000 –
v = 2000. Putting all values in the formula, we get

$$v' = \frac{2000 - 1000}{9000 - 1000} \times (1 - 0) + 0 = 0.125$$
Case-3: normalizing 3000 –
v = 3000. Putting all values in the formula, we get

$$v' = \frac{3000 - 1000}{9000 - 1000} \times (1 - 0) + 0 = 0.25$$
Case-4: normalizing 9000 –
v = 9000. Putting all values in the formula, we get

$$v' = \frac{9000 - 1000}{9000 - 1000} \times (1 - 0) + 0 = 1$$
Hence, the normalized values of 1000, 2000, 3000, 9000 are 0, 0.125, 0.25, 1.
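The four cases above can be checked with a short script. This is a minimal sketch that applies the same formula to the data from the problem; the variable names are chosen for this example only.

```python
# Data and target range from the worked problem.
data = [1000, 2000, 3000, 9000]
new_min, new_max = 0, 1

# min(A) and max(A) are taken from the data itself.
min_a, max_a = min(data), max(data)

# Apply the min-max formula to every value.
normalized = [
    (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min
    for v in data
]

print(normalized)  # [0.0, 0.125, 0.25, 1.0]
```

The output matches the hand-computed results for Case-1 through Case-4.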