1. To convert a floating point number into decimal, note that there are 3 elements in a 32-bit floating point representation:
i) Sign
ii) Exponent
iii) Mantissa
- Sign bit is the first bit of the binary representation. ‘1’ implies a negative number and ‘0’ implies a positive number.
Example: 11000001110100000000000000000000 This is a negative number.
- Exponent is decided by the next 8 bits of the binary representation. 127 is the bias for the 32-bit floating point representation. It is determined by 2^(k-1) - 1, where ‘k’ is the number of bits in the exponent field.
There are 3 exponent bits in 8-bit representation and 8 exponent bits in 32-bit representation.
Thus
bias = 3 for 8-bit conversion (2^(3-1) - 1 = 4 - 1 = 3)
bias = 127 for 32-bit conversion (2^(8-1) - 1 = 128 - 1 = 127)
Example: 01000001110100000000000000000000
The 8 exponent bits are 10000011 = 131 in decimal.
131 - 127 = 4
Hence the exponent of 2 will be 4, i.e. 2^4 = 16.
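The bias formula and the exponent extraction above can be checked with a few lines of Python. This is only a minimal sketch: the helper name bias and the variable names are chosen here for illustration and are not part of any standard.

```python
def bias(k: int) -> int:
    """Bias for an exponent field that is k bits wide: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1

bits = "01000001110100000000000000000000"
exponent_field = bits[1:9]        # the 8 bits that follow the sign bit
stored = int(exponent_field, 2)   # value stored in the exponent field
actual = stored - bias(8)         # remove the bias to get the power of 2

print(bias(3), bias(8))                 # 3 127
print(exponent_field, stored, actual)   # 10000011 131 4
print(2 ** actual)                      # 16
```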
- Mantissa is calculated from the remaining 23 bits of the binary representation. It consists of an implicit leading ‘1’ and a fractional part, in which the i-th bit after the binary point has a weight of 1/2^i.
Example:
01000001110100000000000000000000
The fractional part of mantissa is given by:
1*(1/2) + 0*(1/4) + 1*(1/8) + 0*(1/16) +……… = 0.625
Thus the mantissa will be 1 + 0.625 = 1.625
The decimal number is hence given as: Sign * 2^Exponent * Mantissa = (-1)^0 * (16) * (1.625) = 26
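The three steps above (sign, exponent, mantissa) can be put together in a short Python sketch. The function name decode_float32 is an illustrative choice, and the sketch assumes a normalized number: the special cases where the exponent field is all zeros or all ones are deliberately not handled. Python's struct module is used only to cross-check the manual result.

```python
import struct

def decode_float32(bits: str) -> float:
    """Decode a 32-character bit string, assuming a normalized value."""
    sign = int(bits[0])                      # first bit
    exponent = int(bits[1:9], 2) - 127       # next 8 bits, minus the bias
    # Remaining 23 bits: bit i (counting from 1) has weight 1/2^i.
    fraction = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits[9:]))
    mantissa = 1 + fraction                  # add the implicit leading 1
    return (-1) ** sign * 2.0 ** exponent * mantissa

bits = "01000001110100000000000000000000"
value = decode_float32(bits)
print(value)  # 26.0

# Cross-check against the machine's own interpretation of the same 32 bits.
reference = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]
print(reference == value)  # True
```

Flipping the first bit to ‘1’ (the negative example shown earlier) gives -26.0 with the same function.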
2. To convert a decimal number into floating point, we fill in the same 3 elements of the 32-bit floating point representation:
i) Sign (MSB)
ii) Exponent (8 bits after MSB)
iii) Mantissa (Remaining 23 bits)
- Sign bit is the first bit of the binary representation. ‘1’ implies a negative number and ‘0’ implies a positive number.
Example: To convert -17 into 32-bit floating point representation, Sign bit = 1
- Exponent is decided by the largest power of two, 2^n, that is smaller than or equal to the magnitude of the number. For 17, 16 is the nearest such 2^n. Hence the exponent of 2 will be 4, since 2^4 = 16. 127 is the bias for the 32-bit floating point representation. It is determined by 2^(k-1) - 1, where ‘k’ is the number of bits in the exponent field.
Thus bias = 127 for 32-bit (2^(8-1) - 1 = 128 - 1 = 127).
Now, 127 + 4 = 131 i.e. 10000011 in binary representation.
- Mantissa: 17 in binary = 10001.
Move the binary point so that there is only one bit to its left. Adjust the exponent of 2 so that the value does not change. This is normalizing the number: 1.0001 x 2^4. Now, take the fractional part and represent it as 23 bits by adding zeros to the right.
00010000000000000000000
Putting the three fields together (sign, exponent, mantissa), -17 is represented as 1 10000011 00010000000000000000000.
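The encoding steps above can be sketched in Python as follows. The function names encode_manual and encode_with_struct are illustrative. The manual version follows the steps in the text and assumes a nonzero integer that fits in 24 significant bits; the second version asks Python's struct module for the actual 32-bit pattern so the two can be compared.

```python
import struct

def encode_manual(n: int) -> str:
    """Follow the steps in the text; assumes a nonzero integer of at most 24 bits."""
    sign = "1" if n < 0 else "0"
    binary = format(abs(n), "b")             # 17 -> '10001'
    exponent = len(binary) - 1               # shift needed to reach 1.xxxx form
    stored = format(exponent + 127, "08b")   # add the bias, write as 8 bits
    fraction = binary[1:].ljust(23, "0")     # drop the leading 1, pad with zeros
    return f"{sign} {stored} {fraction}"

def encode_with_struct(x: float) -> str:
    """Read back the 32-bit pattern that the machine stores for x."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(encode_manual(-17))         # 1 10000011 00010000000000000000000
print(encode_with_struct(-17.0))  # 1 10000011 00010000000000000000000
```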
Advantages:
Wide range of values: Floating point representation allows a wide range of values to be represented, including very large and very small numbers.
Precision: Floating point representation offers high precision, which is important for scientific and engineering calculations.
Compatibility: Floating point representation is widely used in computer systems, making it compatible with a wide range of software and hardware.
Easy to use: Most programming languages offer built-in support for floating point representation, making it easy to use and manipulate in computer programs.
Disadvantages:
Complexity: Floating point representation is complex and can be difficult to understand, especially for those who are not familiar with the underlying mathematics.
Rounding errors: Floating point representation can lead to rounding errors, where the actual value of a number is slightly different from its representation inside the computer.
Speed: Floating point operations can be slower than integer operations, particularly on older or less powerful hardware.
Limited precision: Despite its high precision, floating point representation has a limited number of significant digits (23 stored mantissa bits in the 32-bit format), which can limit its usefulness in some applications.
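The rounding errors and limited precision mentioned above are easy to observe. The following Python sketch uses a helper named to_float32 (an illustrative name) that rounds a value to the nearest 32-bit float through the struct module. Python's own float type is 64-bit, so the first line shows the same effect at double precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)           # False

# 0.1 stored in 32 bits differs from 0.1 stored in 64 bits.
print(to_float32(0.1) == 0.1)     # False

# 26 = 1.625 x 2^4 fits exactly in the 23-bit mantissa.
print(to_float32(26.0) == 26.0)   # True

# 2^24 + 1 needs 25 significant bits, so it is rounded to 2^24.
print(to_float32(16777217.0))     # 16777216.0
```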
Related Link:
https://www.youtube.com/watch?v=03fhijH6e2w
More questions on number representation:
https://www.geeksforgeeks.org/number-representation-gq/
This article is contributed by Kriti Kushwaha