How to Generate MD5 Checksum for Files in Java?
A checksum (often referred to as a hash) is an alphanumeric value, i.e. a sequence of letters and numbers, that is computed from the contents of a file and, in practice, identifies them. Checksums are generally used to check the integrity of files downloaded from an external source. If you know the checksum of the original version, you can use a checksum utility to confirm that your copy is identical. For example, before backing up your files you can generate a checksum for each of them and verify it again when you later download them onto another device. The checksum will be different if the file has been corrupted or altered in the process.
MD5 and the SHA family are the most widely used checksum algorithms. When verifying a checksum, you must use the same algorithm that was used to generate it. For example, the MD5 checksum of a file is completely different from its SHA-256 checksum.
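To illustrate why the algorithm must match, here is a minimal sketch that hashes the same content with MD5 and with SHA-256. The class name AlgorithmComparison, the helper hexDigest, and the sample text are illustrative assumptions, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the name and helper are chosen for illustration only.
class AlgorithmComparison {

    // Hashes the text with the named algorithm and returns the result as lowercase hex.
    static String hexDigest(String algorithm, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5 and SHA-256,
            // so this should only happen for a misspelled algorithm name.
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        String content = "hello"; // sample input, assumed for the demo
        System.out.println("MD5:     " + hexDigest("MD5", content));
        System.out.println("SHA-256: " + hexDigest("SHA-256", content));
    }
}
```

The two printed values differ even in length: an MD5 digest is 128 bits (32 hex characters), while a SHA-256 digest is 256 bits (64 hex characters), so a checksum published for one algorithm can never be matched against the other.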
To produce a checksum, you run a program that puts the file through an algorithm. Typical algorithms used for this include MD5, SHA-1, SHA-256, and SHA-512.
Each of these algorithms is a cryptographic hash function that takes an input and generates a fixed-length alphanumeric string, regardless of the size of the file.
- Even small changes in the file will produce a different checksum.
- These cryptographic hash functions aren’t flawless, though. Security researchers have discovered “collisions” in MD5 and SHA-1: pairs of different files that produce the same MD5 or SHA-1 hash. This is highly unlikely to happen by accident, but an attacker may use the technique to disguise a malicious file as a valid one.
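The fixed-length and small-change properties described above can be observed with a short sketch. The class name ChecksumSensitivity, the helper md5Hex, and the sample inputs are assumptions made for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the name, helper, and inputs are illustrative only.
class ChecksumSensitivity {

    // Returns the MD5 digest of the text as a lowercase hexadecimal string.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Build a much longer input to compare digest lengths.
        StringBuilder longInput = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            longInput.append("hello");
        }

        String original = md5Hex("hello");
        String altered = md5Hex("hellp"); // differs from "hello" in one character
        String large = md5Hex(longInput.toString());

        System.out.println("hello        -> " + original);
        System.out.println("hellp        -> " + altered);
        System.out.println("50,000 chars -> " + large);
        System.out.println("Same checksum after a one-character change? " + original.equals(altered));
    }
}
```

All three digests are 32 hexadecimal characters long, and the one-character change yields a different value.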
Generating Checksum in Java
Java provides built-in support for computing these hashes through the MessageDigest class in the java.security package. Message digests are secure one-way hash functions that take data of arbitrary size and produce a hash value of fixed length.
- We start by instantiating a MessageDigest object, passing the name of any valid hashing algorithm (for example, "MD5") to MessageDigest.getInstance.
- Then we update this object until we have read the complete file. We could instead call digest(byte[] input), which performs a final update with the whole file read at once, but if the file is very large we might not have enough memory to hold it as a byte array, which could result in java.lang.OutOfMemoryError: Java heap space.
- So it’s better to read the data in parts and update the MessageDigest with each part.
Once the updates are complete, one of the digest methods is called to finish the hash computation. Whenever a digest method is called, the MessageDigest object is reset to its initialized state. The digest method returns the hash as an array of raw bytes, so we convert each byte to its two-digit hexadecimal representation. The resulting string is the checksum.
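The steps above can be sketched as follows. This is one possible arrangement, not a definitive implementation: the class name FileChecksum, the method name checksum, the 8 KB buffer size, and the command-line handling are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names and buffer size are illustrative choices.
class FileChecksum {

    // Computes the checksum of a file with the given algorithm (e.g. "MD5"),
    // reading the file in chunks so it never has to fit in memory at once.
    static String checksum(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);

        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192]; // assumed chunk size
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                // Feed only the bytes actually read in this pass.
                md.update(buffer, 0, bytesRead);
            }
        }

        // Completes the computation and resets the MessageDigest.
        byte[] hash = md.digest();

        // Convert each raw byte to two hexadecimal digits.
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java FileChecksum <path-to-file>");
            return;
        }
        System.out.println(checksum(Paths.get(args[0]), "MD5"));
    }
}
```

Passing "SHA-256" instead of "MD5" as the second argument would produce a SHA-256 checksum with the same code, which may be preferable where collision resistance matters.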