This article gives an introduction to magic numbers and file headers, and shows how to extract a file based on magic numbers and how to corrupt and repair a file based on magic numbers in a Linux environment.
Magic numbers are the first few bytes of a file, and they are unique to a particular file type. These bytes are also referred to as a file signature.
These bytes can be used by the system to differentiate between and recognize different file types, even without a file extension.
Locating Magic Numbers in File Signatures
Most files have their signature in the bytes at the beginning of the file, but some formats keep the signature at an offset other than the beginning. For example, the ext2/ext3 file systems have the bytes 0x53 and 0xEF at byte offsets 1080 and 1081 (counting from 0).
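The offset check described above can be sketched in Python. This is a minimal illustration rather than a full file system probe: the helper name `has_ext_signature` and the synthetic `fake_image` buffer are assumptions made for the example, and a real check would read the bytes from a disk image.

```python
# Offset 1080 = 1024 bytes before the superblock + 56 bytes into it.
EXT_MAGIC_OFFSET = 1080
EXT_MAGIC_BYTES = b"\x53\xef"   # the value 0xEF53 stored little-endian


def has_ext_signature(image: bytes) -> bool:
    """Return True if the ext2/ext3 signature sits at byte offset 1080."""
    return image[EXT_MAGIC_OFFSET:EXT_MAGIC_OFFSET + 2] == EXT_MAGIC_BYTES


# Hypothetical stand-in for a file system image: zero bytes with the
# signature patched in at the expected offset.
fake_image = bytearray(2048)
fake_image[EXT_MAGIC_OFFSET:EXT_MAGIC_OFFSET + 2] = EXT_MAGIC_BYTES

print(has_ext_signature(bytes(fake_image)))   # True
print(has_ext_signature(bytes(2048)))         # False
```

Slicing keeps the check safe for inputs shorter than 1082 bytes, since an out-of-range slice simply returns fewer bytes and the comparison fails.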
- Some files however, do not have magic numbers, such as plain text files, but can be identified by checking the character set (ASCII in the case of text files).
This can be done by using the command:
file -i *name_of_file*
- Magic numbers/File signatures are typically not visible to the user, but can be seen by using a hex editor or by using the ‘xxd’ command as mentioned below. These bytes are essential for a file to be opened.
- Changing or corrupting these bytes can make the file unusable, because most tools refuse to open a file whose signature they do not recognize.
- The file command in Linux reads the magic numbers of a file and displays the file type based on them.
- Let us take the example of a PNG file. We can view the hex of a file by typing the following command in a Linux terminal (Kali Linux is used in this article). This command creates a hexdump of the file we pass to it.
xxd image.png | head
This produces the following output:
In this image, we see that the first set of bytes of the file are
89 50 4e 47 0d 0a 1a 0a // magic number of PNG file
These numbers help the system identify the type of file being used. Files that are saved without an extension can still be identified with the help of these magic numbers.
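The same bytes can also be inspected from Python. The sketch below is only an illustration: `first_bytes_as_hex` is a hypothetical helper, `sample` is made-up stand-in data, and `bytes.hex` with a separator argument assumes Python 3.8 or newer.

```python
def first_bytes_as_hex(data: bytes, count: int = 8) -> str:
    """Format the first `count` bytes as space-separated hex pairs."""
    return data[:count].hex(" ")


# Stand-in data: the PNG signature followed by arbitrary bytes.
sample = bytes.fromhex("89504e470d0a1a0a") + b"rest of file"
print(first_bytes_as_hex(sample))   # 89 50 4e 47 0d 0a 1a 0a
```

For a real file, the input could come from something like `open("image.png", "rb").read(8)`.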
Similarly, we can use the above-mentioned command on a zip file.
xxd test.zip | head
In the above image, we can see that the file starts with:
50 4b 03 04 // magic number of zip file
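To illustrate how a tool can map leading bytes to a file type, here is a small sketch. It is not how the file command is implemented. The `SIGNATURES` table and the `identify` function are assumptions for this example and cover only the three formats discussed in this article.

```python
# Leading bytes for the formats discussed in this article.
SIGNATURES = {
    bytes.fromhex("89504e470d0a1a0a"): "PNG image",
    bytes.fromhex("504b0304"): "ZIP archive",
    bytes.fromhex("ffd8ff"): "JPEG image",
}


def identify(data: bytes) -> str:
    """Return a file type name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(bytes.fromhex("89504e470d0a1a0a") + b"..."))   # PNG image
print(identify(b"PK\x03\x04" + b"..."))                       # ZIP archive
print(identify(b"plain text"))                                # unknown
```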
Appending One File to another and identifying the division with Magic numbers
We can use Python to perform this operation. Essentially, we will read the bytes of two files and write them one after the other to a new file. In this article, we will combine a PNG with a Zip file.
# Read both input files as raw bytes.
with open("image.png", "rb") as f:
    input_file_1 = f.read()
with open("test.zip", "rb") as f:
    input_file_2 = f.read()

# Write the PNG bytes followed by the ZIP bytes to the output file.
with open("output.png", "wb") as output_file:
    output_file.write(input_file_1)
    output_file.write(input_file_2)
Using this python code, we obtain a file output.png. On running the command:
xxd output.png | head
on this file, we notice that it begins with the same 8950 4e47 0d0a 1a0a hex. However, if we run the command
xxd output.png | grep "PK"
which searches the hexdump for the magic number of a zip file (PK is the ASCII equivalent of 50 4b),
we will get the following output:
In this picture, we can see that the zip file magic numbers are present in the hex of the png, meaning that we have successfully appended the hex of the zip file to that of the png. The next step is to separate this zip file from the png.
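The search can also be done on the raw bytes instead of on the hexdump text, which gives the exact byte offset of each match. The sketch below is an illustration only: `find_all` is a hypothetical helper and `combined` is made-up stand-in data rather than the real output.png.

```python
ZIP_MAGIC = b"PK\x03\x04"   # 50 4b 03 04


def find_all(data: bytes, signature: bytes) -> list:
    """Return every byte offset at which `signature` occurs in `data`."""
    offsets = []
    start = data.find(signature)
    while start != -1:
        offsets.append(start)
        start = data.find(signature, start + 1)
    return offsets


# Stand-in for output.png: a PNG signature, some filler, then ZIP data.
combined = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8 + ZIP_MAGIC + b"archive data"
for offset in find_all(combined, ZIP_MAGIC):
    print(f"{offset} (0x{offset:08x})")   # 16 (0x00000010)
```

An archive with several entries will usually produce several matches, and the same four bytes may occasionally appear by chance inside other data, so each match is only a candidate starting point.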
There is a simple utility, ‘binwalk’ that helps us perform this task easily by typing:
binwalk -e output.png
Here, -e stands for extract.
How to use the magic numbers and offsets to extract the zip file from the output :
- Find the beginning offset of the file you wish to extract: In this example, we wish to extract the zip file from the PNG. So we first look for the ZIP header. As shown in the previous picture, we carry out the command ‘xxd output.png | grep “PK” ‘. In this picture, we see the offset on the left column. For the zip file, the offset will be 00001c90.
- Calculate the number of bytes from the start of the file to the point where your file header starts: The offset in the left column only marks the start of that hexdump line, so we must count forward to the position at which the zip file starts. We can manually count this and observe it to be 00001c95 (each pair of hex digits corresponds to 1 byte).
- Convert this hex value to decimal: This may be done by opening an interactive Python shell (type ‘python’ in a Linux terminal). In the shell, we simply add 0x to the beginning of the value found in the previous step and press Enter; for example, 0x1c95 evaluates to 7317.
- Use the following command:
'dd if=*input file* bs=1 skip=*value calculated in step 3* of=*output file name*'
In the above-mentioned command, ‘if’ stands for input file, ‘bs’ is the block size, that is, the number of bytes read at a time, ‘skip’ is the number of blocks to skip from the start of the input (with bs=1, this is the number of bytes that come before the file we wish to extract), and ‘of’ refers to the output file name.
Refer to the picture below to see the usage of steps 3 and 4:
- File Extraction: We can now open up the present working directory from the terminal to view our extracted zip file by typing:
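As a cross-check of the manual steps above, the same extraction can be sketched in Python. This is a hedged illustration, not a replacement for dd or binwalk: `carve_from` is a hypothetical helper, and the sample data is built in memory (a made-up `hello.txt` entry and filler bytes) so that the sketch runs without image.png or test.zip.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"


def carve_from(data: bytes, offset: int) -> bytes:
    """Mimic `dd bs=1 skip=offset`: drop the first `offset` bytes."""
    return data[offset:]


# Step 3 of the list above: hex offset to decimal.
print(int("00001c95", 16))   # 7317

# Build a small in-memory ZIP so no files on disk are needed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
zip_bytes = buffer.getvalue()

# Stand-in for output.png: PNG signature, 100 filler bytes, then the ZIP.
combined = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100 + zip_bytes

offset = combined.find(ZIP_MAGIC)      # first candidate ZIP header
carved = carve_from(combined, offset)

# If the carved bytes form a valid archive, zipfile can list its entries.
with zipfile.ZipFile(io.BytesIO(carved)) as archive:
    print(offset, archive.namelist())   # 108 ['hello.txt']
```

With real files, `combined` would come from reading output.png in binary mode, and `carved` would be written to a new file opened with mode "wb".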
How to corrupt a file by changing its Magic Number?
Changing the magic numbers of a file renders the file useless. We will be shown an error whenever we try to open a file that has a distorted header.
- Download a hex editor: To corrupt a file, we require a hex editor. hexedit is a popular tool for this purpose. You can install it using:
sudo apt-get install hexedit
You can open the file by typing
You will see an output like this:
- Change the file: To change a byte using hexedit, you simply have to move the cursor over a byte, and type what you would like to. For the sake of this article, I will change the magic numbers from 89 50 to 00 00.
To save and exit, press ctrl X and then Y.
In the above picture, we see that the first 2 bytes have been changed to 00 00, and on the right, we can see that the text has changed from .PNG to ..NG
How to repair a file that has a corrupted magic number?
Let us use the example shown in the picture above, where I have corrupted the first two bytes of the PNG. If you try to open the PNG now, the viewer will give an error such as “Could not load file” and report that the file is not a PNG. This shows that the program checks the magic number before opening a file. Knowing that the PNG magic number starts with 89 50, we can change the bytes back to their original values.
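The corruption and the repair both amount to overwriting the leading bytes, which can be sketched in Python without a hex editor. The helper `overwrite_header` and the stand-in data are assumptions for this illustration, and `bytes.hex` with a separator assumes Python 3.8 or newer.

```python
PNG_MAGIC = bytes.fromhex("89504e470d0a1a0a")


def overwrite_header(data: bytes, new_header: bytes) -> bytes:
    """Return a copy of `data` with its leading bytes replaced."""
    return new_header + data[len(new_header):]


# Stand-in for a PNG file: the signature followed by placeholder bytes.
original = PNG_MAGIC + b"image data"

corrupted = overwrite_header(original, b"\x00\x00")      # 89 50 -> 00 00
repaired = overwrite_header(corrupted, PNG_MAGIC[:2])    # 00 00 -> 89 50

print(corrupted[:8].hex(" "))   # 00 00 4e 47 0d 0a 1a 0a
print(repaired == original)     # True
```

Because the function returns a new bytes object, the original data stays untouched, which makes it easy to keep a backup while experimenting.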
Let us look at another example, using a jpeg image.
Let's first see what the hex of a working JPEG looks like:
The original magic number bytes are
FF D8 FF E0
A JPEG with corrupted magic bytes would look like this:
We notice that the magic bytes in this are
EE A8 CC 00
Hence, the JPG file will not open. We get this error:
A JPG file typically has the magic number “FFD8 FFE0”, “FFD8 FFDB” or “FFD8 FFE1”.
With this knowledge, all we would have to do is try these combinations as headers for the file. Doing this requires the same process as file corruption.
- Open hexedit
- Change the first few bytes by hovering with cursor and entering the required values
- Save (Ctrl X) and exit
- Try opening the file. Repeat steps with next possible magic number if the file does not open
On changing the magic bytes to FFD8 FFE0, the picture opens properly.
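The trial-and-error procedure above can be sketched in Python as a generator of candidate repairs. This is only an illustration under stated assumptions: `candidate_repairs` is a hypothetical helper, the corrupted data is a made-up stand-in, and the sketch does not decode the image, so each patched result still has to be opened in a viewer to see whether it works.

```python
# Candidate headers listed in this article.
JPEG_HEADERS = [
    bytes.fromhex("ffd8ffe0"),
    bytes.fromhex("ffd8ffdb"),
    bytes.fromhex("ffd8ffe1"),
]


def candidate_repairs(data: bytes):
    """Yield (header, patched_bytes) for each candidate JPEG header."""
    for header in JPEG_HEADERS:
        yield header, header + data[len(header):]


# Stand-in for the corrupted file: wrong magic bytes plus placeholder data.
corrupted = bytes.fromhex("eea8cc00") + b"jpeg data"

for header, patched in candidate_repairs(corrupted):
    # Each patched copy could be written to its own file and opened.
    print(header.hex(), patched[:4] == header)
```

Writing each candidate to a separate file, rather than editing the original in place, keeps the corrupted input available for the next attempt.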
This article is contributed by Deepak Srivatsav.