
Determining file format using Python

The most common way to recognize a file's type is to look at its extension, but an extension is not always reliable. Associating an extension with a file type is a convention relied on by some operating system families (predominantly Windows). On Linux and other Unix-like systems, tools such as the file command instead identify a file by its magic number. A magic number is a constant sequence of bytes, usually at the start of a file, that identifies its format. This method allows more flexibility in naming a file and does not require an extension. Magic numbers are therefore useful when a file has the wrong extension or none at all.
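To make the idea of a magic number concrete, here is a minimal sketch in plain Python that compares the first bytes of some data against a small table of well-known signatures. The table is deliberately incomplete, and the names SIGNATURES, sniff_bytes and sniff_file are hypothetical helpers introduced for illustration only; this is not how the library used later in the article is implemented.

```python
# Illustrative table of a few well-known file signatures (not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_bytes(data):
    """Return a MIME type if data starts with a known signature, else None."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def sniff_file(path):
    """Read the first few bytes of a file and check them against the table."""
    # The longest signature above is 8 bytes, so a small read is enough.
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))
```

Text formats such as HTML have no fixed leading bytes, so a simple prefix table like this one returns None for them. Recognizing such files takes richer rules than a prefix match, which is one reason to rely on a dedicated library.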

In this article we will learn how to recognize files by their contents rather than their extension, using Python. We will use the python-magic library, a wrapper around the libmagic C library (which must also be available on the system), to add this capability to our program. To install the library, execute the following command in your operating system's command interpreter:



pip install python-magic

For demonstration purposes, we will use a file named apple.jpg that contains HTML markup rather than image data. Because it is saved with a .jpg extension, anything that relies on the extension alone will not recognize its actual file type, which makes it a fitting test case for our program.




import magic

# print the human-readable type of the file
print(magic.from_file('apple.jpg'))

# print the MIME type of the file
print(magic.from_file('apple.jpg', mime=True))

Output:

HTML document, ASCII text, with CRLF line terminators
text/html

Explanation:

First, we import the magic library. Then we call magic.from_file() with the path of the file to obtain a human-readable description of its type. Finally, we call it again with the keyword argument mime=True to obtain the MIME type of the file.
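One practical use of the detected MIME type is to flag files whose extension disagrees with their contents, such as apple.jpg here. The following is a minimal sketch that compares the type implied by the extension (via the standard library's mimetypes module) with a detected type. The function names extension_matches and check_file are hypothetical, and check_file assumes python-magic and libmagic are installed.

```python
import mimetypes


def extension_matches(path, detected_mime):
    """Return True if the extension of path implies the detected MIME type."""
    # guess_type() looks only at the file name, never at the contents.
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed == detected_mime


def check_file(path):
    """Detect the MIME type with python-magic and compare it to the extension."""
    # Imported here so that extension_matches() works without python-magic.
    import magic

    detected = magic.from_file(path, mime=True)
    return detected, extension_matches(path, detected)
```

For the demonstration file, check_file('apple.jpg') would be expected to report text/html together with False, since the .jpg extension implies image/jpeg.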

Things to consider while using the above code:
