The usual way of recognizing a file's type is by looking at its extension, but this approach is not universal. Associating an extension with a file type is a convention enforced by some operating system families (predominantly Windows). Other operating systems, such as Linux (and its variants), use a magic number to recognize file types. A magic number is a constant value, usually found in the file's header, that identifies the file's format. This method gives more flexibility in naming a file and does not require an extension. Magic numbers are also more reliable for recognizing files, since a file may have an incorrect extension or none at all.
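To make the idea concrete, here is a minimal sketch of a magic number lookup using only the standard library. The signature table and the names SIGNATURES, sniff() and sniff_file() are illustrative assumptions, not part of any library, and the table covers only a handful of well-known formats.

```python
# Hypothetical, deliberately tiny table of well-known leading byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def sniff(header: bytes) -> str:
    """Return a label for the first signature that the header starts with."""
    for magic_bytes, label in SIGNATURES:
        if header.startswith(magic_bytes):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of the file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

The file name plays no role here: sniff_file() gives the same answer whatever extension the file carries. Real tools use far larger signature databases and also recognize text formats such as HTML, which this sketch reports as unknown.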
In this article we will learn how to recognize files by their magic number rather than by their extension, using Python. We will use the python-magic library to provide this capability to our program. To install the library, execute the following command in your operating system's command interpreter:
pip install python-magic
For demonstration purposes, we will use a file named apple.jpg whose contents are HTML markup. Its contents make it apparent that it is an HTML file, but since it is saved with a .jpg extension, an operating system that relies on extensions won't be able to recognize its actual file type. This makes the file a good test case for our program.
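A minimal sketch of such a program follows, assuming python-magic and its underlying libmagic library are installed and that apple.jpg sits in the current working directory. The helper name identify() is an assumption made for illustration; the two magic.from_file() calls are the part that matters.

```python
import os

try:
    import magic  # provided by the python-magic package
except ImportError:  # raised when the package or libmagic is missing
    magic = None


def identify(path):
    """Return (description, mime_type) for the file at the given path."""
    if magic is None:
        raise RuntimeError("python-magic (and libmagic) must be installed")
    description = magic.from_file(path)           # human-readable file type
    mime_type = magic.from_file(path, mime=True)  # MIME type, e.g. text/html
    return description, mime_type


# Only runs when the library and the demo file are both available.
if magic is not None and os.path.exists("apple.jpg"):
    for line in identify("apple.jpg"):
        print(line)
```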
Output for apple.jpg:
HTML document, ASCII text, with CRLF line terminators
text/html
First we import the magic library. Then we call the magic.from_file() function to obtain the human-readable file type. Finally, we call it again with the mime=True argument to obtain the MIME type of the file.
Things to consider while using the above code:
- The code works on Linux and macOS. However, those operating systems ship with a built-in terminal command named file, which provides the same functionality as this program without installing any additional library.
- File type recognition using extensions is also available in newer versions of the library.
- Since file type recognition generally works by looking up a fingerprint in the header of the file, it is not necessary to load the whole file. A small section of the file can be passed to magic.from_buffer() instead, for example the initial bytes read with open('file.ext', 'rb').read(n) (only recommended if you are aware of the header format of the file type).
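The last point can be sketched as follows, again assuming python-magic and libmagic are installed. The helper name mime_from_header() is a hypothetical choice for illustration. The default of 2048 bytes follows the python-magic documentation, which recommends passing at least that much to from_buffer(), since shorter buffers can be misidentified.

```python
try:
    import magic  # provided by the python-magic package
except ImportError:  # raised when the package or libmagic is missing
    magic = None


def mime_from_header(path, n=2048):
    """Detect the MIME type from only the first n bytes of the file."""
    if magic is None:
        raise RuntimeError("python-magic (and libmagic) must be installed")
    with open(path, "rb") as handle:
        header = handle.read(n)  # read the header, not the whole file
    return magic.from_buffer(header, mime=True)
```

For example, mime_from_header('apple.jpg') would inspect at most the first 2048 bytes of the demonstration file.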