Python | Create Archives and Find Files by Name

In this article, we will learn how to create or unpack archives in common formats (e.g., .tar, .tgz, or .zip) using shutil module.

The shutil module has two functions — make_archive() and unpack_archive() — that can exactly be the solution.

Code #1 :



filter_none

edit
close

play_arrow

link
brightness_4
code

import shutil
shutil.unpack_archive('Python-3.3.0.tgz')
shutil.make_archive('py33', 'zip', 'Python-3.3.0')

chevron_right


Output :

'/Users/Dell/Downloads/py33.zip'

The second argument to make_archive() is the desired output format. To get a list of supported archive formats, use get_archive_formats().
 
Code #2 :

filter_none

edit
close

play_arrow

link
brightness_4
code

shutil.get_archive_formats()

chevron_right


Output :

[('bztar', "bzip2'ed tar-file"), 
 ('gztar', "gzip'ed tar-file"), 
 ('tar', 'uncompressed tar file'), 
 ('zip', 'ZIP file')]

Python has other library modules for dealing with the low-level details of various archive formats (e.g., tarfile, zipfile, gzip, bz2, etc.). However, to make or extract an archive, there’s really no need to go so low level.
One can just use these high-level functions in shutil instead. The functions have a variety of additional options for logging, dry-runs, file permissions, and so forth.
 
Let’s write a script that involves finding files, like a file renaming script or a log archiver utility, but rather not have to call shell utilities from within the Python script, or to provide specialized behavior not easily available by “shelling out.”

To search for files, use the os.walk() function, supplying it with the top-level directory.

Code #3 : Function that finds a specific filename and prints out the full path of all matches.

filter_none

edit
close

play_arrow

link
brightness_4
code

import os
  
def findfile(start, name):
    for relpath, dirs, files in os.walk(start):
  
        if name in files:
            full_path = os.path.join(start, relpath, name)
            print(os.path.normpath(os.path.abspath(full_path)))
  
if __name__ == '__main__':
    findfile(sys.argv[1], sys.argv[2])

chevron_right


Save this script as abc.py and run it from the command line, feeding in the starting point and the name as positional arguments as –

filter_none

edit
close

play_arrow

link
brightness_4
code

bash % ./abc.py .myfile.txt

chevron_right


 
How it works ?

  • The os.walk() method traverses the directory hierarchy for us, and for each directory it enters, it returns a 3-tuple, containing the relative path to the directory it’s inspecting, a list containing all of the directory names in that directory, and a list of filenames in that directory.
  • For each tuple, simply check if the target filename is in the files list. If it is, os.path.join() is used to put together a path.
  • To avoid the possibility of weird looking paths like ././foo//bar, two additional functions are used to fix the result.
  • The first is os.path.abspath(), which takes a path that might be relative and forms the absolute path.
  • The second is os.path.normpath(), which will normalize the path, thereby resolving issues with double slashes, multiple references to the current directory, and so on.

Although the code is pretty simple compared to the features of the find utility found on UNIX platforms, it has the benefit of being cross-platform. Furthermore, a lot of additional functionality can be added in a portable manner without much more work.

Code #4 : Function that prints out all of the files that have a recent modification time

filter_none

edit
close

play_arrow

link
brightness_4
code

import os
import time
  
def modified_within(top, seconds):
    now = time.time()
  
    for path, dirs, files in os.walk(top):
        for name in files:
            fullpath = os.path.join(path, name)
  
            if os.path.exists(fullpath):
                mtime = os.path.getmtime(fullpath)
                if mtime > (now - seconds):
                    print(fullpath)
                      
if __name__ == '__main__':
    import sys
  
    if len(sys.argv) != 3:
        print('Usage: {} dir seconds'.format(sys.argv[0]))
        raise SystemExit(1)
          
    modified_within(sys.argv[1], float(sys.argv[2]))

chevron_right


It wouldn’t take long for you to build far more complex operations on top of this little function using various features of the os, os.path, glob, and similar modules.



My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.




Article Tags :

Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.