Python 101: How to Write a Cleanup Script

The other day, someone asked me if I could write a script that could clean up a directory by deleting all files that are X or more days old. I ended up using Python’s core modules for this task. We will spend some time looking at one way to do this useful exercise.

FAIR WARNING: The code in this article is designed to delete files. Use at your own risk!

Here’s the code I came up with:

import os
import sys
import time
 
#----------------------------------------------------------------------
def remove(path):
    """
    Remove the file or directory
    """
    if os.path.isdir(path):
        try:
            os.rmdir(path)
        except OSError:
            print("Unable to remove folder: %s" % path)
    else:
        try:
            if os.path.exists(path):
                os.remove(path)
        except OSError:
            print("Unable to remove file: %s" % path)
 
#----------------------------------------------------------------------
def cleanup(number_of_days, path):
    """
    Removes files from the passed in path that are older than or equal 
    to the number_of_days
    """
    time_in_secs = time.time() - (number_of_days * 24 * 60 * 60)
    for root, dirs, files in os.walk(path, topdown=False):
        for file_ in files:
            full_path = os.path.join(root, file_)
            stat = os.stat(full_path)
 
            if stat.st_mtime <= time_in_secs:
                remove(full_path)
 
        if not os.listdir(root):
            remove(root)
 
#----------------------------------------------------------------------
if __name__ == "__main__":
    days, path = int(sys.argv[1]), sys.argv[2]
    cleanup(days, path)

Let’s spend a few minutes looking at how this code works. In the cleanup function, we take the number_of_days parameter and convert it into seconds. Then we subtract that amount from the current time to get a cutoff, time_in_secs. Next we use the os module’s walk function to walk through the directories. We set topdown to False to tell walk to traverse the directories from the innermost to the outermost. Then we loop over the files in each folder and check each file’s last modification time (st_mtime). If that time is less than or equal to time_in_secs (i.e. X or more days ago), then we try to remove the file. When that loop finishes, we check whether root (the folder currently being visited) is now empty. If it is, then we delete the folder.
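
Since the script deletes things, it can be handy to see what it would remove before running it for real. Here is a minimal sketch of the same cutoff logic that only collects matching paths. The find_old_files name and its optional now parameter are my own additions for illustration, and the code is written for Python 3.

```python
import os
import time


def find_old_files(number_of_days, path, now=None):
    """Return the files under path last modified number_of_days or more ago."""
    if now is None:
        now = time.time()
    # Same arithmetic as cleanup(): days -> seconds, subtracted from "now".
    cutoff = now - number_of_days * 24 * 60 * 60
    old_files = []
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            full_path = os.path.join(root, name)
            if os.stat(full_path).st_mtime <= cutoff:
                old_files.append(full_path)
    return old_files
```

Printing the result of find_old_files(30, "/some/folder") (the path is just a placeholder) gives you a list to review before anything is deleted.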

The remove function is extremely straightforward. All it does is check if the path that is passed is a directory or not. Then it attempts to delete the path using the appropriate function (i.e. os.rmdir or os.remove).

Other Ways to Delete Folders/Files

There are a couple of other ways to remove folders and files which should be mentioned. If you know you have a set of nested directories that are all empty, you could use os.removedirs() to remove them all in one fell swoop. Another more extreme way of doing this would be to use Python’s shutil module. It has a function called rmtree that can remove a folder along with all the files and folders inside it!
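
To make the difference between the two concrete, here is a small sketch that runs both against a throwaway temporary directory. The folder and file names are made up for the example, and the code is written for Python 3.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing important is touched.
base = tempfile.mkdtemp()
# A placeholder file keeps base non-empty, so removedirs stops there.
keep = os.path.join(base, "keep.txt")
open(keep, "w").close()

# os.removedirs deletes the leaf folder, then each parent that is now empty.
os.makedirs(os.path.join(base, "a", "b", "c"))
os.removedirs(os.path.join(base, "a", "b", "c"))
a_is_gone = not os.path.exists(os.path.join(base, "a"))
base_survived = os.path.exists(base)

# shutil.rmtree deletes a folder together with everything inside it.
tree = os.path.join(base, "tree")
os.makedirs(os.path.join(tree, "sub"))
with open(os.path.join(tree, "sub", "data.txt"), "w") as handle:
    handle.write("some data")
shutil.rmtree(tree)
tree_is_gone = not os.path.exists(tree)

# Clean up the throwaway directory itself.
shutil.rmtree(base)
```

Note that os.removedirs keeps walking upward until it hits a folder it cannot remove, which is why the placeholder file is there. Without it, the temporary folder itself would have been removed too.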

I’ve used both methods to great effect in other scripts. I have also found that sometimes I cannot delete a particular file on Windows unless I do it through Windows Explorer. To get around this, I have used Python’s subprocess module to call Windows’ del command with its /F flag to force the delete. You can probably do something similar on Linux with its rm -f command. Occasionally you will run into files that you can’t delete because they are locked or protected, or because you just don’t have the correct permissions.
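
As pointed out in the comments, what del /F does is clear the read-only attribute before deleting the file, and that much can be done without leaving Python. Here is a rough sketch of that idea. The force_remove name is hypothetical, the code is written for Python 3, and it will not help with files that are locked by another program or protected by an ACL.

```python
import os
import stat


def force_remove(path):
    """Clear the read-only flag on a file, then delete it."""
    os.chmod(path, stat.S_IWRITE)  # make the file writable by its owner
    os.remove(path)
```

On Linux, whether you may delete a file depends mostly on the permissions of the folder that contains it, so this trick is mainly useful on Windows.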

Features to Add

If you’ve spent any time thinking about the script above, you’ve probably already thought of some improvements or features to add. Here are some that I thought would be nice:

  • Add logging so you know what got deleted or what didn’t (or both)
  • Add some of those other deletion methods mentioned in the previous section
  • Make the cleanup script able to accept a range of dates or a list of dates to delete
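
The logging idea from the list above might look something like the following sketch. The remove_with_logging name and the "cleanup" logger name are placeholders I picked for the example, and the code is written for Python 3.

```python
import logging
import os

logger = logging.getLogger("cleanup")


def remove_with_logging(path):
    """Remove a single file and log the outcome. Returns True on success."""
    try:
        os.remove(path)
    except OSError:
        # logger.exception records the message plus the full traceback.
        logger.exception("Unable to remove file: %s", path)
        return False
    logger.info("Removed file: %s", path)
    return True
```

To actually get a log file, you would call something like logging.basicConfig(filename="cleanup.log", level=logging.INFO) once when the script starts (the file name is up to you).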

I’m sure you have thought of other fun ideas or solutions. Feel free to share them in the comments below.

  • eryksun

    Consider using os.lstat instead. It doesn’t follow symbolic links. That way you’re checking the timestamp on the actual link that would be deleted.

    In 2.x the os module raises OSError, not IOError. Check the exception’s errno attribute against the constants in the errno module. For example, e.errno == errno.ENOTEMPTY. This was cleaned up a lot in 3.3, per PEP 3151. The new design aliases IOError to OSError and has subclasses for common errors such as FileExistsError (EEXIST), FileNotFoundError (ENOENT), IsADirectoryError (EISDIR), NotADirectoryError (ENOTDIR), and PermissionError (EACCES, EPERM).

    The /F option of del, in Windows cmd, removes the readonly attribute before deleting the file. In Python you can use os.chmod(filepath, stat.S_IWRITE); os.remove(filepath) — assuming the file’s ACL allows this. It’s possible to modify the file security attributes via PyWin32’s win32security and ntsecuritycon, but it may be simpler to call subinacl or takeown/icacls via the subprocess module.

  • I’ve gotten IOErrors on CentOS (Linux) before. Yes, I know about changing the security attributes with PyWin32, but I think that’s a little overkill for something this simple. I will probably have to write a follow-up article illustrating some of the ideas I mentioned along with some of the ones in these comments.

  • It might, but I want to empty the directories from the deepest levels out so I can also remove the directory if it’s empty. I don’t think glob returns the list of paths in that order. As for the exception question, I was thinking I would pick that up with the logging module. It has logging.exception(msg) which will write out the entire traceback to the log.

  • Okay. I changed the example. Thanks for the feedback!

  • I think you need to convert the sys.argv[1] to an int. Otherwise your code won’t work correctly. E.g. if 3 is passed for sys.argv[1], Python gets it as a string, so doing sys.argv[1] * 2 will give, not the int 6, but the string “33”.

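
Two of the suggestions from the comments above, checking timestamps with os.lstat and catching the more specific exception classes, can be combined in a small sketch. It assumes Python 3.3 or newer, where FileNotFoundError and PermissionError exist, and the remove_if_old name and its return values are made up for illustration.

```python
import os


def remove_if_old(path, cutoff):
    """Delete the file at path if it was last modified at or before cutoff.

    Returns "removed", "kept", "missing" or "denied".
    Any other OSError is left to propagate to the caller.
    """
    try:
        # lstat looks at a symbolic link itself rather than at its target.
        if os.lstat(path).st_mtime > cutoff:
            return "kept"
        os.remove(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "denied"
    return "removed"
```

Here cutoff plays the same role as time_in_secs in the cleanup function from the article.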