Python: Distributing Data Support Files in a zip

The other day, I heard that when we transfer some of our scripts to our batch farm, we cannot transfer sub-folders. So if I have a directory structure like the following, it needs to be flattened:


Top
--> data
--> font
--> images

Instead of having sub-folders for my fonts, images, etc., I would need to put all my support files into the main directory. I thought this was pretty lame, as I like to keep my directories organized. If you follow the Model-View-Controller (MVC) pattern with your code, then you’ll find this pretty annoying too. So I thought about this issue for a bit and realized that Python can access files in a zip archive either by using the zipfile library or via some magic from zipimport. I could have used zipfile, but I also wanted to be able to import my Python files, whether they lived in a sub-directory or at the top level of the zip archive, so I decided to go the magical route.
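For comparison, here is a minimal sketch of the zipfile route I decided against. The archive name, the member path and the read_from_zip helper are made up for illustration, and the sketch is written for Python 3. It builds a throwaway archive first so that it can run on its own:

```python
import os
import tempfile
import zipfile


def read_from_zip(zip_path, member):
    """Return the raw bytes of one member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member)


# Build a throwaway archive so the sketch is self-contained.
# "demo.zip" and "data/settings.txt" are hypothetical names.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("data/settings.txt", "color = blue\n")

# Read the support file straight out of the archive, sub-folder and all.
settings_bytes = read_from_zip(demo_zip, "data/settings.txt")
print(settings_bytes.decode("utf-8"))
```

This works fine for data files, but it does nothing for importing Python modules out of the archive, which is why the rest of this post leans on the import machinery instead.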

Now, you don’t normally need to use zipimport directly. It’s part of Python’s import mechanism and is enabled by default. So we won’t be calling it in our code; however, I think you should know that it’s doing the work behind the scenes. For additional background, I create lots of custom PDFs using various logos and fonts, so I usually keep those files in separate directories to keep things organized. I also use a lot of configuration files, as some clients want things done one way and some another. So I’m going to show you a really simple snippet of code that you can use to import Python files from a zip archive and extract a config file from it. For the latter, we’ll be using Python’s handy pkgutil library.

Here’s the code:

import os
import pkgutil
import StringIO
import sys
from configobj import ConfigObj

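# Locate test.zip next to this script and put it on the import path
base = os.path.dirname(os.path.abspath(__file__))
zpath = os.path.join(base, "test.zip")
sys.path.append(zpath)

# hello.py lives inside test.zip; the import works because of the line above
import hello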

#----------------------------------------------------------------------
def getCfgFromZip():
    """
    Extract the config file from the zip file
    """
    cfg_data = pkgutil.get_data("config", "config.ini")
    print cfg_data
    fileLikeObj = StringIO.StringIO(cfg_data)
    cfg = ConfigObj(fileLikeObj)
    cfg_dict = cfg.dict()
    print cfg_dict
    
if __name__ == "__main__":
    getCfgFromZip()

Now we also have a zip archive called test.zip alongside this script which contains the following:

  • hello.py
  • A folder named config which contains two files: config.ini and __init__.py
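If you want to try this yourself, an archive with that layout could be put together along these lines. This is only a sketch for Python 3: the build_test_zip helper, the message in hello.py and the contents of config.ini are placeholders I made up, not the real files.

```python
import os
import tempfile
import zipfile


def build_test_zip(target_dir):
    """Create test.zip in target_dir with the layout described above."""
    zip_path = os.path.join(target_dir, "test.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        # A top-level module that just prints a message (placeholder text)
        archive.writestr("hello.py", 'print("Hello from inside the zip")\n')
        # __init__.py makes the config folder an importable package
        archive.writestr("config/__init__.py", "")
        # Placeholder settings; the real file would hold client options
        archive.writestr("config/config.ini", "[General]\nclient = Acme\n")
    return zip_path


zip_path = build_test_zip(tempfile.mkdtemp())
with zipfile.ZipFile(zip_path) as archive:
    names = sorted(archive.namelist())
print(names)
```

The empty __init__.py matters: pkgutil looks the folder up as a package, so without it the config file could not be found by package name.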

As you can see, Python knows it can import the hello module because we added the archive to the import path via sys.path.append. All the hello module does is print a message to stdout. To extract the config file, we use pkgutil.get_data(package_name, file_name), where the package name is the config folder inside the archive. This returns the file’s contents as a string. Since we want to load the config into ConfigObj, we need to turn that string into a file-like object, so we use StringIO for that purpose. You can do the same thing with images: pkgutil.get_data will return a string of binary bytes that you can then pass to reportlab or the Python Imaging Library for further processing. I added the print statements so you could see what the original data looks like and what ConfigObj outputs.
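The script above is written for Python 2. As a rough sketch of how the same idea might look on Python 3, where pkgutil.get_data returns bytes rather than a str, here is a self-contained version. Note the assumptions: I swapped ConfigObj for the standard library's configparser so nothing needs installing, and the archive name, the demo_settings package name, the logo.bin member and the file contents are all invented for the example.

```python
import configparser
import io
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a small archive to read from (all names here are hypothetical).
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "demo_support.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("demo_settings/__init__.py", "")
    archive.writestr("demo_settings/config.ini", "[General]\nclient = Acme\n")
    archive.writestr("demo_settings/logo.bin", b"\x00\x01\x02")

# Put the archive on the import path, just like the original script does.
sys.path.insert(0, zip_path)

# On Python 3 get_data returns bytes, so decode before parsing text.
raw = pkgutil.get_data("demo_settings", "config.ini")
parser = configparser.ConfigParser()
parser.read_string(raw.decode("utf-8"))
cfg_dict = {section: dict(parser[section]) for section in parser.sections()}
print(cfg_dict)

# Binary support files can be wrapped in BytesIO for libraries that
# expect a file-like object.
logo_stream = io.BytesIO(pkgutil.get_data("demo_settings", "logo.bin"))
print(len(logo_stream.getvalue()))
```

If you stick with ConfigObj, the same decoded text should be usable there as well, but I have not shown that here, so treat it as something to check against the ConfigObj documentation.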

That’s all there is to it. I thought this was pretty handy and I hope you’ll find it useful in your own work.
