Python: An Intro to caching

A cache is a way to store a limited amount of data such that future requests for said data can be retrieved faster. In this article, we’ll look at a simple example that uses a dictionary for our cache. Then we’ll move on to using the Python standard library’s functools module to create a cache. Let’s start by creating a class that will construct our cache dictionary and then we’ll extend it as necessary. Here’s the code:

########################################################################
class MyCache:
    """"""
 
    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        self.cache = {}
        self.max_cache_size = 10

There’s nothing in particular that’s special in this class example. We just create a simple class and set up two class variables or properties, cache and max_cache_size. The cache is just an empty dictionary while the other is self-explanatory. Let’s flesh this code out and make it actually do something:

import datetime
import random
 
########################################################################
class MyCache:
    """"""
 
    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        self.cache = {}
        self.max_cache_size = 10
 
    #----------------------------------------------------------------------
    def __contains__(self, key):
        """
        Returns True or False depending on whether or not the key is in the 
        cache
        """
        return key in self.cache
 
    #----------------------------------------------------------------------
    def update(self, key, value):
        """
        Update the cache dictionary and optionally remove the oldest item
        """
        if key not in self.cache and len(self.cache) >= self.max_cache_size:
            self.remove_oldest()
 
        self.cache[key] = {'date_accessed': datetime.datetime.now(),
                           'value': value}
 
    #----------------------------------------------------------------------
    def remove_oldest(self):
        """
        Remove the entry that has the oldest accessed date
        """
        oldest_entry = None
        for key in self.cache:
            if oldest_entry is None:
                oldest_entry = key
            elif self.cache[key]['date_accessed'] < self.cache[oldest_entry][
                'date_accessed']:
                oldest_entry = key
        self.cache.pop(oldest_entry)
 
    #----------------------------------------------------------------------
    @property
    def size(self):
        """
        Return the size of the cache
        """
        return len(self.cache)

Here we import the datetime and random modules and then we see the class we created earlier. This time around, we add a few methods. One of the methods is a magic method called __contains__. I’m abusing it a little, but the basic idea is that it will allow us to check the class instance to see if it contains the key we’re looking for. The update method will update our cache dictionary with the new key / value pair. It will also remove the oldest entry if the maximum cache value is reached or exceeded. The remove_oldest method actually does the removing of the oldest entry in the dictionary, which in this case means the item that has the oldest access date. Finally we have a property called size which returns the size of our cache.

If you add the following code, we can test that the cache works as expected:

if __name__ == '__main__':
    # Test the cache
    keys = ['test', 'red', 'fox', 'fence', 'junk',
            'other', 'alpha', 'bravo', 'cal', 'devo',
            'ele']
    s = 'abcdefghijklmnop'
    cache = MyCache()
    for i, key in enumerate(keys):
        if key in cache:
            continue
        else:
            value = ''.join([random.choice(s) for i in range(20)])
            cache.update(key, value)
        print("#%s iterations, #%s cached entries" % (i+1, cache.size))
    print

In this example, we set up a bunch of predefined keys and loop over them. We add keys to the cache if they don’t already exist. The piece missing is a way to update the date accessed, but I’ll leave that as an exercise for the reader. If you run this code, you’ll notice that when the cache fills up, it starts deleting the old entries appropriately.

Now let’s move on and take a look at another way of creating caches using Python’s built-in functools module!


Using functools.lru_cache

The functools module provides a handy decorator called lru_cache. Note that it was added in 3.2. According to the documentation, it will “wrap a function with a memoizing callable that saves up to the maxsize most recent calls”. Let’s write a quick function based on the example from the documentation that will grab various web pages. In this case, we’ll be grabbing pages from the Python documentation site.

import urllib.error
import urllib.request
 
from functools import lru_cache
 
 
@lru_cache(maxsize=24)
def get_webpage(module):
    """
    Gets the specified Python module web page
    """    
    webpage = "https://docs.python.org/3/library/{}.html".format(module)
    try:
        with urllib.request.urlopen(webpage) as request:
            return request.read()
    except urllib.error.HTTPError:
        return None
 
if __name__ == '__main__':
    modules = ['functools', 'collections', 'os', 'sys']
    for module in modules:
        page = get_webpage(module)
        if page:
            print("{} module page found".format(module))

In the code above, we decorate our get_webpage function with lru_cache and set its max size to 24 calls. Then we set up a webpage string variable and pass in which module we want our function to fetch. I found that this works best if you run it in a Python interpreter, such as IDLE. This alls you to run the loop a couple of times against the function. What you will quickly see is that the first time it runs the code, the output is printed our relatively slowly. But if you run it again in the same session, you will see that it prints immediately which demonstrates that the lru_cache has cached the calls correctly. Give this a try in your own interpreter instance to see the results for yourself.

There is also a typed parameter that we can pass to the decorator. It is a Boolean that tells the decorator to cache arguments of different types separately if typed is set to True.


Wrapping Up

Now you know a little about writing your own caches with Python. It’s a fun tool and very useful if you’re making a lot of expensive I/O calls or if you want to cache something like login credentials. Have fun!

Print Friendly
  • Gennady

    It’s conventional to check if the value is None with “is” operator.

  • True enough. Fixed. Thanks!