Tag Archives: concurrency

Python 3 Concurrency – The concurrent.futures Module

The concurrent.futures module was added in Python 3.2. According to the Python documentation, it provides the developer with a high-level interface for asynchronously executing callables. Basically, concurrent.futures is an abstraction layer on top of Python’s threading and multiprocessing modules that simplifies using them. Note, however, that while the abstraction layer simplifies the usage of these modules, it also removes a lot of their flexibility, so if you need to do something custom, then this might not be the best module for you.

The concurrent.futures module includes an abstract class called Executor. It cannot be used directly though, so you will need to use one of its two subclasses: ThreadPoolExecutor or ProcessPoolExecutor. As you’ve probably guessed, these two subclasses are mapped to Python’s threading and multiprocessing APIs respectively. Each subclass manages a pool of worker threads or worker processes that you hand your callables to.
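Here is a minimal sketch of handing work to a thread pool. The doubler function, the numbers list and the max_workers value are made up for illustration; only ThreadPoolExecutor and its map() method come from the module itself.

```python
from concurrent.futures import ThreadPoolExecutor


def doubler(number):
    """Return the number doubled (a stand-in for real work)."""
    return number * 2


numbers = [5, 10, 15, 20, 25]

# The with-block shuts the pool down once all submitted work has finished.
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields the results in the same order as the inputs.
    results = list(executor.map(doubler, numbers))

print(results)
```

ProcessPoolExecutor accepts the same calls, although a process pool is normally created under an if __name__ == '__main__' guard.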

The term future has a special meaning in computer science. It refers to a construct that can be used for synchronization when using concurrent programming techniques. A future is a way to describe the result of a process or thread before it has finished processing. I like to think of a future as a pending result.
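The "pending result" idea can be sketched with the Future object that submit() returns. The slow_square function and the 0.1 second delay are illustrative assumptions, not part of the module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(number):
    """Pretend to work for a moment, then return the square."""
    time.sleep(0.1)
    return number * number


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns right away; the Future stands in for the result.
    future = executor.submit(slow_square, 7)
    finished_early = future.done()  # usually False at this point
    value = future.result()         # blocks until the worker is finished
    finished_after = future.done()

print(value, finished_after)
```

Calling result() is what turns the pending result into a real value; until then you can poll done() or carry on with other work.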


Python 201: A multiprocessing tutorial

The multiprocessing module was added to Python in version 2.6. It was originally defined in PEP 371 by Jesse Noller and Richard Oudkerk. The multiprocessing module allows you to spawn processes in much the same manner that you can spawn threads with the threading module. The idea here is that because you are now spawning processes, you can avoid the Global Interpreter Lock (GIL) and take full advantage of multiple processors on a machine.

The multiprocessing package also includes some APIs that are not in the threading module at all. For example, there is a neat Pool class that you can use to parallelize executing a function across multiple inputs. We will be looking at Pool in a later section. We will start with the multiprocessing module’s Process class.


Getting started with multiprocessing

The Process class is very similar to the threading module’s Thread class. Let’s try creating a series of processes that call the same function and see how that works:

import os
 
from multiprocessing import Process
 
def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc = os.getpid()
    print('{0} doubled to {1} by process id: {2}'.format(
        number, result, proc))
 
if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
 
    for number in numbers:
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()
 
    for proc in procs:
        proc.join()

For this example, we import Process and create a doubler function. Inside the function, we double the number that was passed in. We also use Python’s os module to get the current process’s ID (or pid). This will tell us which process is calling the function. Then in the block of code at the bottom, we create a series of Processes and start them. The very last loop just calls the join() method on each process, which tells Python to wait for the process to terminate. If you need to stop a process, you can call its terminate() method.


Python 201: A Tutorial on Threads

The threading module was first introduced in Python 1.5.2 as an enhancement of the low-level thread module. The threading module makes working with threads much easier and allows the program to run multiple operations at once.

Note that threads in Python work best with I/O operations, such as downloading resources from the Internet or reading files and directories on your computer. If you need to do something that will be CPU intensive, then you will want to look at Python’s multiprocessing module instead. The reason for this is that CPython has the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. Because of this, when you run multiple CPU-intensive operations with threads, you may find that they actually run slower than they would one after another. So we will be focusing on what threads do best: I/O operations!
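To see why threads suit I/O-bound work, here is a rough sketch in which time.sleep() stands in for a blocking I/O call (a thread that is sleeping or waiting on I/O releases the GIL). The fake_download name and the 0.2 second delay are assumptions made up for the example.

```python
import threading
import time


def fake_download(delay):
    """Stand-in for a blocking I/O call such as a network request."""
    time.sleep(delay)


start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(0.2,))
           for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start

# Five 0.2 second waits overlap instead of adding up to a full second.
print('5 waits of 0.2s took {:.2f}s'.format(elapsed))
```

The same experiment with a CPU-bound function in place of the sleep would not show this kind of speedup under the GIL.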


Intro to Threads

A thread lets you run a piece of long-running code as if it were a separate program. It’s kind of like calling subprocess except that you are calling a function or class instead of a separate program. I always find it helpful to look at a concrete example. Let’s take a look at something that’s really simple:

import threading
 
 
def doubler(number):
    """
    A function that can be used by a thread
    """
    print(threading.current_thread().name + '\n')
    print(number * 2)
    print()
 
 
if __name__ == '__main__':
    for i in range(5):
        my_thread = threading.Thread(target=doubler, args=(i,))
        my_thread.start()
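
The example above starts its threads but never waits for them. As a possible variation, the sketch below names each thread, collects the doubled values in a shared dictionary and calls join() so the results are complete before they are printed. The results dictionary, the lock and the 'doubler-N' thread names are assumptions added for illustration.

```python
import threading

results = {}
results_lock = threading.Lock()


def doubler(number):
    """Store the doubled value under the name of the thread that ran it."""
    name = threading.current_thread().name
    with results_lock:
        results[name] = number * 2


threads = []
for i in range(5):
    thread = threading.Thread(target=doubler, args=(i,),
                              name='doubler-{}'.format(i))
    threads.append(thread)
    thread.start()

# join() blocks until each thread has finished.
for thread in threads:
    thread.join()

print(sorted(results.items()))
```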


Python 3 – An Intro to asyncio

The asyncio module was added to Python in version 3.4 as a provisional package. At the time, that meant asyncio could receive backwards-incompatible changes or even be removed in a future release of Python; the provisional status was lifted in Python 3.6. According to the documentation, asyncio “provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives”. This chapter is not meant to cover everything you can do with asyncio; however, you will learn how to use the module and why it is useful.

If you need something like asyncio in an older version of Python, then you might want to take a look at Twisted or gevent.


Definitions

The asyncio module provides a framework that revolves around the event loop. An event loop basically waits for something to happen and then acts on the event. It is responsible for handling such things as I/O and system events. Asyncio actually has several loop implementations available to it. The module will default to the one most likely to be the most efficient for the operating system it is running under; however you can explicitly choose the event loop if you so desire. An event loop basically says “when event A happens, react with function B”.
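The “when event A happens, react with function B” idea can be sketched with a timer as the event. This assumes Python 3.7 or newer for asyncio.run() and asyncio.get_running_loop(); the on_timer callback and the events list are names invented for the example.

```python
import asyncio

events = []


def on_timer(label):
    """Plain callback that the loop invokes when the timer fires."""
    events.append(label)


async def main():
    loop = asyncio.get_running_loop()
    # "When 0.01 seconds have passed, react by calling on_timer."
    loop.call_later(0.01, on_timer, 'timer fired')
    # Hand control back to the loop long enough for the callback to run.
    await asyncio.sleep(0.05)


asyncio.run(main())
print(events)
```

Here asyncio.run() creates the default event loop for the platform, runs main() on it and closes the loop afterwards.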

Think of a server as it waits for someone to come along and ask for a resource, such as a web page. If the website isn’t very popular, the server will be idle for a long time. But when it does get a hit, then the server needs to react. This reaction is known as event handling. When a user loads the web page, the server will check for and call one or more event handlers. Once those event handlers are done, they need to give control back to the event loop. To do this in Python, asyncio uses coroutines.

A coroutine is a special function that can give up control to its caller without losing its state. A coroutine is a consumer and an extension of a generator. One of the big benefits of coroutines over threads is that they don’t use very much memory to execute. Note that when you call a coroutine function, its body doesn’t actually execute. Instead it will return a coroutine object that you can pass to the event loop to have it executed either immediately or later on.
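A small sketch makes the difference between calling a coroutine function and running it visible. It assumes Python 3.7 or newer (async def plus asyncio.run()); greet and the calls list are illustrative names.

```python
import asyncio

calls = []


async def greet(name):
    """Record that the body ran, then return a greeting."""
    calls.append(name)
    return 'Hello, {}'.format(name)


# Calling the coroutine function only builds a coroutine object.
coro = greet('world')
ran_on_call = bool(calls)                 # False: the body has not run yet
is_coroutine = asyncio.iscoroutine(coro)  # True

# Handing the object to the event loop is what actually executes it.
message = asyncio.run(coro)
print(message)
```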

One other term you will likely run across when you are using the asyncio module is future. A future is basically an object that represents the result of work that hasn’t completed. Your event loop can watch future objects and wait for them to finish. When a future finishes, it is set to done. Asyncio also supports locks and semaphores.
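As a rough illustration, a future can be created on the running loop, completed later by a callback and awaited in the meantime. This assumes Python 3.7 or newer; the 0.01 second delay and the 'finished' value are arbitrary choices for the example.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    before = future.done()  # nothing has set a result yet

    # Ask the loop to complete the future a little later.
    loop.call_later(0.01, future.set_result, 'finished')

    # Awaiting suspends main() until the future is done.
    value = await future
    return before, value, future.done()


before, value, after = asyncio.run(main())
print(before, value, after)
```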

The last piece of information I want to mention is the Task. A Task is a wrapper for a coroutine and a subclass of Future. You can even schedule a Task using the event loop.
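Wrapping a coroutine in a Task might look like the sketch below. It uses asyncio.create_task(), which assumes Python 3.7 or newer; the doubler coroutine is a made-up example.

```python
import asyncio


async def doubler(number):
    """Pause briefly so the loop can switch tasks, then double the number."""
    await asyncio.sleep(0.01)
    return number * 2


async def main():
    # create_task() wraps the coroutine and schedules it on the running loop.
    task = asyncio.create_task(doubler(21))
    is_future = isinstance(task, asyncio.Future)
    result = await task
    return is_future, result, task.done()


is_future, result, done = asyncio.run(main())
print(is_future, result, done)
```

Because a Task is a Future, it can be awaited and checked with done() just like the future objects described earlier.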


Python Concurrency: An Intro to Threads

Python has a number of different concurrency constructs such as threading, queues and multiprocessing. The threading module used to be the primary way of accomplishing concurrency. A few years ago, the multiprocessing module was added to the Python suite of standard libraries. This article will be focused on the threading module though.

PyCon 2010: Saturday Session 2 (early afternoon)

I managed to make it to three talks in the middle session. Here’s the list: “508 and You: Taking the Pain out of Accessibility” with Katie Cunningham, “Actors: What, Why, and How” with Donovan Preston and “Python Metaprogramming” with Nicolas Lara. I’ll see you after the jump!