Python 101 - Creating Multiple Threads

Concurrency is a big topic in programming. The idea behind concurrency is to run multiple pieces of code at once. Python has a couple of different solutions built into its standard library: threads and processes. In this chapter, you will learn about using threads.

When you run your own code, you are using a single thread. If you want to run something else in the background, you can use Python's threading module.

In this article you will learn the following:

  • Pros of Using Threads
  • Cons of Using Threads
  • Creating Threads
  • Subclassing Thread
  • Writing Multiple Files with Threads

Note: This chapter is not meant to be comprehensive in its coverage of threads. But you will learn enough to get started using threads in your application.

Let's get started by going over the pros and cons of using threads!

Pros of Using Threads

Threads are useful in the following ways:

  • They have a small memory footprint, which means they are lightweight to use
  • Memory is shared between threads - which makes it easy to share state across threads
  • They make it easier to build responsive user interfaces
  • They are a great option for I/O bound applications (such as reading and writing files, databases, etc.)

Now let's look at the cons!

Cons of Using Threads

Threads have the following drawbacks:

  • Poor option for CPU bound code due to the Global Interpreter Lock (GIL) - see below
  • They cannot be interrupted or killed from the outside
  • Code with threads is harder to understand and write correctly
  • Easy to create race conditions

The Global Interpreter Lock is a mutex that protects Python objects. It prevents multiple threads from executing Python bytecode at the same time. So when you use threads, your Python code does not run on all the CPUs on your machine at once.

Threads are great for running I/O heavy applications, and they can also help with image processing and NumPy's number-crunching, because that kind of work largely happens outside of the GIL. If you need to run CPU bound Python code across multiple CPUs, use the multiprocessing module. You will learn about the multiprocessing module in the next chapter.
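To see why threads suit I/O bound work, here is a minimal sketch that uses time.sleep() as a stand-in for a slow read or write. The fake_io() and run_in_threads() names are made up for this illustration, and join(), which waits for a thread to finish, is not covered elsewhere in this chapter. A sleeping thread does not hold the GIL, so the sleeps should overlap rather than run back to back:

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep() releases the GIL, much like a blocking read or write
    time.sleep(delay)


def run_in_threads(count: int, delay: float) -> float:
    """Run `count` sleeping threads and return the elapsed wall-clock time."""
    threads = [
        threading.Thread(target=fake_io, args=(delay,))
        for _ in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for this thread to finish
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_in_threads(count=4, delay=0.5)
    # Four sleeps run one after another would take about 2 seconds
    print(f'4 threads finished in {elapsed:.2f} seconds')
```

If you replaced fake_io() with a pure Python calculation, you should not expect the same speed-up, because only one thread can execute Python bytecode at a time.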

A race condition happens when a program depends on events happening in a certain order to execute correctly. If your threads execute something out of order, the next thread may be working with data it does not expect, and your application can crash or behave in unexpected ways.
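One common way to avoid this kind of problem is to guard shared data with a threading.Lock. The following is a rough sketch rather than a full treatment of locks. The names counter, add_many() and run_counters() are hypothetical and exist only for this illustration:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times: int) -> None:
    global counter
    for _ in range(times):
        # "counter += 1" is a read, an add and a write. Without the lock,
        # two threads could read the same old value and lose an increment.
        with counter_lock:
            counter += 1


def run_counters(thread_count: int, times: int) -> int:
    """Increment the shared counter from several threads and return it."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == '__main__':
    print(run_counters(thread_count=4, times=10_000))
```

With the lock in place, every increment is applied, so four threads adding 10,000 each leave the counter at 40,000.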

Creating Threads

Threads are confusing if all you do is talk about them. It's always good to familiarize yourself with how to write actual code. For this chapter, you will be using the threading module which uses the _thread module underneath.

The full documentation for the threading module can be found in the official Python documentation.

Let's write a simple example that shows how to create multiple threads. Put the following code into a file named worker_threads.py:

# worker_threads.py

import random
import threading
import time


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')

if __name__ == '__main__':
    for i in range(5):
        thread = threading.Thread(
                target=worker,
                args=(f'computer_{i}',),
                )
        thread.start()

The first three imports give you access to the random, threading and time modules. You can use random to generate pseudo-random numbers or choose from a sequence at random. The threading module is what you use to create threads and the time module can be used for many things related to time.

In this code, you use time to wait a random amount of time to simulate your "worker" code working.

Next you create a worker() function that takes in the name of the worker. When this function is called, it will print out which worker has started working. It will then choose a random whole number from 1 to 4, since range(1, 5) stops before 5. You use this number to simulate the amount of time the worker works using time.sleep(). Finally you print out a message that tells you a worker has finished and how long the work took in seconds.

The last block of code creates 5 worker threads. To create a thread, you pass in your worker() function as the target function for the thread to call. The other argument you pass to Thread is a tuple of arguments that the thread will pass to the target function. Then you call thread.start() to start running that thread.

When the target function finishes executing, the thread ends.

Try running the code and you'll see that the output will look similar to the following:

Started worker computer_0
Started worker computer_1
Started worker computer_2
Started worker computer_3
Started worker computer_4
computer_0 worker finished in 1 seconds
computer_3 worker finished in 1 seconds
computer_4 worker finished in 3 seconds
computer_2 worker finished in 3 seconds
computer_1 worker finished in 4 seconds

Your output will differ from the above because the workers sleep() for random amounts of time. In fact, if you run the code multiple times, each invocation of the script will probably have a different result.
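The script above starts its threads and never checks on them again. If you need to know when all of the workers are done, one option is to keep the Thread objects in a list and call join() on each one. This is a minimal sketch of that idea; run_workers() is a made-up helper, and the worker here sleeps for a fixed, short time to keep the example quick:

```python
import threading
import time


def worker(name: str, seconds: float) -> None:
    time.sleep(seconds)
    print(f'{name} worker finished')


def run_workers(count: int) -> list:
    """Start `count` workers, wait for all of them, and return the threads."""
    threads = []
    for i in range(count):
        thread = threading.Thread(
            target=worker,
            args=(f'computer_{i}', 0.1),
        )
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()  # blocks until this thread's target has returned
    return threads


if __name__ == '__main__':
    finished = run_workers(5)
    still_running = any(thread.is_alive() for thread in finished)
    print(f'Any workers still running? {still_running}')
```

Once join() returns for a thread, is_alive() reports False for it, so code placed after the second loop only runs when every worker has finished.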

threading.Thread is a class. Here is its full signature:

threading.Thread(
    group=None, target=None, name=None,
    args=(), kwargs={},
    *,
    daemon=None,
    )

You could have named each thread with the name parameter when you created it rather than passing the name in to the worker() function. The args and kwargs are for the target function. You can also tell Python to make the thread into a daemon. "Daemon threads" have no claim on the Python interpreter, which has two main consequences: 1) if only daemon threads are left, Python will shut down, and 2) when Python shuts down, daemon threads are abruptly stopped with no notification. The group parameter should be left alone, as it is reserved for a future extension in case a ThreadGroup class is ever added to Python.
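Here is a short sketch of the name, kwargs and daemon parameters in use. The report() and make_thread() functions and the seen_labels list are hypothetical names for this example only. Inside the target function, threading.current_thread() gives you the Thread object that is running the code, so you can read its name there:

```python
import threading

seen_labels = []


def report(label: str) -> None:
    # current_thread() returns the Thread object running this function
    thread_name = threading.current_thread().name
    seen_labels.append(f'{label}:{thread_name}')


def make_thread(name: str, label: str) -> threading.Thread:
    return threading.Thread(
        target=report,
        name=name,                # the thread's name
        kwargs={'label': label},  # keyword arguments for report()
        daemon=True,              # will not keep Python from exiting
    )


if __name__ == '__main__':
    thread = make_thread('computer_0', 'demo')
    print(thread.name, thread.daemon)
    thread.start()
    thread.join()
    print(seen_labels)
```

Note that daemon has to be set before you call start(). Because daemon threads are stopped abruptly at shutdown, they are a poor fit for work that must finish, such as writing a file.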

Subclassing Thread

The Thread class from the threading module can also be subclassed. This allows you more fine-grained control over your thread's creation, execution and eventual deletion. You will encounter subclassed threads often.

Let's rewrite the previous example using a subclass of Thread. Put the following code into a file named worker_thread_subclass.py.

# worker_thread_subclass.py

import random
import threading
import time

class WorkerThread(threading.Thread):

    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name
        self.id = id(self)

    def run(self):
        """
        Run the thread
        """
        worker(self.name, self.id)

def worker(name: str, instance_id: int) -> None:
    print(f'Started worker {name} - {instance_id}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} - {instance_id} worker finished in '
          f'{worker_time} seconds')

if __name__ == '__main__':
    for i in range(5):
        thread = WorkerThread(name=f'computer_{i}')
        thread.start()

In this example, you create the WorkerThread class. The constructor of the class, __init__(), accepts a single argument, the name to give the thread. The name is stored in an instance attribute, self.name, which is also the attribute that Thread uses for the thread's name. The value of id(self) is stored in self.id. Then you override the run() method.

The run() method is already defined in the Thread class. It controls how the thread will run. By default, it calls the target function that you passed in when you created the thread. When you create your own run() method in your subclass, it is known as overriding the original. This allows you to add custom behavior, such as logging, that isn't there if you use the base class's run() method.

You call the worker() function in the run() method of your WorkerThread. The worker() function itself has a minor change in that it now accepts the instance_id argument, which represents the class instance's unique id. You also need to update the print() calls so that they print out the instance_id.

The other change you need to make is in the __main__ conditional statement, where you create a WorkerThread and pass in the name rather than calling threading.Thread() directly as you did in the previous section.

When you call start() in the last line of the code snippet, it arranges for run() to be called in the new thread. The start() method is a part of the threading.Thread class and you did not override it in your code.

The output when you run this code should be similar to the original version of the code, except that now you are also including the instance id in the output. Give it a try and see for yourself!
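As another example of custom behavior, a subclass can hold on to whatever its run() method produces. The sketch below is only an illustration: SquareThread and squares() are hypothetical names, and it uses super().__init__() with the name parameter as an alternative to calling threading.Thread.__init__(self) and setting self.name afterwards.

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical subclass that remembers what run() computed."""

    def __init__(self, number: int) -> None:
        super().__init__(name=f'square_{number}')
        self.number = number
        self.result = None

    def run(self) -> None:
        # run() cannot hand a return value back to the caller,
        # so the result is saved on the instance instead
        self.result = self.number ** 2


def squares(numbers: list) -> list:
    threads = [SquareThread(number) for number in numbers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # after join(), run() has finished
    return [thread.result for thread in threads]


if __name__ == '__main__':
    print(squares([1, 2, 3]))
```

Reading result is only safe after join() returns, because before that point run() may not have finished.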

Writing Multiple Files with Threads

There are several common use cases for using threads. One of those use cases is writing multiple files at once. It's always nice to see how you would approach a real-world problem, so that's what you will be doing here.

To get started, you can create a file named writing_thread.py. Then add the following code to your file:

# writing_thread.py

import random
import time
from threading import Thread


class WritingThread(Thread):

    def __init__(self, 
                 filename: str, 
                 number_of_lines: int,
                 work_time: int = 1) -> None:
        Thread.__init__(self)
        self.filename = filename
        self.number_of_lines = number_of_lines
        self.work_time = work_time

    def run(self) -> None:
        """
        Run the thread
        """
        print(f'Writing {self.number_of_lines} lines of text to '
              f'{self.filename}')
        with open(self.filename, 'w') as f:
            for line in range(self.number_of_lines):
                text = f'This is line {line+1}\n'
                f.write(text)
                time.sleep(self.work_time)
        print(f'Finished writing {self.filename}')

if __name__ == '__main__':
    files = [f'test{x}.txt' for x in range(1, 6)]
    for filename in files:
        work_time = random.choice(range(1, 3))
        number_of_lines = random.choice(range(5, 20))
        thread = WritingThread(filename, number_of_lines, work_time)
        thread.start()

Let's break this down a little and go over each part of the code individually:

import random
import time
from threading import Thread


class WritingThread(Thread):

    def __init__(self, 
                 filename: str, 
                 number_of_lines: int,
                 work_time: int = 1) -> None:
        Thread.__init__(self)
        self.filename = filename
        self.number_of_lines = number_of_lines
        self.work_time = work_time

Here you created the WritingThread class. It accepts a filename, a number_of_lines and a work_time. This allows you to create a text file with a specific number of lines. The work_time is for sleeping between writing each line to simulate writing a large or small file.

Let's look at what goes in run():

def run(self) -> None:
    """
    Run the thread
    """
    print(f'Writing {self.number_of_lines} lines of text to '
          f'{self.filename}')
    with open(self.filename, 'w') as f:
        for line in range(self.number_of_lines):
            text = f'This is line {line+1}\n'
            f.write(text)
            time.sleep(self.work_time)
    print(f'Finished writing {self.filename}')

This code is where all the magic happens. You print out how many lines of text you will be writing to a file. Then you do the deed and create the file and add the text. During the process, you sleep() to add some artificial time to writing the files to disk.

The last piece of code to look at is as follows:

if __name__ == '__main__':
    files = [f'test{x}.txt' for x in range(1, 6)]
    for filename in files:
        work_time = random.choice(range(1, 3))
        number_of_lines = random.choice(range(5, 20))
        thread = WritingThread(filename, number_of_lines, work_time)
        thread.start()

In this final code snippet, you use a list comprehension to create 5 file names. Then you loop over the file names and create a thread for each one. You use Python's random module to choose a random work_time and a random number_of_lines to write to the file. Finally you create the WritingThread and start() it.

When you run this code, you will see output similar to this:

Writing 5 lines of text to test1.txt
Writing 18 lines of text to test2.txt
Writing 7 lines of text to test3.txt
Writing 11 lines of text to test4.txt
Writing 11 lines of text to test5.txt
Finished writing test1.txt
Finished writing test3.txt
Finished writing test4.txtFinished writing test5.txt

Finished writing test2.txt

You may notice some odd output near the bottom, where two messages run together on one line and are followed by a blank line. This happened because two threads wrote to stdout at nearly the same time, so their text and newlines were interleaved.
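If the jumbled output bothers you, one possible fix is to let only one thread print at a time by sharing a threading.Lock. This is a sketch under that assumption, not a change to writing_thread.py itself. The names print_lock, safe_print() and announce() are made up for the example:

```python
import threading

print_lock = threading.Lock()


def safe_print(message: str) -> None:
    # Only one thread can hold the lock at a time, so each message
    # and its newline are written together
    with print_lock:
        print(message)


def announce(filenames: list) -> None:
    threads = [
        threading.Thread(
            target=safe_print,
            args=(f'Finished writing {name}',),
        )
        for name in filenames
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == '__main__':
    announce([f'test{x}.txt' for x in range(1, 6)])
```

The messages can still appear in any order, but each one stays on its own line. In WritingThread, you could call a function like safe_print() in place of print() inside run().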

You can use this code along with Python's urllib.request to create an application for downloading files from the Internet. Try that project out on your own.

Wrapping Up

You have learned the basics of threading in Python. In this chapter, you learned about the following:

  • Pros of Using Threads
  • Cons of Using Threads
  • Creating Threads
  • Subclassing Thread
  • Writing Multiple Files with Threads

There is a lot more to threads and concurrency than what is covered here. You didn't learn about thread communication, thread pools, or locks in any depth, for example. However, you do know the basics of creating threads, and that is enough to start using them in your own code. In the next chapter, you will continue to learn about concurrency in Python by discovering how multiprocessing works!

This article is based on a chapter from Python 101: 2nd Edition. You can purchase Python 101 on Amazon or Leanpub.

Copyright © 2022 Mouse Vs Python | Powered by Pythonlibrary