An intro to aiohttp

Python 3.5 added new syntax, the async and await keywords, that makes it easier for developers to create asynchronous applications and packages. One such package is aiohttp, which is an HTTP client/server for asyncio. Basically it allows you to write asynchronous clients and servers. The aiohttp package also supports server and client WebSockets. You can install aiohttp using pip:

pip install aiohttp
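
If the async def and await keywords are new to you, here is a minimal sketch of the syntax before we dive in. The greet() coroutine is made up purely for illustration:

import asyncio

async def greet(name):
    # await hands control back to the event loop until the sleep finishes,
    # which lets other coroutines run in the meantime
    await asyncio.sleep(1)
    print('Hello, ' + name)

loop = asyncio.get_event_loop()
loop.run_until_complete(greet('world'))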

Now that we have aiohttp installed, let’s take a look at one of their examples!


Fetching a Web Page

The documentation for aiohttp has a fun example that shows how to grab a web page’s HTML. Let’s take a look at it and see how it works:

import aiohttp
import asyncio
import async_timeout

async def fetch(session, url):
    # Give up if the whole request takes longer than ten seconds
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def main(loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        html = await fetch(session, 'https://www.blog.pythonlibrary.org')
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))

Here we import aiohttp, Python’s asyncio, and async_timeout, which gives us the ability to time out a coroutine. We create our event loop at the bottom of the code and call the main() function. It will create a ClientSession object that we pass to our fetch() function along with the URL to fetch. Finally, in the fetch() function, we set our timeout and attempt to get the URL’s HTML. If everything works without timing out, you will see a bunch of text spewed into stdout.
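
Note that this example only fetches a single page, so nothing actually runs concurrently. If you wanted to grab several pages at once, you could rewrite main() to use asyncio.gather(). Here is a sketch that reuses the fetch() coroutine above; the second URL is just an example:

async def main(loop):
    urls = ['https://www.blog.pythonlibrary.org',
            'https://www.python.org']
    async with aiohttp.ClientSession(loop=loop) as session:
        # gather() schedules all of the fetches at once and waits
        # until every one of them has completed
        pages = await asyncio.gather(
            *(fetch(session, url) for url in urls))
        for url, html in zip(urls, pages):
            print(url, 'returned', len(html), 'characters')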


Downloading Files with aiohttp

A fairly common task for developers is downloading files using threads or processes. We can download files using coroutines too! Let’s find out how:

import aiohttp
import asyncio
import async_timeout
import os


async def download_coroutine(session, url):
    # Abort the download if it takes longer than ten seconds
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            with open(filename, 'wb') as f_handle:
                while True:
                    # Read the payload 1024 bytes at a time
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    f_handle.write(chunk)
            return await response.release()


async def main(loop):
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]

    async with aiohttp.ClientSession(loop=loop) as session:
        for url in urls:
            await download_coroutine(session, url)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))

You will notice here that we import a couple of new items: aiohttp and async_timeout. The latter is actually one of aiohttp’s dependencies and allows us to create a timeout context manager.

Let’s start at the bottom of the code and work our way up. In the bottom conditional statement, we start our asynchronous event loop and call our main function. In the main function, we create a ClientSession object that we pass on to our download coroutine function for each of the URLs we want to download.

In download_coroutine(), we create an async_timeout.timeout() context manager that basically creates a timer of X seconds. When the seconds run out, the context manager ends or times out; in this case, the timeout is ten seconds. Next we call our session’s get() method, which gives us a response object.

Now we get to the part that is a bit magical. When you use the content attribute of the response object, it returns an instance of aiohttp.StreamReader, which allows us to download the file in chunks of whatever size we’d like. As we read the file, we write it out to local disk. Finally we call the response’s release() method, which will finish the response processing.
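
Incidentally, aiohttp’s StreamReader also has an iter_chunked() method, so the read loop could be written as an async for loop instead. Here is a sketch of download_coroutine() rewritten that way:

async def download_coroutine(session, url):
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            with open(filename, 'wb') as f_handle:
                # iter_chunked() yields the payload in 1024-byte pieces
                async for chunk in response.content.iter_chunked(1024):
                    f_handle.write(chunk)
            return await response.release()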

According to aiohttp’s documentation, because the response object was created in a context manager, release() is technically called implicitly. But in Python, explicit is usually better, and there is a note in the documentation that we shouldn’t rely on the connection just going away, so I believe it’s better to release it explicitly in this case.

There is one part that is still blocking here, and that is the portion of the code that actually writes to disk. While we are writing the file, we are still blocking. There is another library called aiofiles that we could use to try to make the file writing asynchronous too. We will take a look at that next.
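
If you would rather not add another dependency, one alternative is to hand the blocking write off to a thread pool with the event loop’s run_in_executor() method. Here is a rough sketch of a helper coroutine (write_chunk() is a made-up name) that you could await inside the download loop instead of calling f_handle.write(chunk) directly:

async def write_chunk(f_handle, chunk):
    loop = asyncio.get_event_loop()
    # Offload the blocking write to the default thread pool executor so
    # the event loop stays free to service other coroutines meanwhile
    await loop.run_in_executor(None, f_handle.write, chunk)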

Note: The section above came from one of my previous articles.


Using aiofiles For Asynchronous Writing

You will need to install aiofiles to make this work. Let’s get that out of the way:

pip install aiofiles

Now that we have all the items we need, we can update our code! Note that this code only works in Python 3.6 or above.

import aiofiles
import aiohttp
import asyncio
import async_timeout
import os


async def download_coroutine(session, url):
    # Abort the download if it takes longer than ten seconds
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            # aiofiles gives us an asynchronous file object
            async with aiofiles.open(filename, 'wb') as fd:
                while True:
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    # The write itself is now awaitable, too
                    await fd.write(chunk)
            return await response.release()


async def main(loop, url):
    async with aiohttp.ClientSession(loop=loop) as session:
        await download_coroutine(session, url)


if __name__ == '__main__':
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
    
    loop = asyncio.get_event_loop()
    loop.run_until_complete(
        asyncio.gather(*(main(loop, url) for url in urls))
    )

Besides the new import for aiofiles, there are two changes. First, main() now downloads a single URL, and the code at the bottom uses asyncio.gather() to schedule one main() call per URL so the downloads can run concurrently instead of one after the other. Second, we changed how we open the file. You will note that it is now

async with aiofiles.open(filename, 'wb') as fd:

And that we use await for the writing portion of the code:

await fd.write(chunk)

Other than that, the code is the same. There are some portability issues mentioned in the aiofiles documentation that you should be aware of.
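
As a side note, if you can require Python 3.7 or newer, the event loop boilerplate at the bottom can be replaced with asyncio.run(). Here is a sketch, assuming a recent aiohttp where ClientSession no longer needs the loop argument (download_all() is a made-up name):

async def main(url):
    async with aiohttp.ClientSession() as session:
        await download_coroutine(session, url)

async def download_all(urls):
    # Schedule one main() coroutine per URL and run them concurrently
    await asyncio.gather(*(main(url) for url in urls))

asyncio.run(download_all(urls))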


Wrapping Up

Now you should have a basic understanding of how to use aiohttp and aiofiles. The documentation for both projects is worth a look, as this tutorial really only scratches the surface of what you can do with these libraries.

