Downloading files from the internet is something that almost every programmer will have to do at some point. Python provides several ways to do just that in its standard library. Probably the most popular way to download a file is over HTTP using the urllib or urllib2 module. Python also comes with ftplib for FTP downloads. Finally, there’s a new third-party package called requests that’s getting a lot of buzz. We’ll be focusing on the two urllib modules and requests in this article.
Since this is a pretty simple task, we’ll just show a quick and dirty script that downloads the same file with each library and names the result slightly differently. We will download a zipped file from this very blog for our example script. Let’s take a look:
    # Python 2 code
    import urllib
    import urllib2
    import requests

    url = 'http://www.blog.pythonlibrary.org/wp-content/uploads/2012/06/wxDbViewer.zip'

    print "downloading with urllib"
    urllib.urlretrieve(url, "code.zip")

    print "downloading with urllib2"
    f = urllib2.urlopen(url)
    data = f.read()
    with open("code2.zip", "wb") as code:
        code.write(data)

    print "downloading with requests"
    r = requests.get(url)
    with open("code3.zip", "wb") as code:
        code.write(r.content)
As you can see, urllib is just a one-liner. Its simplicity makes it very easy to use. The other two libraries are nearly as simple, though. For urllib2, you just have to open the URL, read it, and write the data out. In fact, you could reduce that part of the script by one line by just doing the following:
    f = urllib2.urlopen(url)
    with open("code2.zip", "wb") as code:
        code.write(f.read())
Either way, it works quite well. The requests library’s method is get, which corresponds to the HTTP GET. It returns a response object, and you just read that object’s content attribute to get the bytes you want to write. We use the with statement because it closes the file automatically and simplifies the code. Note that calling “read()” with no argument can be dangerous if the file is large, because it loads the entire download into memory at once. It would be better to read it in pieces by passing read a size.
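To make the "read it in pieces" advice concrete, here is a minimal sketch of a chunked copy. The helper name save_in_chunks and the 64 KB default chunk size are assumptions made for illustration; they are not part of urllib, urllib2, or requests. The helper only relies on the object having a read(size) method, which the objects returned by urlopen provide.

```python
# Hypothetical helper, not part of any library: copy a file-like object
# to disk one chunk at a time instead of calling read() with no size.
CHUNK_SIZE = 64 * 1024  # assumed default; tune it for your use case


def save_in_chunks(response, path, chunk_size=CHUNK_SIZE):
    """Write everything from `response` to `path`; return the byte count."""
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                # An empty result means the stream is exhausted.
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

With this in place, the urllib2 part of the script could become something like save_in_chunks(urllib2.urlopen(url), "code2.zip"), so that only one chunk is held in memory at a time.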
Update (June 8, 2012)
As pointed out by one of my readers, the urllib code changes considerably if you run it through 2to3.py to convert it to Python 3. So for completeness, here’s what the code looks like now:
    # Python 3 code
    import urllib.request, urllib.parse, urllib.error

    url = 'http://www.blog.pythonlibrary.org/wp-content/uploads/2012/06/wxDbViewer.zip'

    print("downloading with urllib")
    urllib.request.urlretrieve(url, "code.zip")

    print("downloading with urllib2")
    f = urllib.request.urlopen(url)
    data = f.read()
    with open("code2.zip", "wb") as code:
        code.write(data)
You’ll notice that urllib2 no longer exists and that urllib.urlretrieve and urllib2.urlopen changed into urllib.request.urlretrieve and urllib.request.urlopen respectively. The rest is the same. I removed the requests portion for brevity.
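If you are writing Python 3 code from scratch rather than converting it, the urlopen version can be tightened up a bit. The following is only a sketch: the function name download is an assumption made for this example, not something urllib provides. It uses the response as a context manager and lets shutil.copyfileobj do the copying in chunks, so the whole file never has to sit in memory.

```python
# Python 3 sketch; `download` is a hypothetical helper name.
import shutil
import urllib.request


def download(url, path):
    """Stream the resource at `url` into the file at `path`; return `path`."""
    # Both the response and the output file are closed when the block exits.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # copyfileobj copies in fixed-size chunks rather than all at once.
        shutil.copyfileobj(response, out)
    return path
```

Calling download(url, "code2.zip") should then produce the same file as the urlopen portion of the script above.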
So there you have it! Now you too can start downloading files using Python 2 or 3!