Python 101: How to submit a web form

Today we’ll spend some time looking at three different ways to make Python submit a web form. In this case, we will be doing a web search with duckduckgo.com searching on the term “python” and saving the result as an HTML file. We will use Python’s included urllib modules and two 3rd party packages: requests and mechanize. We have three small scripts to cover, so let’s get cracking!

Submitting a web form with urllib

We will start with urllib and urllib2 since they are included in Python’s standard library. We’ll also import the webbrowser module to open the search results for viewing. Here’s the code:

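# Python 2.x example
import urllib
import urllib2
import webbrowser

# Encode the search term as a query string: q=Python
data = urllib.urlencode({'q': 'Python'})
url = 'http://duckduckgo.com/html/'
full_url = url + '?' + data

# Fetch the results page and save it to disk
response = urllib2.urlopen(full_url)
with open("results.html", "w") as f:
    f.write(response.read())

# Open the saved page in the default browser
webbrowser.open("results.html")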

The first thing you have to do when you want to submit a web form is figure out what the form’s fields are called and which URL the form submits to. If you go to duckduckgo’s website and view the source, you’ll notice that the form’s action points to a relative link, “/html”. So the URL our script uses is “http://duckduckgo.com/html/”. The input field is named “q”, so to pass duckduckgo a search term, we URL-encode it as the value of “q” and append it to the URL as a query string. The response is then read and written to disk. Finally, we open our saved results using the webbrowser module. Now let’s find out how this process differs when using the requests package.
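The script above targets Python 2. As a rough sketch of how the same steps might look in Python 3, where urlencode lives in urllib.parse and urlopen in urllib.request, consider the following. The helper names build_search_url and save_search are made up for illustration, and the site may respond differently today than it did when the article was written.

```python
# Python 3 sketch of the urllib example; helper names are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://duckduckgo.com/html/"  # endpoint used in the article


def build_search_url(term, base_url=BASE_URL):
    """Return base_url with the search term encoded into the 'q' field."""
    return base_url + "?" + urlencode({"q": term})


def save_search(term, path="results.html"):
    """Fetch the results page and write the raw bytes to path."""
    with urlopen(build_search_url(term)) as response:
        body = response.read()  # bytes in Python 3
    # Binary mode, because the body is bytes rather than str
    with open(path, "wb") as f:
        f.write(body)
    return path
```

Calling save_search("python") and passing the returned path to webbrowser.open would mirror the original script. Keeping the URL construction in its own function makes it easy to check the encoded query string without touching the network.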

Submitting a web form with requests

The requests package does form submissions a little bit more elegantly. Let’s take a look:

# Python 2.x example
import requests

url = 'https://duckduckgo.com/html/'
payload = {'q':'python'}
r = requests.get(url, params=payload)
with open("requests_results.html", "w") as f:
    f.write(r.content)

With requests, you just need to create a dictionary with the field name as the key and the search term as the value. Then you pass that dictionary to requests.get as params, and requests encodes it into the query string for you. Finally you take the resulting Response object, “r”, and access its content attribute, which you save to disk. We skipped the webbrowser part in this example (and the next) for brevity.
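If you are curious what URL requests ends up building from the params dictionary, one way to check is to prepare the request without sending it. This is only a small sketch; it reuses the url and payload values from the example above and makes no network call.

```python
import requests

url = "https://duckduckgo.com/html/"
payload = {"q": "python"}

# Build the request without sending it, so the final URL can be inspected offline
prepared = requests.Request("GET", url, params=payload).prepare()
print(prepared.url)
```

The printed URL has the same shape as the one assembled by hand in the urllib example, which is a handy sanity check when a form has several fields.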

In Python 3, it should be noted that r.content is bytes instead of a string. This will cause a TypeError to be raised if we try to write it out to a file opened in text mode. To fix it, all we need to do is change the file mode from ‘w’ to ‘wb’, like this:

with open("requests_results.html", "wb") as f:
    f.write(r.content)
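To see the difference without making a request, here is a minimal Python 3 sketch that uses a hard-coded bytes value as a stand-in for r.content. It shows the text-mode failure and two ways around it: writing in binary mode, or decoding to str first (which is roughly what r.text gives you, as one commenter below points out). The sample bytes and the file name are made up for illustration.

```python
import os
import tempfile

# Stand-in for r.content; a real response body would come from requests
content = b"<html><body>caf\xc3\xa9</body></html>"

path = os.path.join(tempfile.mkdtemp(), "demo.html")

# Text mode rejects bytes in Python 3
try:
    with open(path, "w") as f:
        f.write(content)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

# Option 1: binary mode writes the bytes unchanged
with open(path, "wb") as f:
    f.write(content)

# Option 2: decode first, then write text with an explicit encoding
text = content.decode("utf-8")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
```

Binary mode is the safer default when you only want a faithful copy of the page, since it does not depend on guessing the page's encoding.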

Now we should be ready to see how mechanize does its thing.

Submitting a web form with mechanize

The mechanize module has lots of fun features for browsing the internet with Python. Sadly it doesn’t support JavaScript. Anyway, let’s get on with the show!

import mechanize

url = "http://duckduckgo.com/html"
br = mechanize.Browser()
br.set_handle_robots(False) # ignore robots
br.open(url)
br.select_form(name="x")
br["q"] = "python"
res = br.submit()
content = res.read()
with open("mechanize_results.html", "w") as f:
    f.write(content)

As you can see, mechanize is a little more verbose than the other two methods were. We also need to tell it to ignore the site’s robots.txt rules or the request will fail. Of course, if you want to be a good netizen, then you shouldn’t ignore them. Anyway, to start off, you need a Browser object. Then you open the url, select the form (in this case, “x”) and set the “q” field using dictionary-style assignment on the browser object. Note that each method supplies the search parameters a little differently. Next you submit the query and read the result. Finally you save the result to disk and you’re done!
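Selecting a form by name assumes you already know the name, the action and the field names. Viewing the page source works, but as a rough sketch you could also pull those details out with the standard library’s html.parser (Python 3). The FormLister class and the sample markup below are hypothetical; the markup is only loosely modeled on the form described in this article, not copied from the live site.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each form's name, action, method and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "name": attrs.get("name"),
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Simplification: inputs are attributed to the most recent form seen
            self.forms[-1]["fields"].append(attrs["name"])


# Hypothetical markup for illustration only
SAMPLE_HTML = """
<form name="x" action="/html/" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""

lister = FormLister()
lister.feed(SAMPLE_HTML)
for form in lister.forms:
    print(form)
```

In practice you would feed it the HTML returned by any of the three approaches above. The method value also tells you whether the form expects a GET or a POST, which matters when choosing between requests.get and requests.post.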

Wrapping Up

Of the three, requests was probably the simplest, with urllib being a close second. Mechanize is made for doing a lot more than the other two though. It’s made for screen scraping and website testing, so it’s no surprise it’s a little more verbose. You can also do form submission with selenium, but you can read about that in this blog’s archives. I hope you found this article interesting and perhaps inspiring. See you next time!

Source Code

form_submission.zip

Choperro

March 27, 2014 at 12:15 am

The first example causes a strange little query web to open in my browser. But no results about a “python” search.

March 27, 2014 at 12:20 am

How do you import the 3rd party libs without installing them, just for testing?

Mike Driscoll

March 27, 2014 at 8:21 am

It looks like they changed their interface slightly. Not sure how to fix it though…

March 27, 2014 at 8:22 am

You should use virtualenv. Then you can create a virtual environment in Python where you can install modules for testing before you install them to your main Python installation.

April 3, 2014 at 9:27 am

Sorry for the delay, but I figured it out and updated the urllib and requests examples.

Zee Zhang

August 3, 2015 at 11:51 am

How could I open an already logged-in page in my browser? In urllib, I can use urlopen to submit my login info, but when I use webbrowser to open a page that requires login, it always redirects me to the login page. I heard Selenium could add cookies to the web opener, but it needs different drivers for different browsers. Are there any other ways?

April 30, 2016 at 1:37 pm

Have you tried your requests example?

Traceback (most recent call last):
  File "test.py", line 7, in
    f.write(r.content)
TypeError: write() argument must be str, not bytes

April 30, 2016 at 1:54 pm

Yes, I’ve tried it and so have thousands of others. It works fine in Python 2. However, in Python 3, things changed slightly such that r.content is now bytes instead of a string. All you need to do is change the file mode from ‘w’ to ‘wb’ and it will work. I went ahead and added a note to the article too. Thanks.

Abdechahid Ihya

November 22, 2016 at 10:01 am

Instead, you can use r.text to get a str object rather than a bytes object.
It worked well for me.

November 28, 2016 at 8:10 am

Not sure why I didn’t respond to this, but urllib supports Cookies: http://stackoverflow.com/questions/3334809/python-urllib2-how-to-send-cookie-with-urlopen-request

November 28, 2016 at 8:12 am

I have noticed that it is usually a POST, but it really depends on how the web page’s API is set up. You will have to use your web browser’s inspection tools to figure it out. Chrome and Firefox have some pretty powerful inspection tools that I’ve used in the past to help me write web crawlers and figure out how to submit forms. I would start there.