Fri 16 Jul 2010
Most typical users have used Microsoft Office. While Office may be the bane of tech support, we still have to deal with it. Python can be used to script (AKA automate) Office and make it easier for us or our users to use. It may not be as easy as recording a macro, but it’s close. For this article, you will learn how to use the PyWin32 module to access some of the Office programs and manipulate them with Python. Some forums say that you need to run PythonWin’s makepy utility on Microsoft Word (and Excel) before you can access Office applications. I don’t think I needed to do that to make it work though (at least, not with the 2007 version). However, PythonWin comes with the PyWin32, so if you do run into trouble, you can try it.
Python and Microsoft Excel
If you’ve looked for examples of using Python and Office, you’ll usually find that the most often hacked component is Excel. In fact, there are several non-PyWin32 modules specifically created to read and write Excel files. They are called xlrd and xlwt, respectively. But that’s a topic for another article. Here we’ll see how to mess with Excel using the PyWin32 interface. Note that the following scripts only work on Windows. One advantage of xlrd and xlwt is that you can use them on any platform.
Let’s take a look at a simple example, shall we?
import time import win32com.client as win32 #---------------------------------------------------------------------- def excel(): """""" xl = win32.gencache.EnsureDispatch('Excel.Application') ss = xl.Workbooks.Add() sh = ss.ActiveSheet xl.Visible = True time.sleep(1) sh.Cells(1,1).Value = 'Hacking Excel with Python Demo' time.sleep(1) for i in range(2,8): sh.Cells(i,1).Value = 'Line %i' % i time.sleep(1) ss.Close(False) xl.Application.Quit() if __name__ == "__main__": excel()
The example above is similar to what you will normally find on the web. It is actually based on an example I saw in Wesley Chun’s excellent book, Core Python Programming. Let’s take some time and unpack the code. To gain access to Excel, we import win32com.client and then call its gencache.EnsureDispatch, passing in the application name that we want to open. In this case, the string to pass is “Excel.Application”. All that does is open Excel in the background. At this point, the user won’t even know Excel is open unless they have Task Manager running. The next line is adds a new workbook to Excel by calling that Excel instance’s “Workbooks.Add()” method. This returns a sheets object (I think). To get the ActiveSheet, we call ss.ActiveSheet. Finally, we make the Excel program itself visible by setting that property to True.
To set a specific cell’s value, call something like this: sh.Cells(row,col).Value = “some value”. Note that our instance is NOT zero-based and will actually put the value in the correct row/col combo. If we want to extract a value, we just remove the equals sign. What if we want the formula? To figure this our, I recorded a macro in Excel and did a Paste Special command that only pasted the Formula. Using the code generated, I figured out that to get the formula in Python, you just do this:
formula = sh.Cells(row, col).Formula
What if you need to change which sheet you’re on? Recording a macro also showed me how to accomplish this feat. Here’s the VBA code from Excel:
' Macro1 Macro
From this code, I gathered that I needed to call my sheets object’s “Sheets” method and after a little fiddling, I got it to work by doing the following:
sheet2 = ss.Sheets("Sheet2")
Now we have a handle on the second sheet in the workbook. If you want to edit or retrieve values, just prepend the same methods you used above with whatever you called the sheet2 instance (i.e. sheet2.Cells(1,1).Value). The last two lines from the original program will close the sheets and then quit the entire Excel instance.
You may be thinking that so far all I’ve shown is how to create a new document. What if you want to open an existing file? Just do something like this at the beginning of the code:
xl = win32.gencache.EnsureDispatch('Excel.Application') ss = xl.Workbooks.Open(filename)
And there you have it! You now know the basics of hacking Excel with Python using Excel’s COM object model. If you need to learn more, I recommend trying to record a macro and than translating the result into Python. Note: I was unable to find an example that could save the spreadsheet…there are several examples that claim they work, but they didn’t for me.
Python and Microsoft Word
Accessing Microsoft Word with Python follows the same syntax that we used for Excel. Let’s take a quick look at how to access Word.
from time import sleep import win32com.client as win32 RANGE = range(3, 8) def word(): word = win32.gencache.EnsureDispatch('Word.Application') doc = word.Documents.Add() word.Visible = True sleep(1) rng = doc.Range(0,0) rng.InsertAfter('Hacking Word with Python\r\n\r\n') sleep(1) for i in RANGE: rng.InsertAfter('Line %d\r\n' % i) sleep(1) rng.InsertAfter("\r\nPython rules!\r\n") doc.Close(False) word.Application.Quit() if __name__ == '__main__': word()
This particular example is also based on something from Chun’s book as well. However, there are lots of other examples on the web that look almost exactly like this too. Let’s unpack this code now. To get a handle on the Microsoft Word application, we call win32.gencache.EnsureDispatch(‘Word.Application’); then we add a new document by calling the word instance’s Documents.Add(). If you want to show the user what you’re up to, you can set the visibility of Word to True.
If you want to add text to the document, then you’ll want to tell Word where you want the text to go. That’s where the Range method comes in. While you can’t see it, there is a “grid” of sorts that tells Word how to layout the text onscreen. So if we want to insert text at the very top of the document, we tell it to start at (0,0). To add a new line in Word, we need to append “\r\n” to the end of our string. If you don’t know about the annoyances of line endings on different platforms, you should spend some time with Google and learn about it so you don’t get bit by weird bugs!
The rest of the code is pretty self-explanatory and will be left to the reader to interpret. We’ll move on to opening and saving documents now:
# Based on examples from http://code.activestate.com/recipes/279003/ word.Documents.Open(doc) word.ActiveDocument.SaveAs("c:\\a.txt", FileFormat=win32com.client.constants.wdFormatTextLineBreaks)
Here we show how to open an existing Word document and save it as text. I haven’t tested this one fully, so your mileage may vary. If you want to read the text in the document, you can do the following:
docText = word.Documents.Content
And that ends the Python hacking lesson on Word documents. Since a lot of the information I was finding on Microsoft Word and Python was old and crusty and didn’t seem to work half the time, I don’t add to the mess of bad information. Hopefully this will get you started on your own journey into the wild wonders of Word manipulation.
- Python for Windows Examples
- Opening MS Word files via Python thread
- Recipe 279003: Converting Word documents to text
- Dzone: script excel from Python
- Python-Excel website, mailing list