Entries tagged with “PyPDF”.


At my job, we sometimes need to write a PDF to memory instead of disk because we need to merge an overlay onto it. By writing to memory, we can speed up the process since we skip the extra step of writing the file to disk and then reading it back into memory again. Sadly, pyPdf’s PdfFileWriter() class doesn’t offer any support for extracting the binary string, so we have to use StringIO instead. Here’s an example where I merge two PDFs into memory:

import pyPdf
from StringIO import StringIO
 
#----------------------------------------------------------------------
def mergePDFs(pdfOne, pdfTwo):
    """
    Merge PDFs
    """
    tmp = StringIO()
 
    output = pyPdf.PdfFileWriter()
 
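    # open() replaces the older file() builtin; the input files are left
    # open because pyPdf may still read from them when output.write() runs
    readerOne = pyPdf.PdfFileReader(open(pdfOne, "rb"))
    for page in range(readerOne.getNumPages()):
        output.addPage(readerOne.getPage(page))
    readerTwo = pyPdf.PdfFileReader(open(pdfTwo, "rb"))
    for page in range(readerTwo.getNumPages()):
        output.addPage(readerTwo.getPage(page))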
 
    output.write(tmp)
    return tmp.getvalue()
 
 
if __name__ == "__main__":
    pdfOne = '/path/to/pdf/one'
    pdfTwo = '/path/to/pdf/two'
    pdfObj = mergePDFs(pdfOne, pdfTwo)

As you can see, all you need to do is create a StringIO() object, add some pages to the PdfFileWriter() object and then write the data to your StringIO object. Then, to extract the binary string, you call StringIO’s getvalue() method. Simple, right? Now you have the PDF as a binary string in memory. Wrap it in another StringIO() whenever you need a file-like object, for example to add more pages or to overlay OMR marks on it.
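If you are on Python 3, where the StringIO module no longer exists, io.BytesIO plays the same role for binary data. Here is a minimal sketch of the round trip from writer to binary string and back to a file-like object. Note that FakePdfWriter, write_to_memory and as_file_like are names I made up for illustration. FakePdfWriter is not part of pyPdf; it only mimics the one thing the buffer needs from a writer, a write(stream) method.

```python
import io


class FakePdfWriter(object):
    """Hypothetical stand-in for a PDF writer: anything with write(stream)."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        # A real writer would serialize its pages here
        stream.write(self.payload)


def write_to_memory(writer):
    """Return the writer's output as a binary string, never touching disk."""
    buf = io.BytesIO()
    writer.write(buf)
    return buf.getvalue()


def as_file_like(data):
    """Wrap a binary string in a fresh buffer positioned at the start."""
    return io.BytesIO(data)


pdf_bytes = write_to_memory(FakePdfWriter(b"%PDF-1.4 placeholder bytes"))
stream = as_file_like(pdf_bytes)
header = stream.read(5)  # reads like any file opened in "rb" mode
```

The binary string is handy for storing or sending the data, while the wrapped buffer is what you would hand to anything that expects an open file.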

Today I learned that the pyPDF project is NOT dead, as I had originally thought. In fact, it’s been forked into PyPDF2 (note the slightly different spelling). There’s also a possibility that someone else has taken over the original pyPDF project and is actively working on it. You can follow all that over on reddit if you like. In the meantime, I decided to give PyPDF2 a whirl and see how it is different from the original. Feel free to follow along if you have a free moment or two. (more…)

A lot of websites are doing year-end retrospectives this week, so I thought you might find it interesting to know which articles on this blog were the most popular this year. Below you will find links to each article along with the page view count I got from Google Analytics:

  1. A Simple Step-by-Step Reportlab Tutorial, 9,709 page views, posted 03/08/2010
  2. Another Step-by-Step SqlAlchemy Tutorial Part 1, 7,746 page views, posted 02/03/2010
  3. Another Step-by-Step SqlAlchemy Tutorial Part 2, 4,858 page views, posted 02/03/2010
  4. Manipulating PDFs with Python and pyPdf, 4,511 page views, posted 05/15/2010
  5. Python 101: Introspection, 4,473 page views, posted 10/14/2010
  6. wxPython: Grid Tips and Tricks, 3,476 page views, posted 04/04/2010
  7. wxPython: Creating a Simple MP3 Player, 3,401 page views, posted 04/20/2010
  8. Python and Microsoft Office – Using PyWin32, 3,323 page views, posted 07/16/2010
  9. wxPython and Threads, 3,183 page views, posted 05/22/2010

It would seem that SqlAlchemy and Reportlab are pretty popular topics. Are there any articles about either of these cool packages that you think I should write? As you can see, wxPython makes it into the top ten three times! What should I write about next regarding wxPython?

This upcoming year, I plan to write about some of the other GUI toolkits. Which one do you think I should do first? Tkinter, PySide, PyGUI or something else? What packages or standard libraries do you think I should cover? Feel free to let me know via the comments below or via my contact form (link at top). I’m looking forward to another year of Python tinkering and writing and I hope you are too! Thanks for your readership and encouragement this year!