PyPdf: How to Write a PDF to Memory

At my job, we sometimes need to write a PDF to memory instead of disk because we need to merge an overlay onto it. Writing to memory speeds up the process, since we skip the extra step of writing the file to disk and then reading it back into memory. Sadly, pyPdf’s PdfFileWriter() class doesn’t offer any way to extract the binary string directly, so we have to use StringIO instead. Here’s an example where I merge two PDFs into memory:

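import pyPdf
from StringIO import StringIO


def mergePDFs(pdfOne, pdfTwo):
    """Merge two PDFs and return the result as a binary string."""
    tmp = StringIO()  # in-memory file that receives the merged PDF
    output = pyPdf.PdfFileWriter()

    # copy every page of the first PDF into the writer
    readerOne = pyPdf.PdfFileReader(open(pdfOne, "rb"))
    for page in range(readerOne.getNumPages()):
        output.addPage(readerOne.getPage(page))

    # then append every page of the second PDF
    readerTwo = pyPdf.PdfFileReader(open(pdfTwo, "rb"))
    for page in range(readerTwo.getNumPages()):
        output.addPage(readerTwo.getPage(page))

    output.write(tmp)      # write the merged PDF into memory, not to disk
    return tmp.getvalue()  # extract the binary string


if __name__ == "__main__":
    pdfOne = '/path/to/pdf/one'
    pdfTwo = '/path/to/pdf/two'
    pdfObj = mergePDFs(pdfOne, pdfTwo)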

As you can see, all you need to do is create a StringIO() object, add some pages to the PdfFileWriter() object and then write the data to your StringIO object with the writer’s write() method. Then, to extract the binary string, you call StringIO’s getvalue() method. Simple, right? Now you have the PDF in memory, and by wrapping that string in a new StringIO object you get a file-like object that you can add more pages to, overlay an OMR mark on, or whatever.
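The payoff of keeping the PDF in memory is that the string from getvalue() can be wrapped in a fresh buffer and handed to anything that expects a file, such as a PdfFileReader for the overlay step. The sketch below shows that round trip using only the standard library. FakePdfWriter, write_to_memory and reopen are hypothetical names: FakePdfWriter only mimics the write(stream) method of PdfFileWriter and writes placeholder bytes, and io.BytesIO is used in place of StringIO so the sketch runs on Python 3.

```python
import io


class FakePdfWriter:
    """Hypothetical stand-in for pyPdf.PdfFileWriter.

    Like the real class, it has a write(stream) method that dumps the
    document into a file-like object; here it only writes placeholder bytes.
    """

    def write(self, stream):
        stream.write(b"%PDF-1.4\n% placeholder body\n%%EOF\n")


def write_to_memory(writer):
    """Write a PDF writer's output into an in-memory buffer and return the bytes."""
    buf = io.BytesIO()
    writer.write(buf)      # the writer only needs an object with a .write() method
    return buf.getvalue()  # extract the binary string, as mergePDFs does


def reopen(data):
    """Wrap raw PDF bytes in a new file-like object positioned at the start."""
    return io.BytesIO(data)


if __name__ == "__main__":
    pdf_bytes = write_to_memory(FakePdfWriter())
    stream = reopen(pdf_bytes)
    print(stream.read(8))  # b'%PDF-1.4'
```

In the real workflow you would pass the reopened buffer to pyPdf.PdfFileReader instead of calling read() on it yourself.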

  • Dav1d

    Great post, only one thing I noticed: if you use Python 2.6+, there is the new “io” module, and StringIO should be replaced with BytesIO.

  • mb

    omr mark?

  • Optical Mark Recognition – You use those to tell a paper stuffer when to stuff a packet of pages into an envelope, among other things.

  • mb

    Ah. Thanks.
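Dav1d’s suggestion matters even more on Python 3, where the StringIO module no longer exists and io.StringIO only accepts text. The sketch below uses illustrative helper names (buffer_pdf_bytes and text_buffer_rejects_bytes are not part of pyPdf) to show binary PDF data going into io.BytesIO, while io.StringIO rejects the same data with a TypeError.

```python
import io

# A few bytes of the kind a PDF writer emits; real PDFs also contain
# non-ASCII binary data, which is why a text buffer is the wrong container.
PDF_CHUNK = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def buffer_pdf_bytes(chunk):
    """Store binary PDF data in io.BytesIO and return what was stored."""
    buf = io.BytesIO()
    buf.write(chunk)
    return buf.getvalue()


def text_buffer_rejects_bytes(chunk):
    """Return True if io.StringIO refuses binary data (it does on Python 3)."""
    try:
        io.StringIO().write(chunk)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(buffer_pdf_bytes(PDF_CHUNK) == PDF_CHUNK)  # True
    print(text_buffer_rejects_bytes(PDF_CHUNK))      # True
```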