Entries tagged with “Python PDF Series”.


I do a lot of PDF report creation with Python using Reportlab. Occasionally I’ll throw PyPDF in as well. So I’m always on the lookout for other mature Python PDF tools. PDFDocument isn’t exactly mature, but it’s kind of interesting. The PDFDocument project is actually a wrapper for Reportlab. You can get it on github. I found the project easy to use, but pretty limiting. Let’s take a few minutes to examine how it works. (more…)

At my job, we sometimes need to write a PDF to memory instead of disk because we need to merge an overlay on to it. By writing to memory, we can speed up the process since we won’t have the extra step of writing the file to disk and than reading it back into memory again. Sadly, pyPdf’s PdfFileWriter() class doesn’t offer any support for extracting the binary string, so we have to StringIO instead. Here’s an example where I merge two PDFs into memory:

import pyPdf
from StringIO import StringIO
 
#----------------------------------------------------------------------
def mergePDFs(pdfOne, pdfTwo):
    """
    Merge PDFs
    """
    tmp = StringIO()
 
    output = pyPdf.PdfFileWriter()
 
    pdfOne = pyPdf.PdfFileReader(file(pdfOne, "rb"))
    for page in range(pdfOne.getNumPages()):
        output.addPage(pdfOne.getPage(page))
    pdfTwo = pyPdf.PdfFileReader(file(pdfTwo, "rb"))
    for page in range(pdfTwo.getNumPages()):
        output.addPage(pdfTwo.getPage(page))
 
    output.write(tmp)
    return tmp.getvalue()
 
 
if __name__ == "__main__":
    pdfOne = '/path/to/pdf/one'
    pdfTwo = '/path/to/pdf/two'
    pdfObj = mergePDFs(pdfOne, pdfTwo)

As you can see, all you need to do is create a StringIO() object, add some pages to the PdfFileWriter() object and then write the data to your StringIO object. Then to extract the binary string, you have to call StringIO’s getvalue() method. Simple, right? Now you have a file-like object in memory that you can use to add more pages to or overlay OMR mark on or whatever.

Related Artlcies

While researching PDF libraries for Python, I stumbled across another little project called metaPDF. According to its website, metaPDF is a lightweight Python library optimized for metadata extraction and insertion, and it is a fast wrapper over the excellent pyPdf library. It works by quickly searching the last 2048 bytes of the PDF before parsing the xref table, offering a 50-60% performance increase over directly parsing the table line by line. I’m not really sure how useful that will be, but let’s try it out and see what metaPDF can do. (more…)

Today I learned that the pyPDF project is NOT dead, as I had originally thought. In fact, it’s been forked into PyPDF2 (note the slightly different spelling). There’s also a possibility that someone else has taken over the original pyPDF project and is actively working on it. You can follow all that over on reddit if you like. In the mean time, I decided to give PyPDF2 a whirl and see how it is different from the original. Feel free to follow along if you have a free moment or two. (more…)

Today we’ll be looking at a simple PDF generation library called pyfpdf, a port of FPDF which is a php library. This is not a replacement for Reportlab, but it does give you more than enough to create simple PDFs and may meet your needs. Let’s take a look and see what it can do! (more…)

I’m always on the lookout for Python PDF libraries and I happened to stumble across pdfrw the other day. It looks like a replacement to pyPDF in that it can read and write PDFs, join PDFs and can use Reportlab for concatenation and watermarking, among other things. The project also appears slightly dead in that its last update was in 2011, but then again, pyPDF’s last update was in 2010, so it’s a little fresher. In this article, we’ll take a little test drive of pdfrw and see if it’s useful or not. Come and join the fun! (more…)

Recently I needed the ability to use Reportlab’s flowables, but place them in fixed locations. Some of you are probably wondering why I would want to do that. The nice thing about flowables, like the Paragraph, is that they’re easily styled. If I could bold something or center something AND put it in a fixed location, then that would rock! It took a lot of Googling and trial and error, but I finally got a decent template put together that I could use for mailings. In this article, I’m going to show you how to do this too. (more…)

There are several cool ways to create PDFs with Python. In this article we will be focusing on a cool little tool called rst2pdf, which takes a text file that contains Restructured Text and converts it to a PDF. The rst2pdf package requires Reportlab to function. This won’t be a tutorial on Restructured Text, although we’ll have to discuss it to some degree just to understand what’s going on. (more…)

Back in March of this year, I wrote a simple tutorial on Reportlab, a handy 3rd party Python package that allows the developer to create PDFs programmatically. Recently, I received a request to cover how to do tables in Reportlab. Since my Reportlab article is so popular, I figured it was probably worth the trouble to figure out tables. In this article, I will attempt to show you the basics of inserting tables into Reportlab generated PDFs. (more…)

There’s a handy 3rd party module called pyPdf out there that you can use to merge PDFs documents together, rotate pages, split and crop pages, and decrypt/encrypt PDF documents. In this article, we’ll take a look at a few of these functions and then create a simple GUI with wxPython that will allow us to merge a couple of PDFs. (more…)