I’m always on the lookout for Python PDF libraries, and the other day I happened to stumble across pdfrw. It looks like a replacement for pyPDF in that it can read and write PDFs and join them, and it can use ReportLab for concatenation and watermarking, among other things. The project also appears to be slightly dead, since its last update was in 2011, but then again, pyPDF’s last update was in 2010, so pdfrw is a little fresher. In this article, we’ll take pdfrw for a little test drive and see whether it’s useful. Come and join the fun!

A note on installation: sadly, there is no setup.py script, so you’ll have to check the source out of Google Code and copy the pdfrw folder into site-packages or your virtualenv.
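If you aren’t sure where site-packages lives for the Python you’re running (or for an activated virtualenv), the standard library’s sysconfig module can tell you. Here’s a minimal sketch; site_packages_dir is just an illustrative helper name:

```python
import sysconfig


def site_packages_dir() -> str:
    """Return the pure-Python library directory of the running interpreter.

    Inside an activated virtualenv this points at the virtualenv's own
    site-packages, so a folder copied there stays isolated to that env.
    """
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print(site_packages_dir())
```

Run it with the same interpreter you plan to use pdfrw from, then copy the pdfrw folder into the printed directory.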

Joining PDFs Together with pdfrw

Joining two PDF files together into one is actually very simple with pdfrw. See below:

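from pdfrw import PdfReader, PdfWriter
pages = PdfReader(r'C:\Users\mdriscoll\Desktop\1.pdf', decompress=False).pages
other_pages = PdfReader(r'C:\Users\mdriscoll\Desktop\2.pdf', decompress=False).pages
writer = PdfWriter()
writer.addpages(pages)        # pages from the first file...
writer.addpages(other_pages)  # ...followed by pages from the second
writer.write(r'C:\Users\mdriscoll\Desktop\out.pdf')  # output path is illustrative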

What I find interesting is that you can also add metadata to the file by doing something like this before you write it out:

from pdfrw import IndirectPdfDict
writer.trailer.Info = IndirectPdfDict(
    Title = 'My Awesome PDF',
    Author = 'Mike',
    Subject = 'Python Rules!',
    Creator = 'myscript.py',
)

There’s also an included example that shows how to use pdfrw together with ReportLab; this one copies a range of pages from an existing PDF into a new file. I’ll just reproduce it here:

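# http://code.google.com/p/pdfrw/source/browse/trunk/examples/rl1/subset.py
import sys
import os
from reportlab.pdfgen.canvas import Canvas
# The original also imports find_pdfrw, a path helper that lives in pdfrw's
# examples directory; it isn't needed once pdfrw is importable, so it's omitted.
from pdfrw import PdfReader
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

def go(inpfn, firstpage, lastpage):
    firstpage, lastpage = int(firstpage), int(lastpage)
    outfn = 'subset_%s_to_%s.%s' % (firstpage, lastpage, os.path.basename(inpfn))
    pages = PdfReader(inpfn, decompress=False).pages
    # Turn the 1-based, inclusive page range into form XObjects
    pages = [pagexobj(x) for x in pages[firstpage-1:lastpage]]
    canvas = Canvas(outfn)
    for page in pages:
        # Match the output page size to the source page, draw it, then start a new page
        canvas.setPageSize((float(page.BBox[2]), float(page.BBox[3])))
        canvas.doForm(makerl(canvas, page))
        canvas.showPage()
    canvas.save()  # nothing is written to disk until save() is called

if __name__ == '__main__':
    inpfn, firstpage, lastpage = sys.argv[1:]
    go(inpfn, firstpage, lastpage)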
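The one slightly fiddly line in go() is pages[firstpage-1:lastpage], which turns human-style page numbers (1-based, both ends included) into a Python slice. Pulled out on its own with a bit of validation, it might look like this; page_slice is a hypothetical name, not part of pdfrw:

```python
def page_slice(firstpage, lastpage):
    """Hypothetical helper: map 1-based, inclusive page numbers to a slice.

    Equivalent to the inline pages[firstpage-1:lastpage] in subset.py,
    but rejects ranges that start before page 1 or run backwards.
    """
    firstpage, lastpage = int(firstpage), int(lastpage)
    if firstpage < 1 or lastpage < firstpage:
        raise ValueError(
            "need 1 <= firstpage <= lastpage, got %d..%d" % (firstpage, lastpage)
        )
    # Subtract one from the start (0-based index); the end stays as-is
    # because a slice's stop is exclusive.
    return slice(firstpage - 1, lastpage)
```

So page_slice(3, 5) selects the third through fifth pages. Like any slice, it quietly stops at the end of the list if lastpage is past the last page, so you may want to compare against len(pages) as well.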

I just thought that was really cool. Either way, it gives you a couple of alternatives to pyPDF’s writer. There are lots of other interesting examples included with the package, including:

  1. How to use page one of a PDF as the background for all the other pages, together with ReportLab’s Platypus
  2. How to add a watermark

I think the project has potential. Hopefully we can generate enough interest to kickstart this project again or maybe get something new off the ground.
