I’m always on the lookout for Python PDF libraries, and I happened to stumble across pdfrw the other day. It looks like a replacement for pyPDF in that it can read and write PDFs, join PDFs, and use ReportLab for concatenation and watermarking, among other things. The project also appears slightly dead in that its last update was in 2011, but then again, pyPDF’s last update was in 2010, so pdfrw is a little fresher. In this article, we’ll take pdfrw for a little test drive and see whether it’s useful. Come and join the fun!
A note on installation: Sadly there is no setup.py script, so you’ll have to check it out of Google Code and just copy the pdfrw folder to site-packages or your virtualenv.
Joining PDFs Together with pdfrw
Joining two PDF files together into one is actually very simple with pdfrw. See below:
    from pdfrw import PdfReader, PdfWriter

    pages = PdfReader(r'C:\Users\mdriscoll\Desktop\1.pdf', decompress=False).pages
    other_pages = PdfReader(r'C:\Users\mdriscoll\Desktop\2.pdf', decompress=False).pages

    writer = PdfWriter()
    writer.addpages(pages)
    writer.addpages(other_pages)
    writer.write(r'C:\Users\mdriscoll\Desktop\out.pdf')
What I find interesting is that you can also add metadata to the file by doing something like this before you write it out:
    from pdfrw import IndirectPdfDict

    # "writer" is the PdfWriter instance from the previous snippet
    writer.trailer.Info = IndirectPdfDict(
        Title='My Awesome PDF',
        Author='Mike',
        Subject='Python Rules!',
        Creator='myscript.py',
    )
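To tie the two snippets together, here is a minimal sketch that wraps the join and the metadata step in one function. The function name `join_pdfs` and its `metadata` parameter are my own inventions for illustration; only `PdfReader`, `PdfWriter`, `addpages`, `write` and `IndirectPdfDict` come from pdfrw, used the same way as above.

```python
def join_pdfs(input_paths, output_path, metadata=None):
    """Concatenate the PDFs in input_paths, in order, into output_path.

    metadata is an optional dict such as {'Title': 'My Awesome PDF'};
    its keys become entries in the output file's Info dictionary.
    (join_pdfs is a hypothetical helper, not part of pdfrw.)
    """
    input_paths = list(input_paths)
    if not input_paths:
        raise ValueError("need at least one input PDF")

    # Imported lazily so the argument check above runs even when
    # pdfrw is not installed.
    from pdfrw import IndirectPdfDict, PdfReader, PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        # Same calls as the snippet above, just in a loop
        writer.addpages(PdfReader(path, decompress=False).pages)

    if metadata:
        # Must be set before the file is written out
        writer.trailer.Info = IndirectPdfDict(**metadata)

    writer.write(output_path)
```

Calling it would look something like `join_pdfs(['1.pdf', '2.pdf'], 'out.pdf', {'Title': 'My Awesome PDF', 'Author': 'Mike'})`, where the file names are placeholders.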
There’s also an included example that shows how to combine PDFs using pdfrw and reportlab. I’ll just reproduce it here:
    # http://code.google.com/p/pdfrw/source/browse/trunk/examples/rl1/subset.py
    import sys
    import os

    from reportlab.pdfgen.canvas import Canvas

    import find_pdfrw  # helper module that ships alongside the pdfrw examples
    from pdfrw import PdfReader
    from pdfrw.buildxobj import pagexobj
    from pdfrw.toreportlab import makerl


    def go(inpfn, firstpage, lastpage):
        firstpage, lastpage = int(firstpage), int(lastpage)
        outfn = 'subset_%s_to_%s.%s' % (firstpage, lastpage, os.path.basename(inpfn))

        # Read the requested pages and wrap each one as a form XObject
        pages = PdfReader(inpfn, decompress=False).pages
        pages = [pagexobj(x) for x in pages[firstpage-1:lastpage]]

        # Draw each page onto a ReportLab canvas sized to match it
        canvas = Canvas(outfn)
        for page in pages:
            canvas.setPageSize(tuple(page.BBox[2:]))
            canvas.doForm(makerl(canvas, page))
            canvas.showPage()
        canvas.save()


    if __name__ == '__main__':
        inpfn, firstpage, lastpage = sys.argv[1:]
        go(inpfn, firstpage, lastpage)
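One detail in that script that is easy to miss is the `pages[firstpage-1:lastpage]` slice: the command-line arguments are 1-based and inclusive, while Python slices are 0-based and exclude the end index. Here is a small sketch of that conversion using a plain list in place of real PDF pages. The helper name `page_slice` is hypothetical and is not part of pdfrw or the example script.

```python
def page_slice(firstpage, lastpage):
    """Turn a 1-based, inclusive page range into a Python slice.

    (Hypothetical helper for illustration only.)
    """
    if firstpage < 1 or lastpage < firstpage:
        raise ValueError("expected 1 <= firstpage <= lastpage")
    # Shift the start down by one; the end stays as-is because
    # slice ends are already exclusive.
    return slice(firstpage - 1, lastpage)


# Stand-in for PdfReader(...).pages: a ten-page "document"
pages = ["page %d" % n for n in range(1, 11)]

selected = pages[page_slice(3, 5)]
print(selected)  # ['page 3', 'page 4', 'page 5']
```

So asking for pages 3 to 5 gives you exactly three pages, which is what you would expect from the command line.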
I just thought the ReportLab example was really cool. It gives you a couple of alternatives to pyPDF’s writer anyway. There are lots of other interesting examples included with the package, including:
- How to use a PDF (page one) as the background for all other pages together with Platypus.
- How to add a watermark
I think the project has potential. Hopefully we can generate enough interest to kickstart this project again or maybe get something new off the ground.