A Quick Intro to pdfrw

I’m always on the lookout for Python PDF libraries, and I stumbled across pdfrw the other day. It looks like a replacement for pyPDF in that it can read and write PDFs, join PDFs and use Reportlab for concatenation and watermarking, among other things. The project also appears slightly dead in that its last update was in 2011, but then again, pyPDF’s last update was in 2010, so pdfrw is a little fresher. In this article, we’ll take pdfrw for a little test drive and see whether it’s useful. Come and join the fun!

A note on installation: Sadly there is no setup.py script, so you’ll have to check it out of Google Code and just copy the pdfrw folder to site-packages or your virtualenv.

Joining PDFs Together with pdfrw

Joining two PDF files together into one is actually very simple with pdfrw. See below:

from pdfrw import PdfReader, PdfWriter

pages = PdfReader(r'C:\Users\mdriscoll\Desktop\1.pdf', decompress=False).pages
other_pages = PdfReader(r'C:\Users\mdriscoll\Desktop\2.pdf', decompress=False).pages

writer = PdfWriter()
writer.addpages(pages)
writer.addpages(other_pages)
writer.write(r'C:\Users\mdriscoll\Desktop\out.pdf')
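
The same pattern extends to any number of input files. Here is a minimal sketch, assuming pdfrw is importable; merge_pdfs is a hypothetical helper name (it is not part of pdfrw) and it relies only on the PdfReader, addpages and write calls already shown above:

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDFs into one output file.

    merge_pdfs is a hypothetical helper, not something pdfrw provides.
    """
    input_paths = list(input_paths)
    if not input_paths:
        raise ValueError("need at least one input PDF")

    # Imported here so the argument check above works even without pdfrw.
    from pdfrw import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        # Same call as in the two-file example above.
        writer.addpages(PdfReader(path, decompress=False).pages)
    writer.write(output_path)
    return output_path
```

You would call it with something like merge_pdfs(['1.pdf', '2.pdf', '3.pdf'], 'out.pdf'). The pages end up in the output in the order the input files are listed.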

What I find interesting is that you can also add metadata to the file by doing something like this before you write it out:

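from pdfrw import IndirectPdfDict

# Attach document metadata to the writer before calling writer.write()
writer.trailer.Info = IndirectPdfDict(
    Title = 'My Awesome PDF',
    Author = 'Mike',
    Subject = 'Python Rules!',
    Creator = 'myscript.py',
)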

There’s also an included example that shows how to combine PDFs using pdfrw and reportlab. I’ll just reproduce it here:

# http://code.google.com/p/pdfrw/source/browse/trunk/examples/rl1/subset.py
import sys
import os

from reportlab.pdfgen.canvas import Canvas

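# find_pdfrw is a helper module that sits next to the examples; it is there
# so that an uninstalled pdfrw checkout can be imported
import find_pdfrw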
from pdfrw import PdfReader
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl


def go(inpfn, firstpage, lastpage):
    firstpage, lastpage = int(firstpage), int(lastpage)
    outfn = 'subset_%s_to_%s.%s' % (firstpage, lastpage, os.path.basename(inpfn))

    pages = PdfReader(inpfn, decompress=False).pages
    pages = [pagexobj(x) for x in pages[firstpage-1:lastpage]]
    canvas = Canvas(outfn)

    for page in pages:
        canvas.setPageSize(tuple(page.BBox[2:]))
        canvas.doForm(makerl(canvas, page))
        canvas.showPage()

    canvas.save()

if __name__ == '__main__':
    inpfn, firstpage, lastpage = sys.argv[1:]
    go(inpfn, firstpage, lastpage)
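
One detail worth spelling out: the slice pages[firstpage-1:lastpage] in go turns the 1-based, inclusive page numbers from the command line into Python’s 0-based, half-open slice. Here is a small sketch of that arithmetic on a plain list; the subset helper and the sample page labels are made up for illustration and are not part of the pdfrw example:

```python
def subset(pages, firstpage, lastpage):
    """Return pages firstpage..lastpage, counted from 1, both ends included."""
    firstpage, lastpage = int(firstpage), int(lastpage)
    # Page 1 lives at index 0, hence firstpage - 1; the slice end is
    # exclusive, so lastpage itself is already the right stop value.
    return pages[firstpage - 1:lastpage]


pages = ['p1', 'p2', 'p3', 'p4', 'p5']  # stand-ins for real page objects
print(subset(pages, 2, 4))      # ['p2', 'p3', 'p4']
print(subset(pages, '1', '1'))  # ['p1']
```

Because slicing never raises on an out-of-range stop value, asking for more pages than the document has simply returns everything up to the last page.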

I just thought the reportlab example was really cool. It gives you a couple of alternatives to pyPDF’s writer anyway. There are lots of other interesting examples included with the package, including:

  1. How to use a PDF (page one) as the background for all other pages together with platypus.
  2. How to add a watermark.

I think the project has potential. Hopefully we can generate enough interest to kickstart this project again or maybe get something new off the ground.

7 thoughts on “A Quick Intro to pdfrw”

  1. Pingback: Visto nel Web – 35 « Ok, panico

  2. I’m glad to hear about this lib. I tried your first example and it seems the out.pdf is written to the console and not a file. Any ideas?

    import sys
    import os

    from pdfrw import PdfReader, PdfWriter

    pages = PdfReader(r'one.pdf', decompress=False).pages
    other_pages = PdfReader(r'two.pdf', decompress=False).pages

    writer = PdfWriter()
    writer.addpages(pages)
    writer.addpages(other_pages)
    writer.write(r'three.pdf')

  3. The only difference I see in your code is that you’re not passing an absolute path to the write method. Try doing that and see if it works. I just retried my code and it still works on Windows 7 with Python 2.6.6.

  4. Thanks for the nice article!

    I have added a setup script and uploaded the package to PyPI, so it should now be available from easy_install or pip.

  6. Pingback: Creating and Manipulating PDFs with pdfrw | The Mouse Vs. The Python
