While researching PDF libraries for Python, I stumbled across another little project called metaPDF. According to its website, metaPDF is a lightweight Python library optimized for metadata extraction and insertion, built as a fast wrapper over the excellent pyPdf library. The site says it works by quickly searching the last 2048 bytes of the PDF before parsing the xref table, which it claims gives a 50-60% performance increase over parsing the table line by line. I’m not really sure how useful that will be, but let’s try it out and see what metaPDF can do.
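To get a feel for what "searching the last 2048 bytes" might mean, here is a minimal sketch using only the standard library. It is not metaPDF's actual code, and the helper names `read_tail` and `find_startxref` are my own inventions for illustration. It relies on one fact about the PDF format: a PDF normally ends with a `startxref` keyword followed by the byte offset of the xref table, so reading just the tail of the file is usually enough to find where that table lives.

```python
import os
import re

TAIL_SIZE = 2048  # trailing bytes to inspect, matching the figure quoted above


def read_tail(path, size=TAIL_SIZE):
    """Return up to the last `size` bytes of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_len = f.tell()
        # Clamp to 0 so files shorter than `size` are read in full.
        f.seek(max(0, file_len - size))
        return f.read()


def find_startxref(tail):
    """Return the offset that follows the last 'startxref' keyword, or None."""
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    # An incrementally updated PDF can hold several; the last one is current.
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    # "example.pdf" is a placeholder path; point it at any PDF you have handy.
    offset = find_startxref(read_tail("example.pdf"))
    print("xref table offset:", offset)
```

The point of the sketch is only that the tail read is a single seek plus a small read, no matter how large the PDF is. Whether metaPDF does exactly this internally is an assumption on my part.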