The PyPDF2 package is a pure-Python PDF library that you can use for splitting, merging, cropping, and transforming the pages of your PDFs. According to the PyPDF2 website, you can also use it to add data, viewing options, and passwords to PDFs. Finally, you can use PyPDF2 to extract text and metadata from your PDFs.
PyPDF2 is a fork of the original pyPdf, which was written by Mathieu Fenniak and released in 2005. However, the original pyPdf’s last release was in 2014. A company called Phaseit, Inc. spoke with Mathieu and ended up sponsoring PyPDF2 as a fork of pyPdf.
At the time of writing, the PyPDF2 package hasn’t had a release since 2016. However, it is still a solid and useful package that is worth your time to learn.
Here is what we will cover in this article:
- Extracting metadata
- Splitting documents
- Merging 2 PDF files into 1
- Rotating pages
- Overlaying / watermarking pages
- Encrypting / decrypting
Let’s start by learning how to install PyPDF2!
Installation
PyPDF2 is a pure Python package, so you can install it using pip (assuming pip is in your system’s path):
python -m pip install pypdf2
As usual, you should install third-party Python packages into a Python virtual environment so that they stay isolated from your system-wide packages.
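If you want to confirm that the install landed in the environment you expected, a quick check along these lines may help. This is only a minimal sketch: `installed_version` is a hypothetical helper name made up for this example, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
import importlib.util
from importlib import metadata


def installed_version(module_name, dist_name):
    """Return the installed version string, or None if the module is missing.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    # find_spec returns None when a top-level module cannot be imported
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found for it
        return "unknown"


if __name__ == "__main__":
    version = installed_version("PyPDF2", "PyPDF2")
    if version is None:
        print("PyPDF2 is not installed in this environment")
    else:
        print(f"PyPDF2 version: {version}")
```

Run the script with the same interpreter you used for `python -m pip install pypdf2`; if it reports that PyPDF2 is missing, the package most likely went into a different environment.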