An Intro to Protocol Buffers with Python

Protocol buffers are a data serialization format that is language agnostic. They are analogous to Python’s own pickle format, but one of the advantages of protocol buffers is that they can be used by multiple programming languages.

For example, protocol buffers are supported in C++, C#, Dart, Go, Java, Kotlin, Objective-C, PHP, Ruby, and more, in addition to Python. The biggest drawback of protocol buffers is that, far too often, new versions introduce changes that are not backward compatible.

In this article, you will learn about the following:

  • Creating a Protocol Format
  • Compiling Your Protocol Buffers File
  • Writing Messages
  • Reading Messages

Let’s get started!

Creating a Protocol Format

To build an application with protocol buffers, you first need a .proto file that describes your data. For this project, you will create a way to store music albums using the Protocol Buffer format.

Create a new file named music.proto and enter the following into the file:

syntax = "proto2";

package music;

message Music {
  optional string artist_name = 1;
  optional string album = 2;
  optional int32 year = 3;
  optional string genre = 4;

}

message Library {
  repeated Music albums = 1;
}

The first line in this code declares the syntax version, “proto2”. Next is your package declaration, which is used to prevent name collisions.

The rest of the code is made up of message definitions. These are groups of typed fields. There are quite a few types that you may use, including bool, int32, float, double, and string. Each field also gets a unique number (= 1, = 2, and so on) that identifies it in the serialized data.

You can set the fields to optional, repeated, or required. According to the documentation, you rarely want to use required because a required field can never be safely changed to optional or removed later. In fact, in proto3, required is no longer supported.
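
To make the field numbers less abstract, here is a minimal sketch of how a Music message could be laid out in the protobuf binary wire format, using only the standard library. The helper names (encode_varint, encode_string_field, and so on) are made up for illustration and are not part of the protobuf package; in real code the generated classes do this work for you. The sketch assumes non-negative integers and only covers the string and int32 fields from music.proto.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Strings use wire type 2: tag, byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    """int32 values use wire type 0: tag, then the value as a varint."""
    return encode_tag(field_number, 0) + encode_varint(value)


def encode_music(artist_name: str, album: str, year: int) -> bytes:
    """Hand-encode the artist_name (1), album (2), and year (3) fields."""
    return (
        encode_string_field(1, artist_name)
        + encode_string_field(2, album)
        + encode_int_field(3, year)
    )


print(encode_music("KB", "His Glory Alone II", 2023).hex(" "))
```

Notice that the field names never appear in the output. Only the numbers do, which is why you should not renumber fields once data has been written.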

Compiling Your Protocol Buffers File

To be able to use your Protocol Buffer in Python, you will need to compile it using a program called protoc. You can download protoc from the official Protocol Buffers project. Be sure to follow the instructions in the README file to get it installed successfully.

Now run protoc against your proto file, like this (the .\ prefix is Windows path syntax; on macOS or Linux, use ./music.proto instead):

protoc --python_out=. .\music.proto --proto_path=.

The command above will convert your proto file to Python code in your current working directory. This new file is named music_pb2.py.

Here are the contents of that file:

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: music.proto
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x0bmusic.proto\x12\x05music\"H\n\x05Music\x12\x13\n\x0b\x61rtist_name\x18\x01 \x01(\t\x12\r\n\x05\x61lbum\x18\x02 \x01(\t\x12\x0c\n\x04year\x18\x03 \x01(\x05\x12\r\n\x05genre\x18\x04 \x01(\t\"\'\n\x07Library\x12\x1c\n\x06\x61lbums\x18\x01 \x03(\x0b\x32\x0c.music.Music')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'music_pb2', _globals)
if _descriptor._USE_C_DESCRIPTORS == False:
  DESCRIPTOR._options = None
  _globals['_MUSIC']._serialized_start=22
  _globals['_MUSIC']._serialized_end=94
  _globals['_LIBRARY']._serialized_start=96
  _globals['_LIBRARY']._serialized_end=135
# @@protoc_insertion_point(module_scope)

There’s a lot of magic here that uses descriptors to generate classes for you. Don’t worry about how it works; the documentation doesn’t explain it well either.
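
If you would rather drive the protoc step from Python, for example in a build script, the command can be assembled and run with the standard library. This is only a sketch: build_protoc_command() and compile_proto() are hypothetical helper names, and the code assumes protoc is already installed and on your PATH. It uses the same --python_out and --proto_path flags as the command shown earlier.

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_command(proto_file, out_dir="."):
    """Return the protoc argument list for compiling one .proto file."""
    proto_path = Path(proto_file)
    return [
        "protoc",
        f"--proto_path={proto_path.parent}",  # where protoc looks for the file
        f"--python_out={out_dir}",            # where the *_pb2.py file goes
        str(proto_path),
    ]


def compile_proto(proto_file, out_dir="."):
    """Run protoc and return the expected path of the generated module."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc was not found on your PATH")
    # check=True raises CalledProcessError if protoc reports an error
    subprocess.run(build_protoc_command(proto_file, out_dir), check=True)
    return Path(out_dir) / f"{Path(proto_file).stem}_pb2.py"


print(build_protoc_command("music.proto"))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting differences between Windows and other platforms.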

Now that your Protocol Buffer is transformed into Python code, you can start serializing data!

Writing Messages

To start serializing your data with Protocol Buffers, you must create a new Python file.

Name your file music_writer.py and enter the following code:

from pathlib import Path
import music_pb2


def overwrite(path):
    write_or_append = "a"
    while True:
        answer = input(f"Do you want to overwrite '{path}' (Y/N) ?").lower()
        # Compare against a tuple; `in "yn"` would also accept "" and "yn"
        if answer not in ("y", "n"):
            print("Y or N are the only valid answers")
            continue

        write_or_append = "w" if answer == "y" else "a"
        break
    return write_or_append


def music_data(path):
    p = Path(path)
    write_or_append = "w"
    if p.exists():
        write_or_append = overwrite(path)

    library = music_pb2.Library()

    while True:
        print("Let's add some music!\n")
        # Add a fresh entry on each pass; reusing one entry would overwrite it
        new_music = library.albums.add()
        new_music.artist_name = input("What is the artist name? ")
        new_music.album = input("What is the name of the album? ")
        new_music.year = int(input("Which year did the album come out? "))

        more = input("Do you want to add more music? (Y/N)").lower()
        if more == "n":
            break

    with open(p, f"{write_or_append}b") as f:
        f.write(library.SerializeToString())

    print(f"Music library written to {p.resolve()}")


if __name__ == "__main__":
    music_data("music.pro")

The meat of this program is in your music_data() function. Here you check if the user wants to overwrite their music file or append to it. Then, you create a Library object and prompt the user for the data they want to serialize.

For this example, you ask the user to enter an artist’s name, album, and year. You omit the genre for now, but you can add that yourself if you’d like to.

After they enter the year of the album, you ask the user if they would like to add another album. The application will write the data to disk and end if they don’t want to continue adding music.

Here is an example writing session:

Let's add some music!

What is the artist name? Zahna
What is the name of the album? Stronger Than Death
Which year did the album come out? 2023
Do you want to add more music? (Y/N)Y
Let's add some music!

What is the artist name? KB
What is the name of the album? His Glory Alone II
Which year did the album come out? 2023
Do you want to add more music? (Y/N)N
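
The prompts in music_writer.py are fairly trusting: a non-numeric year will raise a ValueError, and any answer other than "n" keeps the loop going. One possible way to tighten that up is sketched below. The helpers ask_yes_no() and ask_year() are hypothetical names, not part of the original script, and they take an input_fn parameter only so that they can be exercised without a keyboard.

```python
def ask_yes_no(prompt, input_fn=input):
    """Keep asking until the user answers Y or N; return True for Y."""
    while True:
        answer = input_fn(prompt).strip().lower()
        if answer in ("y", "n"):
            return answer == "y"
        print("Y or N are the only valid answers")


def ask_year(prompt, input_fn=input):
    """Keep asking until the user enters something int() can parse."""
    while True:
        raw = input_fn(prompt).strip()
        try:
            return int(raw)
        except ValueError:
            print(f"'{raw}' is not a valid year, please enter a number")


# Simulate a user who mistypes before giving valid answers
fake_answers = iter(["maybe", "y"])
print(ask_yes_no("Do you want to add more music? (Y/N) ", lambda _: next(fake_answers)))
```

In the writer, you could then call ask_year() when setting new_music.year and use ask_yes_no() to decide whether to break out of the loop.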

Now, let’s learn how to read your data!

Reading Messages

Now that you have your Protocol Buffer Messages written to disk, you need a way to read them.

Create a new file named music_reader.py and enter the following code:

from pathlib import Path
import music_pb2


def list_music(music):
    for album in music.albums:
        print(f"Artist: {album.artist_name}")
        print(f"Album: {album.album}")
        print(f"Year: {album.year}")
        print()


def main(path):
    p = Path(path)
    library = music_pb2.Library()

    with open(p.resolve(), "rb") as f:
        library.ParseFromString(f.read())

    list_music(library)


if __name__ == "__main__":
    main("music.pro")

This code is a little simpler than the writing code was. Once again, you create an instance of the Library class. This time, you read the file’s bytes, load them into the library with ParseFromString(), and then loop over the library albums and print out all the data in the file.

If you added the music from this tutorial’s example session to your Protocol Buffers file, the output of your reader would look like this:

Artist: Zahna
Album: Stronger Than Death
Year: 2023

Artist: KB
Album: His Glory Alone II
Year: 2023

Give it a try with your custom music file!
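
If you chose to append rather than overwrite in music_writer.py, the file ends up holding several serialized Library messages back to back. That still parses, because concatenated protobuf messages are merged when they are read, and repeated fields such as albums are joined together. The sketch below illustrates this with a tiny hand-written decoder that uses only the standard library. The function names and the sample bytes are made up for illustration, and the decoder only understands the two wire types that music.proto needs; in practice, ParseFromString() does all of this for you.

```python
def read_varint(data, pos):
    """Read one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def iter_fields(data):
    """Yield (field_number, value) pairs from serialized message bytes."""
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint, used here for int32
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: strings and nested messages
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        yield field_number, value


# Field numbers taken from the Music message in music.proto
MUSIC_FIELDS = {1: "artist_name", 2: "album", 3: "year", 4: "genre"}


def decode_library(data):
    """Decode Library bytes into a list of plain dictionaries."""
    albums = []
    for field_number, value in iter_fields(data):
        if field_number != 1:  # Library.albums is field 1
            continue
        album = {}
        for number, inner in iter_fields(value):
            name = MUSIC_FIELDS.get(number, f"unknown_{number}")
            album[name] = inner if isinstance(inner, int) else inner.decode("utf-8")
        albums.append(album)
    return albums


# Hand-built sample: a Library holding one album (artist "KB", album "X", year 2023)
ONE_ALBUM = bytes.fromhex("0a0a 0a024b42 120158 18e70f")

# Two libraries written back to back decode as one library with two albums
print(decode_library(ONE_ALBUM + ONE_ALBUM))
```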

Wrapping Up

Now you know the basics of working with Protocol Buffers using Python. Specifically, you learned about the following:

  • Creating a Protocol Format
  • Compiling Your Protocol Buffers File
  • Writing Messages
  • Reading Messages

Now is your chance to consider using this type of data serialization in your application. Once you have a plan, give it a try! Protocol buffers are a great way to serialize data in a language-agnostic way.

Further Reading