Protobuf: Structured Code with Protocol Buffers

IONOS editorial team, 06/23/2020

The structuring of data plays an important role in the development of programs and websites. If project data is well structured, for example, it can be easily and precisely read by other software. On the Internet, this is especially important for text-based search engines such as Google, Bing or Yahoo, which can capture the content of a website thanks to corresponding, structured distinctions.

Structured data is generally worthwhile in software development, whether for Internet or desktop applications, wherever programs or services have to exchange data via interfaces and a high data processing speed is desired. This article explains the role the serialization format Protocol Buffers (Protobuf) can play here and how this structuring method differs from the well-known alternative JSON.

What is Protobuf (Protocol Buffers)?

Protocol Buffers, or Protobuf for short, is a data interchange format that Google originally developed for internal use and has offered to the general public as an open source project (BSD license) since 2008. The binary format enables applications to store and exchange structured data in an uncomplicated way, even if these programs are written in different programming languages. The supported languages include:

C#
C++
Go
Objective-C
Java
Python
Ruby

Protobuf is used in combination with HTTP and RPCs (Remote Procedure Calls) for local and remote client-server communication, where it describes the required interfaces. This combination of protocols is known as gRPC.

What are the benefits of Google’s Protocol Buffers?

When developing Protobuf, Google placed emphasis on two factors: simplicity and performance. At the time of development, the format, which as already mentioned was initially used internally at Google, was meant to replace XML for similar tasks. Today it also competes with other solutions such as JSON or FlatBuffers. Protocol Buffers is still the better choice for many projects, and a closer look at the characteristics and strengths of this structuring method shows why:

Clear, cross-application schemes

The basis of every successful application is a well-organized database system. A great deal of attention is paid to the organization of this system - including the data it contains - but the underlying structures are then lost at the latest when the data is forwarded to a third-party service. The unique encoding of the data in the Protocol Buffers schema ensures that your project forwards structured data as desired, without these structures being broken up.

Backward and forward compatibility

Using Protobuf spares you tedious version checks, which are usually associated with "ugly" code. In order to maintain backward compatibility with older versions or forward compatibility with new versions, Protocol Buffers uses numbered fields: each field is identified by its number rather than by its name or position, so a program can skip field numbers it does not know. This means you do not always have to adapt the entire code in order to publish new features and functions.
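How numbered fields make this possible can be sketched with a small hand-written reader. The following is only an illustration under stated assumptions, not the code that protoc generates: the class name UnknownFieldSketch, the sample bytes and the field number 5 are made up for this example. It relies on the documented wire format, in which every field starts with a key built from the field number shifted left by three bits plus a wire type (0 for varint values, 2 for length-delimited values such as strings).

```java
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration only; real projects use the classes generated by protoc.
final class UnknownFieldSketch {

    // A message from a "newer" writer: name = "Ada" (field 1), id = 7 (field 2),
    // plus a string in field 5 that the "old" reader below has never heard of.
    static final byte[] NEWER_MESSAGE = {
        0x0A, 0x03, 'A', 'd', 'a',   // key (1 << 3) | 2, length 3, "Ada"
        0x10, 0x07,                  // key (2 << 3) | 0, varint 7
        0x2A, 0x03, 'A', 'd', 'y'    // key (5 << 3) | 2, length 3, "Ady"
    };

    private final byte[] data;
    private int pos;

    UnknownFieldSketch(byte[] data) {
        this.data = data;
    }

    // Reads a base-128 varint: 7 payload bits per byte, high bit set means "more bytes follow".
    long readVarint() {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = data[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    // "Old" reader: knows only field 1 (name) and field 2 (id) and skips every other field number.
    static String readOld(byte[] message) {
        UnknownFieldSketch in = new UnknownFieldSketch(message);
        String name = "";
        long id = 0;
        while (in.pos < in.data.length) {
            long key = in.readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {
                long value = in.readVarint();
                if (fieldNumber == 2) {
                    id = value;
                }
            } else if (wireType == 2) {
                int length = (int) in.readVarint();
                if (fieldNumber == 1) {
                    name = new String(in.data, in.pos, length, StandardCharsets.UTF_8);
                }
                in.pos += length; // unknown fields are skipped the same way as known ones
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return name + "#" + id;
    }

    public static void main(String[] args) {
        System.out.println(readOld(NEWER_MESSAGE)); // prints "Ada#7"
    }
}
```

The reader never needs a version number: the key tells it how long each field is, so it can step over field 5 without understanding it.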

Flexibility and comfort

With Protobuf, you assign each field a modifier (required, optional or repeated), which simplifies the programming work considerably. This way the structuring method allows you to determine the data structure at schema level, whereupon the implementation details of the classes used for the different programming languages are generated automatically. You can also change the status at any time, for example from "required" to "optional". The transport of data structures can also be regulated using Protocol Buffers: by defining generic request and response structures, you ensure flexible and secure data transfer between multiple services in a simple manner.

Less boilerplate code

Boilerplate code (or simply boilerplate) plays a decisive role in programming, depending on the type and complexity of a project. Put simply, it is reusable code blocks that are needed in many places in software and are usually only slightly customizable. Such code is often used, for example, to prepare the use of functions from libraries. Boilerplates are common in the web languages JavaScript, PHP, HTML and CSS in particular, although this is not optimal for the performance of the web application. A suitable Protocol Buffers scheme helps to reduce the boilerplate code and thereby improve performance in the long term.

Easy language interoperability

It is now standard practice that applications are no longer written in a single language; instead, program parts or modules combine different languages. Protobuf simplifies interaction between the individual code components considerably. If new components are added whose language differs from the current project language, you can simply translate the Protocol Buffers schema into the respective target language using the appropriate code generator, which reduces your own effort to a minimum. The prerequisite is, of course, that the languages used are supported by Protobuf either by default, such as the languages already listed, or via a third-party add-on.

Protobuf vs. JSON: The two formats in comparison

First and foremost, Google developed Protocol Buffers as an alternative to XML (Extensible Markup Language) and surpassed the markup language in many respects. Structuring data with Protobuf therefore not only tends to be simpler but, according to the search engine giant, also produces data that is three to ten times smaller and can be processed 20 to 100 times faster than a comparable XML structure.

Protocol Buffers is also often compared directly with JSON (JavaScript Object Notation), although it should be mentioned that both technologies were designed with different objectives: JSON is a message format that originated from JavaScript, exchanges its messages in text format and is supported by practically all common programming languages. Protobuf comprises more than a message format, as the Google technology also offers various rules and tools for defining and exchanging messages. Protobuf also generally outperforms JSON when it comes to sending messages, but the following tabular “Protobuf vs. JSON” comparison shows that both structuring techniques have their advantages and disadvantages:

	Protobuf	JSON
Developer	Google	Douglas Crockford
Function	Serialization format for structured data (storage and transmission) and library	Text-based format for structured data (storage and transmission)
Binary format	Yes	No
Standardization	No	Yes
Human-readable format	Partially	Yes
Community/Documentation	Small community, expandable online manuals	Huge community, good official documentation as well as various online tutorials etc.

So, if you need a well-documented serialization format that stores and transmits the structured data in human-readable form, you should use JSON instead of Protocol Buffers. This is especially true if the server-side part of the application is written in JavaScript and if a large part of the data is processed directly by browsers by default. On the other hand, if flexibility and performance of the data structure play a decisive role, Protocol Buffers tends to be the more efficient and better solution.
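The size difference between the binary and the text format can be made tangible with a small sketch. It encodes one address book entry by hand according to the documented Protobuf wire format and compares the result with the same data as a compact JSON string. The class name SizeComparisonSketch and the sample values are assumptions made for this illustration, and real projects would use the generated classes rather than this hand-written encoder. A single tiny message also says nothing about the general factors quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration only; real projects use the classes generated by protoc.
final class SizeComparisonSketch {

    // Writes a non-negative value as a base-128 varint (7 payload bits per byte).
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Length-delimited field (wire type 2): key, byte length, UTF-8 bytes.
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Varint field (wire type 0): key, value.
    static void writeIntField(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
    }

    // Uses the field numbers of the Person message: name = 1, id = 2, email = 3.
    static byte[] encodePerson(String name, int id, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, name);
        writeIntField(out, 2, id);
        writeStringField(out, 3, email);
        return out.toByteArray();
    }

    // Compact JSON without whitespace; no escaping, which is sufficient for the sample values.
    static String toJson(String name, int id, String email) {
        return "{\"name\":\"" + name + "\",\"id\":" + id + ",\"email\":\"" + email + "\"}";
    }

    public static void main(String[] args) {
        byte[] binary = encodePerson("John Doe", 1234, "jdoe@example.com");
        byte[] json = toJson("John Doe", 1234, "jdoe@example.com").getBytes(StandardCharsets.UTF_8);
        System.out.println("Binary: " + binary.length + " bytes, JSON: " + json.length + " bytes");
    }
}
```

The saving comes mainly from the field names: JSON repeats "name", "id" and "email" in every message, whereas the binary format only transmits the field numbers defined in the schema.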

Tutorial: Practical introduction to Protobuf using the example of Java

Protocol Buffers can make the difference in many software projects, but as is often the case, the first thing to do is get to know the particularities and syntactic tricks of the serialization technology and how to apply them. To give you an initial impression of Protobuf's syntax and message exchange, the following tutorial explains the basic steps with Protobuf - from defining your own format in a .proto file, to compiling the Protocol Buffers structures. A simple Java address book application example will be used as a code base that can read contact information from a file and write to a file. The parameters "Name", "ID", "email address" and "Telephone number" are assigned to each address book entry.

Define your own data format in the .proto file

You first describe every data structure that you want to implement with Protocol Buffers in a .proto file, the schema definition file of the serialization format. For each structure that you want to serialize, that is, convert into a sequential byte stream, simply add a message to this file. Then you specify a name and a type for each field of this message and add the desired modifier. One modifier is required per field.

One possible mapping of the data structures in the .proto file looks as follows for the Java address book:

syntax = "proto2";
package tutorial;
option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";
message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }
    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }
    repeated PhoneNumber phones = 4;
}
message AddressBook {
    repeated Person people = 1;
}

The syntax of Protocol Buffers is strongly reminiscent of C++ or Java. The Protobuf version is always declared first (here proto2, because the "required" modifier and explicit default values are only available in this version), followed by the description of the software package whose data you want to structure. This includes a unique name ("tutorial") and, in this code example, the two Java-specific options "java_package" (the Java package in which the generated classes are saved) and "java_outer_classname" (the name of the outer class that wraps the generated classes).

This is followed by the Protobuf messages, which can be composed of any number of fields, with typical data types such as "bool", "int32", "float", "double" or "string" available. Some of these are also used in the example. As already mentioned, each field of a message must be assigned a modifier, namely one of the following:

required: a value for the field is mandatory. If this value is missing, the message is considered "uninitialized" and is not sent.
optional: a value can be provided in an optional field but does not have to be. If none is provided, the default value is used. In the code above, for example, the default value "HOME" (landline number at home) is defined for the telephone number type.
repeated: fields with the “repeated” modifier can be repeated any number of times (including zero times).
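What the three modifiers mean for calling code can be sketched in plain Java. The following class is a hand-written stand-in and not the output of protoc: the names ModifierSketch, number(), addPhone() and build() are assumptions chosen for this illustration, the "id" and "email" fields are left out for brevity, and a missing required field is reported here with an IllegalStateException as a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hand-written stand-in, NOT protoc output: it only mimics the three field rules.
final class ModifierSketch {

    enum PhoneType { MOBILE, HOME, WORK }

    static final class PhoneNumber {
        private String number;   // required
        private PhoneType type;  // optional, falls back to the default HOME

        PhoneNumber number(String value) { number = value; return this; }
        PhoneNumber type(PhoneType value) { type = value; return this; }

        boolean hasType() { return type != null; }
        PhoneType getType() { return hasType() ? type : PhoneType.HOME; }
        boolean isInitialized() { return number != null; }
    }

    static final class Person {
        private String name;                                        // required
        private final List<PhoneNumber> phones = new ArrayList<>(); // repeated, may stay empty

        Person name(String value) { name = value; return this; }
        Person addPhone(PhoneNumber value) { phones.add(value); return this; }
        List<PhoneNumber> getPhones() { return Collections.unmodifiableList(phones); }

        // Refuses to hand out a message whose required fields are not set.
        Person build() {
            if (name == null) {
                throw new IllegalStateException("required field 'name' is missing");
            }
            for (PhoneNumber phone : phones) {
                if (!phone.isInitialized()) {
                    throw new IllegalStateException("required field 'number' is missing");
                }
            }
            return this;
        }
    }

    public static void main(String[] args) {
        Person person = new Person()
                .name("Ada")
                .addPhone(new PhoneNumber().number("555-4321")) // type deliberately not set
                .build();
        PhoneNumber phone = person.getPhones().get(0);
        System.out.println(phone.hasType() + " " + phone.getType()); // prints "false HOME"
    }
}
```

The optional field distinguishes between "not set" and "set to the default", while the repeated field behaves like a list that may contain any number of entries.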

You can find detailed instructions on how to define your own data format with Protocol Buffers in Google's developer documentation.

Compile your own Protocol Buffers schema

Once your data structures are defined as desired in the .proto file, generate the classes needed to read and write the Protobuf messages. To do this, run the Protocol Buffers compiler (protoc) on the .proto file. If you have not yet installed it, simply download the current version from the official GitHub repository. Unzip the ZIP file at the desired location; the compiler is located in the "bin" folder and is started from the command line.

Note

Make sure you have the appropriate edition of the Protobuf compiler: protoc is available for 32-bit and 64-bit architectures (Windows, Linux or macOS).

Finally, you specify:

the source directory which contains the code of your program (here placeholder "SRC_DIR"),
the destination directory in which the generated code is to be stored (here placeholder "DST_DIR")
and the path to the .proto file.

As you want to generate Java classes, you also use the --java_out option (similar options are also available for the other supported languages). The complete compile command is as follows:

protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

Tip

A more detailed Protobuf Java tutorial, which explains, among other things, the transmission of messages via Protocol Buffers (read/write), is offered by Google in the “Developers” section, the in-house project area of the search engine giant for developers. Alternatively, you also have access there to instructions for the other supported languages such as C++, Go or Python.
