Serialization

Task Assignment

In many cases, you will need to transfer data between processes running on the same machine or send data over the network. You can use XML or JSON, but these text-based formats are verbose and comparatively slow to parse. There are several binary alternatives (e.g. Protobuf and Avro) you can use instead.
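
To get a feeling for the size difference, the following minimal sketch encodes the same measurements once as JSON text and once as raw 8-byte doubles. It is only an illustration: the class EncodingSizeDemo and its methods are hypothetical, and the raw encoding is neither Protobuf nor Avro, although both also store a double in 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the course template.
class EncodingSizeDemo {

    // Encodes the values as a JSON array of numbers, e.g. [1.5,2.25].
    static byte[] toJsonText(double[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the values as consecutive 8-byte IEEE 754 doubles without delimiters.
    static byte[] toRawBinary(double[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES * values.length);
        for (double v : values) {
            buffer.putDouble(v);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        // Made-up sample values with many significant digits.
        double[] values = {1013.2534871, 21.4870012345, 0.3333333333333333};
        System.out.println("JSON text bytes:  " + toJsonText(values).length);
        System.out.println("Raw binary bytes: " + toRawBinary(values).length);
    }
}
```

The gap depends on the data: values with many significant digits favour the binary form, while very short values such as 1.5 can be smaller as text. Real JSON messages also repeat the field names in every record, which the binary formats avoid.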

These serialization frameworks describe data structures with schemas written in their own schema languages. The schemas can then be compiled into classes in a variety of programming languages (Java, C++, Python, etc.).

Your goal is to implement an application (Java) that receives data, stores it in classes generated by the Protobuf and Avro serialization frameworks, and sends the converted data via TCP to a second application (C/C++). That application processes the data (calculates averages) and sends the results back.

You will be given a reference implementation using the JSON data format. Use its data classes as a template for how the schemas in Protobuf and Avro should look. You will probably not be able to create exact counterparts because of limitations of the serialization protocols, so you will need to find a workaround.

Steps:

  1. Create a team of 2 students
  2. Download a template from git repository:
    git clone https://gitlab.fel.cvut.cz/cuchymar/serialization-2018-template.git
  3. There is a subfolder in src/main/ for each language/schema (java, cpp, proto, avro, ...)
  4. Install the required libraries, then compile and run the C/C++ counterpart:
    sudo apt-get install libboost-all-dev libjsoncpp-dev
    cd src/main/cpp/
    mkdir build
    cd build
    cmake ..
    make
    ./app 12345 json
    or use any IDE.
  5. Import the project into a Java IDE as a Maven project (more information about Maven below). To make the Java part compilable, run:
     mvn compile 
  6. To see how it works for JSON run AppTest.java
  7. Define protobuf and avro schemas as similar as possible to the provided JSON format (package cz.esw.serialization.json.*). Write the schemas into prepared measurements.proto and measurements.avsc files. Hint: Use class names with prefix P for protobuf classes (e.g. PDataset) and A for avro classes (e.g. ADataset).
  8. Implement the applications (both Java and C/C++) into the provided template with specification described in the next section.
  9. Observe performance differences between the data formats.
  10. Commit the solution for your tutor and also submit it to the upload system. Upload only pom.xml, readme.txt and the src/ folder, without any compiled binaries or generated sources.
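
For step 9, one simple way to compare the formats is to time the same operation repeatedly and look at the mean. The sketch below is only a starting point under that assumption; the class FormatTimer and the method meanMicros are hypothetical names, and the timed task is a placeholder for your own serialize, send and receive code.

```java
// Hypothetical helper; not part of the course template.
class FormatTimer {

    // Runs the task the given number of times and returns the mean
    // wall-clock time per run in microseconds.
    static double meanMicros(Runnable task, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1000.0 / repetitions;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with one full round trip per format.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        };
        meanMicros(placeholder, 1000); // warm-up runs so the JIT compiler can kick in
        System.out.println("mean time [us]: " + meanMicros(placeholder, 1000));
    }
}
```

Measure every format with the same dataset and the same number of repetitions, and discard the warm-up runs, otherwise the first format measured tends to look slower than it is.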

Application Specification - Java

The configuration of the application is handled by Maven (pom.xml), which takes care of all required libraries and of compiling the serialization schemas (you have to run mvn compile to regenerate the source code of the data classes every time you change the schemas).

The application can be compiled and run from the command line with the following commands in the project folder:

mvn compile
mvn exec:java -Dexec.mainClass="cz.esw.serialization.App" -Dexec.args="<host> <port> <format>"

The Java app has to accept the following three arguments:

app <host> <port> <format> 

The application has to accept the generated data, convert it to the transfer format, and send it.

The arguments <host> and <port> are the address and port of the receiver, and <format> is one of the values {json, proto, avro}, which selects the format used for the data transfer over TCP.
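
The argument handling described above could be sketched as follows. This is a minimal sketch, not the template's code: the class AppArguments and its nested Format enum are hypothetical names, and the provided template may already parse the arguments differently.

```java
import java.util.Locale;

// Hypothetical holder for the three command-line arguments.
class AppArguments {

    // Transfer formats named in the specification.
    enum Format { JSON, PROTO, AVRO }

    final String host;
    final int port;
    final Format format;

    AppArguments(String host, int port, Format format) {
        this.host = host;
        this.port = port;
        this.format = format;
    }

    // Parses <host> <port> <format>; throws IllegalArgumentException on bad input.
    static AppArguments parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("usage: app <host> <port> <format>");
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(args[1]);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // valueOf throws IllegalArgumentException for an unknown format name.
        Format format = Format.valueOf(args[2].toUpperCase(Locale.ROOT));
        return new AppArguments(args[0], port, format);
    }

    public static void main(String[] args) {
        AppArguments parsed = parse(new String[] {"localhost", "12345", "proto"});
        System.out.println(parsed.host + ":" + parsed.port + " using " + parsed.format);
    }
}
```

Mapping the format string to an enum once at start-up keeps the rest of the code free of string comparisons, for example in a switch that picks the serializer.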

Application Specification - C/C++

In contrast to Java with Maven, where everything is done automatically, in C/C++ we have to compile the schemas manually and add the generated files to CMakeLists.txt. Alternatively, an experienced CMake user can enhance the build script to do this automatically, as Maven does. Links describing how to install and use the protocol compilers are provided below in the corresponding sections.

The C/C++ application has to listen on the defined port and receive data in the defined format, process the data (just calculate averages) and send the results back.

The app has to accept the following two arguments:

app <port> <format>

The argument <port> is the port on which the receiver listens, and <format> is one of the values {json, protobuf, avro}, which selects the format of the data transferred over TCP.

Data format

  • json - sends/receives the data as JSON text
  • proto - sends/receives the data as bytes of the protobuf generated classes
  • avro - sends/receives the data as bytes of the avro generated classes

Message Size

The binary encodings of Protobuf and Avro do not mark where a message ends, so the C++ implementations of these frameworks will probably not be able to recognize the end of a message on the TCP stream. Therefore, the Java application has to send the message size before the message itself.

The receiving part should look similar to:

int messageSize = readAndDecodeMessageSize(stream); // your implementation
char *buffer = new char[messageSize];               // release with delete[] when done
stream.read(buffer, messageSize);                   // read exactly one message
...

The size of Protobuf message is easy to get:

int messageSize = objectToBeSerialized.getSerializedSize();
sendMessageSize(messageSize, outputStream); // your implementation
...

The size of Avro message is not that straightforward to retrieve:

DatumWriter<ADataset> datumWriter = new SpecificDatumWriter<ADataset>(ADataset.class);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
datumWriter.write(objectToBeSerialized, encoder);
encoder.flush();

int messageSize = byteArrayOutputStream.size();
sendMessageSize(messageSize, outputStream); // your implementation
...
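
The helper sendMessageSize used in the snippets above is left to you. One possible sketch, assuming the size is sent as a fixed 4-byte big-endian integer (this wire format is an assumption, not a requirement of the assignment), is shown below. The class MessageSizeFraming and the method readMessageSize are hypothetical and only serve to check the round trip in Java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical framing helper; not part of the course template.
class MessageSizeFraming {

    // Writes the size as 4 bytes in big-endian (network) byte order.
    static void sendMessageSize(int messageSize, OutputStream outputStream) throws IOException {
        if (messageSize < 0) {
            throw new IllegalArgumentException("negative message size: " + messageSize);
        }
        // ByteBuffer uses big-endian byte order by default.
        byte[] header = ByteBuffer.allocate(Integer.BYTES).putInt(messageSize).array();
        outputStream.write(header);
    }

    // Reads the 4-byte big-endian size back; throws EOFException if the stream ends early.
    static int readMessageSize(InputStream inputStream) throws IOException {
        return new DataInputStream(inputStream).readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {10, 20, 30};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        sendMessageSize(payload.length, out); // size first
        out.write(payload);                   // then the message itself

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] received = new byte[readMessageSize(in)];
        new DataInputStream(in).readFully(received);
        System.out.println("received " + received.length + " bytes");
    }
}
```

With this choice, the C++ side has to read exactly 4 bytes and convert them from network byte order (for example with ntohl) before allocating the buffer. A variable-length size prefix would also work, as long as both sides agree on it.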

Protobuf

Avro

Maven

Apache Maven is a project management tool that handles library dependencies and builds. Some IDEs have Maven integrated, but to use it from the command line you will have to download it and add its bin folder to the PATH.

Bonus Task

You can earn up to two bonus points:

  • 1 point for additionally implementing the Cap’n Proto protocol.
  • 1 point for code that does not send the data via TCP and uses shared memory instead.
courses/b4m36esw/labs/lab07.txt · Last modified: 2020/04/06 14:24 by cuchymar