====== Serialization ======

===== Task Assignment =====

In many cases, you will need to transfer data between processes running on the same machine or send the data over a network. You can use XML or JSON, but these text formats are inefficient. Several alternatives (e.g., Protobuf and Avro) can be used instead. These serialization frameworks define data structures with schemas written in their own languages, and the schemas can be compiled into classes in various programming languages (Java, C++, Python, etc.).

Your goal is to implement a client application in Java that receives data, stores them in classes generated by the Protobuf and Avro serialization frameworks, and sends the converted data via TCP to a server application implemented in C/C++ (or in Python at your own risk). The server application processes the data (calculates averages) and sends the results back.

You will be given a reference implementation using the JSON data format. Use its data classes as a template for how the Protobuf and Avro schemas should look. You will probably not be able to create exact counterparts due to some limitations of the serialization protocols; therefore, you will need to find a workaround.

Steps:
  * Create a team of 2 students.
  * Download the template from the git repository: <code bash>git clone https://gitlab.fel.cvut.cz/esw/serialization.git</code>
  * There is a subfolder in ''src/main/'' for each language/schema (''java'', ''cpp'', ''proto'', ''avro'', ...).
  * Install the required libraries, then compile and run the C/C++ counterpart (or use any IDE): <code bash>
sudo apt-get install libboost-all-dev libjsoncpp-dev
cd src/main/cpp/
mkdir build
cd build
cmake ..
make
./server 12345 json
</code>
  * Import the project into a Java IDE as a Maven project (more information about Maven below) and, to make the Java part compilable, run: <code bash>mvn compile</code>
  * To see how it works for JSON, run ''AppTest.java''.
  * Define the ''protobuf'' and ''avro'' schemas as similar as possible to the provided JSON format (''package cz.esw.serialization.json.*''). Write the schemas into the prepared ''measurements.proto'' and ''measurements.avsc'' files (or an ''.avdl'' file; more details in the Avro section below). Recommendation: use class names with the prefix ''P'' for ''protobuf'' classes (e.g., ''PDataset'') and ''A'' for ''avro'' classes (e.g., ''ADataset'').
  * Implement the applications (both client and server) in the provided template according to the specification described in the following sections.
  * Observe the performance differences between the data formats.
  * Submit your solution to your tutor and also to the upload system. Upload only ''pom.xml'', ''readme.txt'' and the ''src/'' folder, without any compiled binaries or generated sources.

==== Client Application Specification - Java ====

The configuration of the application is handled by Maven (''pom.xml''), which takes care of all required libraries and of compiling the serialization schemas (you have to run ''mvn compile'' to regenerate the source code of the data classes every time you change the serialization schemas). The application can be compiled and run from the command line with the following commands in the project folder:

<code bash>
mvn compile
mvn exec:java -Dexec.mainClass="cz.esw.serialization.App" -Dexec.args="localhost 12345 json"
</code>

The Java app has to accept the following three arguments: the address of the receiver, the port of the receiver, and the data format, which is one of ''{json, proto, avro}'' and defines the format used for the data transfer over TCP. The application has to accept the generated data, convert it to the transfer format, and send it.

==== Server Application Specification - C/C++ ====

Contrary to Java with Maven, where everything is done automatically, in C/C++ you have to compile the schemas manually and add the generated files to ''CMakeLists.txt''. Alternatively, an experienced CMake user can extend the build script to do this automatically, as Maven does.
Links to instructions on how to install and use the protocol compilers are provided in the corresponding sections below.

The C/C++ application has to listen on the given port, receive data in the given format, process the data (just calculate averages), and send the results back. The app has to accept the following two arguments: the port on which the server listens, and the data format, which is one of ''{json, protobuf, avro}'' and defines the format of the data transferred over TCP.

==== Server Application Specification - Python ====

Instead of using the provided C++ template, you can develop your own server part of the application in Python. However, be aware that you do so at your own risk: both protocols should support Python, but we did NOT try to implement this task in Python. The application has to follow the same specification as the C++ version.

==== Readme ====

The readme file has to contain all steps necessary to compile the code (including compiling the serialization schemas) and to run the application. It also has to list any additional dependencies. In short, we must be able to compile and run the application based solely on the readme instructions.

==== Data format ====

  * ''json'' - sends/receives the data as JSON text
  * ''proto'' - sends/receives the data as bytes of the ''protobuf'' generated classes
  * ''avro'' - sends/receives the data as bytes of the ''avro'' generated classes

==== Message Size ====

The Protobuf and Avro implementations will probably not be able to recognize the ends of messages; therefore, the application has to send the message size before the message itself. The receiving C++ part should look similar to:

<code cpp>
int messageSize = readAndDecodeMessageSize(stream); // your implementation
std::vector<char> buffer(messageSize);              // freed automatically, unlike new char[]
stream.read(buffer.data(), messageSize);
...
</code>
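A minimal Java sketch of this framing, assuming a fixed 4-byte big-endian size prefix (the format produced by ''DataOutputStream.writeInt''). The class and the methods ''sendMessage'' and ''receiveMessage'' are hypothetical and not part of the template. With this choice, the C++ side would read exactly four bytes and convert them from network byte order (e.g., with ''ntohl'') before reading the message body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixFraming {

    // Hypothetical sender: writes the size as a 4-byte big-endian int,
    // followed by the serialized message bytes.
    static void sendMessage(byte[] message, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(message.length);
        data.write(message);
        data.flush();
    }

    // Hypothetical receiver: reads the 4-byte size first, then exactly that many bytes.
    static byte[] receiveMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int messageSize = data.readInt();
        byte[] buffer = new byte[messageSize];
        data.readFully(buffer); // blocks until the whole message has arrived
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory "wire" stands in for the socket streams here.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendMessage("first".getBytes(StandardCharsets.UTF_8), wire);
        sendMessage("second".getBytes(StandardCharsets.UTF_8), wire);

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(new String(receiveMessage(in), StandardCharsets.UTF_8));
        System.out.println(new String(receiveMessage(in), StandardCharsets.UTF_8));
    }
}
```

A fixed-width prefix keeps the decoding on the C++ side trivial; a variable-length prefix would save a few bytes per message but requires matching decoding code on both sides.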
The size of a Protobuf message is easy to get:

<code java>
int messageSize = objectToBeSerialized.getSerializedSize();
sendMessageSize(messageSize, outputStream); // your implementation
...
</code>

The size of an Avro message is not as straightforward to retrieve; the message has to be serialized into a byte buffer first:

<code java>
DatumWriter<ADataset> datumWriter = new SpecificDatumWriter<>(ADataset.class);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
datumWriter.write(objectToBeSerialized, encoder);
encoder.flush(); // the encoder is buffered; flush it before reading the size
int messageSize = byteArrayOutputStream.size();
sendMessageSize(messageSize, outputStream); // your implementation
...
</code>

==== Protobuf ====

  * [[https://github.com/google/protobuf/blob/master/src/README.md|Installation Guide]]
  * [[https://developers.google.com/protocol-buffers/docs/proto3|Proto Language Guide]]
  * [[https://developers.google.com/protocol-buffers/docs/javatutorial|Java Protobuf Basics]]
  * [[https://developers.google.com/protocol-buffers/docs/cpptutorial|CPP Protobuf Basics]]
  * [[https://developers.google.com/protocol-buffers/docs/pythontutorial|Python Protobuf Basics]]
  * [[https://developers.google.com/protocol-buffers/docs/reference/java-generated|Java Protobuf Documentation]]

==== Avro ====

  * [[https://avro.apache.org/docs/current/api/cpp/html/index.html|Avro CPP Getting Started and Installation Guide]]
  * [[http://www.apache.org/dyn/closer.cgi/avro/|Avro CPP Download]]
  * [[https://avro.apache.org/docs/current/api/cpp/html/namespaceavro.html|Avro CPP documentation]]
  * [[https://avro.apache.org/docs/current/gettingstartedjava.html|Avro Java Getting Started]]
  * [[https://avro.apache.org/docs/current/gettingstartedpython.html|Avro Python Getting Started]]
  * [[https://avro.apache.org/docs/current/spec.html|Avro Schema Specification]]
  * [[https://avro.apache.org/docs/current/idl.html|Avro IDL Specification]]

You can use either the JSON-based Avro Schema or the much less verbose Avro IDL to define the messages.
However, be aware that the ''avrogencpp'' tool accepts only Avro Schema. Therefore, you need to convert the IDL to a Schema with ''avro-tools.jar'' (you can download it from the same site as the other Avro parts); alternatively, the IntelliJ IDEA Avro plugin can do the conversion. We are not aware of which format is supported in Python.

==== Maven ====

[[https://maven.apache.org/|Apache Maven]] is a project management tool that manages library dependencies and builds. Some IDEs have Maven integrated, but to use it easily from the command line, you will have to [[https://maven.apache.org/download.cgi|download]] it and add its ''bin'' folder to the ''PATH''.

==== Bonus Task ====

There is an option to receive two bonus points:
  * 2 points for code that does not send the data via TCP and uses shared memory instead.
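For the bonus task, one possible starting point on the Java side is a memory-mapped file, which a C/C++ process can map as well (e.g., with ''mmap''). The sketch below is only an assumption of how this could look: it reuses a 4-byte size prefix, the file path is hypothetical, and it deliberately omits any synchronization between the two processes (such as a ready flag or a semaphore), which a complete solution needs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedMemorySketch {

    // Writes one size-prefixed message to the beginning of a memory-mapped file.
    // No inter-process synchronization is done here.
    static void writeMessage(Path file, byte[] message) throws IOException {
        long regionSize = 4L + message.length;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(regionSize); // make the file exactly as large as the mapped region
            MappedByteBuffer region = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, regionSize);
            region.putInt(message.length); // big-endian by default
            region.put(message);
            region.force(); // write the changes through to the mapped file
        }
    }

    // Reads one size-prefixed message from the beginning of the mapped file.
    static byte[] readMessage(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int messageSize = region.getInt();
            byte[] message = new byte[messageSize];
            region.get(message);
            return message;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; /dev/shm is a RAM-backed file system on most Linux systems.
        Path file = Paths.get(args.length > 0 ? args[0] : "/dev/shm/esw-serialization");
        writeMessage(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readMessage(file), StandardCharsets.UTF_8));
    }
}
```

Keeping the same size prefix as for TCP lets the server parse the shared region the same way it parses the socket stream.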