
Semestral Work

The goal of the semestral work is to learn how to combine data, information and knowledge, and to use that knowledge to answer a non-trivial question over a set of non-interlinked datasets.

Grading

The semestral work is graded in three checkpoints:

To successfully complete the semestral project, you need to obtain at least 50% of the points from each checkpoint. To complete a checkpoint, you need to

  1. (by the checkpoint deadline) push the checkpoint deliverable to the GIT repo https://gitlab.fel.cvut.cz/B231_B4M36OSW/%username% ,
  2. (by the checkpoint deadline) upload a txt file 'cp1.txt' (or 'cp2.txt' or 'cp3.txt', respectively) containing the hash of the submitted commit from GitLab into the upload system,
  3. prepare a 5-minute presentation and defend it (at the next lab).
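
Step 2 above can be scripted. The following is only a minimal sketch: the helper names `current_commit_hash` and `write_checkpoint_file` are made up for illustration, and it assumes that the upload system expects the full 40-character hash followed by a newline, which this page does not specify.

```python
import re
import subprocess
from pathlib import Path


def current_commit_hash(repo_dir="."):
    """Return the full hash of the commit currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def write_checkpoint_file(commit_hash, checkpoint, out_dir="."):
    """Write cp<checkpoint>.txt containing the given commit hash."""
    if checkpoint not in (1, 2, 3):
        raise ValueError("checkpoint must be 1, 2 or 3")
    # Assumption: a full 40-character hexadecimal hash is expected.
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("not a full 40-character commit hash")
    path = Path(out_dir) / f"cp{checkpoint}.txt"
    path.write_text(commit_hash + "\n", encoding="utf-8")
    return path
```

Run from inside the repository after pushing, `write_checkpoint_file(current_commit_hash(), 1)` would create 'cp1.txt' in the current directory.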

The deadline for delivering a checkpoint is the last Sunday before the tutorial at which it is defended. If you submit your checkpoint after the deadline, you will lose 5 points for each commenced week of delay. This penalty is applied to the final grading only (i.e. it is not taken into account when determining whether you pass or fail).
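
The deadline and penalty rules can be illustrated with a short calculation. This is a sketch under assumptions: the function names are invented here, the exact time of day of the deadline is not stated on this page (the caller supplies it), and a submission made exactly at the deadline is treated as on time.

```python
import math
from datetime import date, datetime, timedelta

POINTS_PER_WEEK = 5  # penalty per commenced week of delay, as stated above


def checkpoint_deadline(tutorial_day):
    """Return the last Sunday strictly before the given tutorial date."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_back = (tutorial_day.weekday() + 1) % 7 or 7
    return tutorial_day - timedelta(days=days_back)


def late_penalty(deadline, submitted):
    """Return the points lost: 5 for every commenced week after the deadline."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # assumption: submitting exactly at the deadline is on time
    weeks_commenced = math.ceil(delay / timedelta(weeks=1))
    return POINTS_PER_WEEK * weeks_commenced


# Example with made-up dates: a tutorial on Thursday 2023-10-12.
sunday = checkpoint_deadline(date(2023, 10, 12))
deadline = datetime(sunday.year, sunday.month, sunday.day, 23, 59, 59)
print(sunday, late_penalty(deadline, deadline + timedelta(days=8)))
```

A delay of one second already counts as a commenced week in this reading, so it costs 5 points; a delay of eight days costs 10.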

Description

The basic goal of the semestral project is to combine data, information, and knowledge to answer non-trivial questions. Students may choose from the questions mentioned in Checkpoint 1 or create their own. For inspiration, look at the projects from previous semesters at the bottom of the page. There is also a sample project showing what the expected outputs should look like.

During the tutorials, we will go through the steps needed to successfully finish the semestral work, using the sample topic.

The sample topic is Protected sites, with the question “In which protected areas is it allowed to move outside of marked paths in a certain way (MTB, hiking, ski touring, etc.)?”.

The expected output is not only the answer, but also a functional framework for getting answers automatically (e.g. so that the given type of movement can be changed easily). Note that the questions do not specify a temporal or spatial extent. In the first checkpoint, define the question specifically, with a spatial and temporal scope, based on the data sources. The scope may vary a lot, e.g. a comparison of the overlap of protected areas between two neighboring countries, or changes in legislation for protected areas during the past century.

Semestral work step by step

Semestral work consists of three checkpoints. Every checkpoint comes directly out of the materials covered in tutorials. Every checkpoint must contain a document describing the working process and outputs.

Create conceptual model

The first checkpoint takes place after the first two tutorials. Its goal is to define the problem and create a conceptual model of the knowledge needed to solve it. An essential part of the task is to unambiguously define all the terms used in the conceptual model. Depending on the nature of the problem, go through legislative and technical documents as well as scientific and popular papers, and define each concept in its specific meaning. Based on the model, look for available datasets covering the problem; this is crucial to prove that the question is answerable. Then create a model schema of the datasets, with objects, their attributes, and the relations between objects. Try to link the concepts of the datasets to the conceptual model.

The expected output of the first checkpoint is:

More details about the checkpoint are on the Checkpoint 1 specification page.

Formalizing models and data

The second checkpoint is the most complex one; if it is done right, the third checkpoint, i.e. answering the question, should be really easy. It consists of two parts: the formalization of models as ontologies, and the transformation of data into the RDF format corresponding to the formalized models. In the first part, you will create the domain ontology of the conceptual model and a set of ontologies describing the dataset models. At this phase, we recommend creating the ontologies as separate files. The expected output is:

The second part of the checkpoint is to set up a pipeline for the transformation of source data into RDF and then, naturally, run it. The expected output of this part of the checkpoint is:

A detailed description of the checkpoint is in its specification page.
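
To give an idea of what a transformation of source data into RDF can look like, here is a minimal sketch using only the Python standard library. It is not the pipeline prescribed by the course: the namespace `http://example.org/osw/`, the `ProtectedSite` class, the `category` property, the column names and the sample row are all made up for illustration. The output is serialized as N-Triples.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/osw/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SITE_CLASS = BASE + "ProtectedSite"  # hypothetical class
CATEGORY = BASE + "category"         # hypothetical property


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a plain string literal with N-Triples escaping."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Turn CSV rows with columns id,name,category into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so it is safe inside an IRI.
        subject = iri(BASE + "site/" + quote(row["id"], safe=""))
        lines.append(f"{subject} {iri(RDF_TYPE)} {iri(SITE_CLASS)} .")
        lines.append(f"{subject} {iri(RDFS_LABEL)} {literal(row['name'])} .")
        lines.append(f"{subject} {iri(CATEGORY)} {literal(row['category'])} .")
    return lines


# Made-up sample row, not taken from any real dataset.
SAMPLE = "id,name,category\n1,Sample Site A,national park\n"
for line in rows_to_ntriples(SAMPLE):
    print(line)
```

In a real pipeline the class and property IRIs would come from the dataset ontologies created in the first part of this checkpoint, so that the generated RDF corresponds to the formalized models.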

Integration of models and data

The third checkpoint combines and integrates the data through the integration of models. Finally, you will answer the question defined in the first checkpoint.

Answering the question requires two steps:

Detailed descriptions can be found in the specification page of the last checkpoint.
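
The idea of integrating data through the integration of models, and then answering a parameterized question, can be sketched in a few lines. This toy example is an assumption-laden illustration, not the expected solution: every identifier (the `ds1:`, `ds2:` and `dom:` terms) and every fact in it is invented and has no relation to real legislation, and a real project would express the alignment in ontologies and the question as a query over RDF data.

```python
# Dataset 1 (invented): protected sites and their dataset-specific types.
SITES = {
    ("ds1:site/1", "ds1:siteType", "ds1:NatPark"),
    ("ds1:site/2", "ds1:siteType", "ds1:Reserve"),
}

# Dataset 2 (invented): which kinds of off-path movement each zone type permits.
RULES = {
    ("ds2:ZoneA", "ds2:permitsOffPath", "hiking"),
    ("ds2:ZoneB", "ds2:permitsOffPath", "hiking"),
    ("ds2:ZoneB", "ds2:permitsOffPath", "ski touring"),
}

# Model integration: map dataset-specific terms to shared domain concepts.
ALIGNMENT = {
    "ds1:NatPark": "dom:NationalPark",
    "ds2:ZoneA": "dom:NationalPark",
    "ds1:Reserve": "dom:NatureReserve",
    "ds2:ZoneB": "dom:NatureReserve",
}


def sites_allowing(movement):
    """Return the sites whose domain concept permits the given movement."""
    allowed_concepts = {
        ALIGNMENT[subject]
        for subject, predicate, obj in RULES
        if predicate == "ds2:permitsOffPath" and obj == movement
    }
    return sorted(
        subject
        for subject, predicate, obj in SITES
        if predicate == "ds1:siteType" and ALIGNMENT[obj] in allowed_concepts
    )


print(sites_allowing("ski touring"))
```

The two datasets share no identifiers; they become joinable only through the shared domain concepts. Changing the argument of `sites_allowing` changes the question without touching the data.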

Generic Ontologies

Generic Dataset Sources

Environmental, statistical and health Datasets and Applications

If you cannot find the data you need, contact the lecturer.

Sample projects

A sample project created to give students a better understanding of what to do in the specific checkpoints.

Archive semestral-work-sample.zip.

Projects done by students in previous years
Do not use them as templates for your semestral work since the rules from previous years were different.
2022

Documentation semestral-work-example-2022-1.pdf and semestral-work-example-2022-2.pdf.

2021

Documentation semestral-work-example-2021.pdf and semestral-work-example-2021-2.pdf.

The semestral project changed in the winter semester 2021/22, so the older sample projects will not help you much.
2019

Archive semestral-work-example-2019.zip.

2018

Archive semestral-work-example-2018.zip.

2017

Archives semestral-work-example-2017-1.zip and semestral-work-example-2017-2.zip.