Semestral Work

The goal of the semestral work is to learn how to combine data, information, and knowledge, and to use that knowledge to answer a non-trivial question over a set of previously non-interlinked datasets.

Grading

The semestral work is graded in three checkpoints:

  • checkpoint 1 (max 10 pts) – deadline to deliver 6. 10. 2024
  • checkpoint 2 (max 25 pts) – deadline to deliver 10. 11. 2024
  • checkpoint 3 (max 15 pts) – deadline to deliver 5. 1. 2025

To successfully complete the semestral project, you need to obtain at least 50% of the points from each checkpoint. To complete a checkpoint, you need to:

  1. (by the checkpoint deadline) push the checkpoint deliverable to the Git repository https://gitlab.fel.cvut.cz/B241_B4M36OSW/%username% ,
  2. (by the checkpoint deadline) upload a text file named 'cp1.txt', 'cp2.txt', or 'cp3.txt' (matching the checkpoint) containing the hash of the submitted GitLab commit to the upload system,
  3. prepare a 5-minute presentation (2 minutes for CP1) and defend it at the next lab.
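
Step 2 can be scripted. The following is a minimal sketch, assuming the file should contain only the full 40-character SHA-1 commit hash on a single line (the exact format expected by the upload system is not specified here). The helper names `current_commit_hash` and `write_checkpoint_file` are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Assumption: the full 40-character SHA-1 hash is expected, not a short hash.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def current_commit_hash(repo_dir: str = ".") -> str:
    """Return the hash of the commit currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def write_checkpoint_file(checkpoint: int, commit_hash: str, out_dir: str = ".") -> Path:
    """Write cp<checkpoint>.txt containing the given commit hash."""
    if checkpoint not in (1, 2, 3):
        raise ValueError("checkpoint must be 1, 2, or 3")
    if not FULL_SHA1.fullmatch(commit_hash):
        raise ValueError("expected a full 40-character lowercase hex commit hash")
    path = Path(out_dir) / f"cp{checkpoint}.txt"
    path.write_text(commit_hash + "\n", encoding="utf-8")
    return path
```

For example, calling `write_checkpoint_file(1, current_commit_hash())` inside the repository after pushing produces `cp1.txt` for the first checkpoint.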

The deadline for delivering a checkpoint is the last Sunday before the tutorial at which it is defended. If you submit your checkpoint after the deadline, you lose 5 points for each commenced week of delay. This penalty applies to the final grade only, i.e. it is not taken into account when determining whether you passed or failed.
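
The grading rules can be illustrated with a short calculation. This is only a sketch under stated assumptions: delays are counted in whole days after the deadline date, and the final score is floored at zero (neither detail is stated in the rules). The function names are hypothetical.

```python
from datetime import date
from math import ceil

PENALTY_PER_WEEK = 5  # points lost per commenced week of delay


def late_penalty(deadline: date, submitted: date) -> int:
    """Penalty in points; any commenced week counts as a full week."""
    days_late = (submitted - deadline).days  # assumption: day granularity
    if days_late <= 0:
        return 0
    return PENALTY_PER_WEEK * ceil(days_late / 7)


def checkpoint_result(raw_points: float, max_points: float,
                      deadline: date, submitted: date) -> tuple[bool, float]:
    """Return (passed, final_points) for one checkpoint."""
    # Passing uses the raw points; the penalty affects the final grade only.
    passed = raw_points >= 0.5 * max_points
    # Assumption: the final score cannot drop below zero.
    final_points = max(0, raw_points - late_penalty(deadline, submitted))
    return passed, final_points
```

With these assumptions, a checkpoint 1 submission graded 8 of 10 points and delivered two days after 6. 10. 2024 still passes (8 is at least 50% of 10) but contributes only 3 points to the final grade.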

Description

The basic goal of the semestral project is to combine data, information, and knowledge to answer non-trivial questions. Students may choose from the questions listed in Checkpoint 1 or create their own. For inspiration, look at the projects from the last semester at the bottom of the page. There is also a sample project showing what the expected outputs should look like.

During the tutorials, we will use the sample topic to go through the steps needed to successfully finish the semestral work.

The sample topic is Protected sites, with the question “In which protected areas is it allowed to move outside of marked paths in a certain way (MTB, hiking, ski touring, etc.)?”.

The expected output is not only the answer, but a functional framework that produces answers automatically (e.g. one in which the type of movement can easily be changed). Note that the questions do not specify a temporal or spatial extent. In the first checkpoint, based on the data sources, define the question precisely, including its spatial and temporal scope. The scope may vary a lot, e.g. a comparison of protected area overlap between two neighboring countries, or changes in legislation for protected areas during the past century.

Semestral work step by step

The semestral work consists of three checkpoints. Every checkpoint follows directly from the material covered in the tutorials. Every checkpoint must contain a document describing the working process and outputs.

Create conceptual model

The first checkpoint takes place after the first two tutorials. Its goal is to define the problem and create a conceptual model of the knowledge needed to solve it. An essential part of the task is to unambiguously define all the terms used in the conceptual model. Depending on the nature of the problem, go through legislative and technical documents as well as scientific and popular papers, and define each concept in its specific meaning. Based on the model, look for available datasets covering the problem; this is crucial to prove that the question is answerable. Then create a model (schema) of each dataset, with objects, their attributes, and the relations between objects. Try to connect the concepts of the datasets to the conceptual model.

The expected output of the first checkpoint is:

  • definition of the research question and the environment in which we seek the answers,
  • an E-R, UML, or other diagram showing the knowledge needed to get the answer,
  • list of datasets to be used, including the source, description, and model of each in the form of a UML or E-R diagram, showing connections to the conceptual model.

More details about the checkpoint are on the Checkpoint 1 specification page.

Formalizing models and data

The second checkpoint is the most complex one. It consists of two parts: the formalization of the models as ontologies and the transformation of the data into RDF corresponding to the formalized models. In the first part, you will create the domain ontology of the conceptual model and a set of ontologies describing the dataset models. At this stage, we recommend creating the ontologies as separate files. The expected output is:

  • ontology of conceptual model, containing unambiguously defined concepts (with definitions, sources, etc., using SKOS, RDF(S), or other high-level ontology) and interconnecting them,
  • ontologies of dataset models, describing the content of specific data sets. Deliver one ontology per data set.

The second part of the checkpoint is to set up a pipeline for the transformation of source data into RDF and then, naturally, run it. The expected output of this part of the checkpoint is:

  • RDFization pipeline (program, OpenRefine configuration files, SPARQL queries, RML, etc.), including a foolproof tutorial on how to run it,
  • RDFized datasets corresponding to the dataset models (SPARQL queries written against the schemas shall return individuals from the data).
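
As one possible shape of the "program" option, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. The namespace `http://example.org/osw/`, the class `ProtectedSite`, the column names `id` and `name`, and the sample rows are all made up for illustration. A real pipeline would use the IRIs from your dataset ontology and may well rely on OpenRefine, RML, or an RDF library instead.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/osw/"  # hypothetical namespace, replace with your own
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Emit one rdf:type and one rdfs:label triple per CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so that the subject is a valid IRI.
        subject = f"<{BASE}site/{quote(row['id'], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}ProtectedSite> .")
        triples.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(row["name"])}" .')
    return triples


# Made-up sample input with hypothetical columns.
SAMPLE_CSV = "id,name\n1,Site A\n2,Site B\n"

if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(SAMPLE_CSV)))
```

Writing plain N-Triples keeps the sketch dependency-free; the resulting file can be loaded into any triple store and queried against the dataset ontology.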

A detailed description of the checkpoint is on its specification page.

Integration of models and data

The third checkpoint combines and integrates the data through the integration of the models. Finally, you will answer the question defined in the first checkpoint.

Answering the question requires two steps:

  • creating a mapping between the formalized dataset models and the conceptual model, using UFO, OWL, SHACL rules to validate inputs, etc.,
  • designing non-trivial SPARQL queries that demonstrate the integration of the data sets and, finally, answer the question.

Detailed descriptions (including delivery) can be found on the specification page of the last checkpoint.

Generic Ontologies

Generic Dataset Sources

Environmental, statistical and health Datasets and Applications

If you cannot find the data you need, contact the lecturer.

Sample projects

Sample project created for students for a better understanding of what to do in the specific Checkpoints
Projects done by students in previous years
Do not use them as templates for your semestral work since the rules from previous years were different.
2023
2022
2021

Documentation semestral-work-example-2021.pdf and semestral-work-example-2021-2.pdf.

The semestral project changed in the winter semester 2021/22, so the older sample projects will not help you much.
2019
2018
2017
courses/b4m36osw/semestral_work.txt · Last modified: 2024/09/16 11:05 by medmicha