Warning
This page is archived. Go to the latest version of the course pages, or go to the latest version of this page.

Semestral Work

The goal of the semestral work is to learn how to combine data, information, and knowledge, and to use that knowledge to answer a non-trivial question over a set of non-interlinked datasets.

Grading

The semestral work is graded in three checkpoints:

  • checkpoint 1 (max 10 pts) – deadline to deliver 9. 10. 2022
  • checkpoint 2 (max 25 pts) – deadline to deliver 20. 11. 2022
  • checkpoint 3 (max 15 pts) – deadline to deliver 8. 1. 2023

To successfully complete the semestral project, you need to obtain at least 50% of the points from each checkpoint. To complete a checkpoint, you need to:

  1. (by the checkpoint deadline) push the deliverable for the checkpoint to the Git repository https://gitlab.fel.cvut.cz/B221_B4M36OSW/%username% ,
  2. (by the checkpoint deadline) upload a text file named 'cp1.txt', 'cp2.txt', or 'cp3.txt' (according to the checkpoint) containing the hash of the submitted GitLab commit to the upload system,
  3. prepare a 5-minute presentation and defend it (at the next lab).
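
Step 2 can be automated with a short script. The following is only a minimal sketch, assuming that the file should contain nothing but the full 40-character SHA-1 hash of the commit and that the script is run inside the cloned repository. The function names head_commit_hash and write_checkpoint_file are hypothetical helpers, not part of any course tooling.

```python
import re
import subprocess
from pathlib import Path


def head_commit_hash(repo_dir: str = ".") -> str:
    """Return the full hash of the current HEAD commit of the repository."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def write_checkpoint_file(checkpoint: int, commit_hash: str, out_dir: str = ".") -> Path:
    """Write cp<checkpoint>.txt containing only the given commit hash."""
    if checkpoint not in (1, 2, 3):
        raise ValueError("checkpoint must be 1, 2 or 3")
    # Assumption: the upload system expects a full SHA-1 hash (40 hex digits).
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("expected a full 40-character SHA-1 commit hash")
    path = Path(out_dir) / f"cp{checkpoint}.txt"
    path.write_text(commit_hash + "\n", encoding="utf-8")
    return path
```

After pushing the deliverable, calling write_checkpoint_file(1, head_commit_hash()) would create cp1.txt in the current directory, ready to be uploaded.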

The deadline for delivering a checkpoint is the last Sunday before the tutorial at which it is defended. If you submit your checkpoint after the deadline, you will lose 5 points for each commenced week of delay. This penalty applies to the final grade only (i.e., it is not taken into account when determining whether you passed or failed).
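
The interplay of the 50% threshold and the late penalty can be illustrated with a small calculation. This is a sketch under stated assumptions, not the official grading procedure: the delay is counted in whole days after the deadline date, and the penalized score is clamped at zero (the page does not say whether it can go negative). The names late_penalty and checkpoint_result are hypothetical.

```python
import math
from datetime import date

PENALTY_PER_WEEK = 5  # points lost for each commenced week of delay


def late_penalty(deadline: date, submitted: date) -> int:
    """Return the penalty in points for a submission made on the given date."""
    days_late = (submitted - deadline).days
    if days_late <= 0:
        return 0
    # Every commenced week counts, so 1 to 7 days late is one week.
    return PENALTY_PER_WEEK * math.ceil(days_late / 7)


def checkpoint_result(points: int, max_points: int, deadline: date, submitted: date):
    """Return (passed, final_points) for one checkpoint."""
    # Passing is decided on the unpenalized points.
    passed = points >= 0.5 * max_points
    # Assumption: the penalized score is not allowed to drop below zero.
    final_points = max(0, points - late_penalty(deadline, submitted))
    return passed, final_points
```

For example, 8 points out of 10 for checkpoint 1, submitted one day after 9. 10. 2022, would still pass the checkpoint but contribute only 3 points to the final grade.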

Description

The basic goal of the semestral project is to combine data, information, and knowledge to answer non-trivial questions. Students may choose from the following questions or create their own. For inspiration, look at the projects from the last semester at the bottom of the page.

A sample semestral topic is Protected sites, with the question “In which protected areas is it allowed to move off marked paths in a certain way (MTB, hiking, ski touring, etc.)?”

The expected output is not only the answer but also a functional framework for getting answers automatically (e.g., making it easy to change the given municipality in the first question). Note that the questions do not specify a temporal or spatial extent. In the first checkpoint, define the question specifically, with its spatial and temporal extent, based on the data sources. The result may vary a lot, e.g., a comparison of protected-area overlap between two neighboring countries, or changes in legislation for protected areas over the past century.

Semestral work step by step

The semestral work consists of three checkpoints. Every checkpoint follows directly from the materials covered in the tutorials. Every checkpoint must contain a document describing the working process and outputs.

Create conceptual model

The first checkpoint takes place after the first two tutorials. Its goal is to define the problem and create a conceptual model of the knowledge needed to solve it. An essential part of the task is to unambiguously define all the terms used in the conceptual model. Depending on the nature of the problem, go through legislative and/or technical documents and scientific papers, and define each concept in a specific meaning. Based on the model, look for available data covering the problem. Then create a model of the datasets, with objects, their attributes, and the relations between objects. Try to connect the concepts of the datasets to the conceptual model.

The expected output of the first checkpoint is:

  • a definition of the research question and the environment in which we seek the answers,
  • an E-R, UML, or other diagram showing the knowledge needed to get the answer,
  • a list of datasets to be used, including the source, description, and model of each in the form of a UML or E-R diagram, showing connections to the conceptual model.

More details about the checkpoint are on the Checkpoint 1 specification page.

Formalizing models and data

The second checkpoint is the most complex one; if it is done right, the third checkpoint consists only of answering the question. The second checkpoint has two parts: formalization of the models as ontologies, and transformation of the data into RDF corresponding to the formalized models. In the first part, you will create a domain ontology of the conceptual model and a set of ontologies describing the dataset models. The expected output is:

  • an ontology of the conceptual model that contains unambiguously defined concepts (with definitions, sources, etc., using SKOS, RDF(S), or another high-level ontology) and interconnects them,
  • ontologies of the dataset models, each describing the content of a specific dataset. Deliver one ontology per dataset used.

The other part of the checkpoint is to set up a pipeline for transforming the source data into RDF and then, naturally, to run it. The expected output of this part of the second checkpoint is:

  • an RDFization pipeline (program, OpenRefine configuration files, SPARQL queries, etc.), including a foolproof tutorial on how to run it,
  • RDFized datasets corresponding to the dataset models (if the data are too big, upload them to GraphDB and include the repository name and contexts).
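
To give an idea of what a programmatic RDFization step might look like, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. Everything specific in it is an assumption made for illustration: the namespace http://example.org/osw/, the class ProtectedArea, the property name, and the CSV columns id and name. A real pipeline would follow your own dataset ontology and would more likely rely on a dedicated tool such as OpenRefine or an RDF library.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/osw/"  # hypothetical namespace, replace with your own
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed verbatim in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Convert CSV text with 'id' and 'name' columns into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so that the resulting IRI stays valid.
        subject = f"<{BASE}area/{quote(row['id'], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}ProtectedArea> .")
        triples.append(f'{subject} <{BASE}name> "{escape_literal(row["name"])}" .')
    return triples
```

Each input row yields one typing triple and one name triple; the resulting lines can be written to a .nt file and loaded into a triple store.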

A detailed description of the checkpoint is on its specification page.

Integration of models and data

The third checkpoint combines and integrates the data through the integration of the models. Finally, you will answer the question defined in the first checkpoint.

Two steps are required to answer the question:

  • creating a mapping between the formalized dataset models and the conceptual model, using UFO, OWL, and possibly SHACL rules to validate inputs, etc.,
  • designing non-trivial SPARQL queries that demonstrate the integration of the datasets by answering the question.
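
Since the expected output is a framework in which the inputs of the question can be changed easily, one possible approach is to keep the SPARQL query as a template and fill in the parameter programmatically. The sketch below only builds the query text. The prefix ex:, the class ex:ProtectedArea, and the properties ex:name and ex:allowsOffPathActivity are hypothetical and stand in for whatever your integrated ontology defines; a real query for this checkpoint would be considerably less trivial.

```python
from string import Template

# Hypothetical vocabulary; replace it with the terms of your own ontology.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/osw/>
SELECT ?area ?name WHERE {
  ?area a ex:ProtectedArea ;
        ex:name ?name ;
        ex:allowsOffPathActivity "$activity" .
}
""")


def escape_sparql_string(value: str) -> str:
    """Escape a value so it can be placed inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_query(activity: str) -> str:
    """Return the query text for the given activity (e.g. 'MTB' or 'hiking')."""
    return QUERY_TEMPLATE.substitute(activity=escape_sparql_string(activity))
```

Calling build_query("MTB") or build_query("hiking") then produces two variants of the same query, which can be sent to the SPARQL endpoint of the repository holding the integrated data.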

A detailed description can be found on the specification page of the last checkpoint.

Generic Ontologies

Generic Dataset Sources

Environmental, statistical and health Datasets and Applications

If you cannot find the data you need, contact the lecturer.

Sample projects from previous years

The semestral project changed in the winter semester 2021/22, so the sample projects from earlier years will not help you much.
2017
2018
2019

Archive semestral-work-example-2019.zip.

Do not use them as templates for your semestral work, since the rules in previous years were different.
2021
courses/b4m36osw/semestral_work.txt · Last modified: 2022/09/12 10:41 by medmicha