====== Semestral Work ======

The goal of the semestral work is to learn how to combine data, information and knowledge, and to use that knowledge to answer a non-trivial question over a set of non-interlinked datasets.

===== Grading =====

The semestral work is graded in three checkpoints:
  * [[courses:b4m36osw:cp1|checkpoint 1]] (max 10 pts) -- deadline to deliver **9. 10. 2022**
  * [[courses:b4m36osw:cp2|checkpoint 2]] (max 25 pts) -- deadline to deliver **20. 11. 2022**
  * [[courses:b4m36osw:cp3|checkpoint 3]] (max 15 pts) -- deadline to deliver **8. 1. 2023**

To successfully complete the semestral project, you need to obtain at least 50% of the points from **each** checkpoint. To complete a checkpoint, you need to
  - (**by the checkpoint [[courses:b4m36osw:seminars|deadline]]**) push the deliverable for the checkpoint to the Git repository https://gitlab.fel.cvut.cz/B221_B4M36OSW/%username% ,
  - (**by the checkpoint [[courses:b4m36osw:seminars|deadline]]**) upload a text file 'cp1.txt' (or 'cp2.txt' or 'cp3.txt', respectively) containing the hash of the submitted GitLab commit into the [[https://cw.felk.cvut.cz/brute/teacher/course/1352|upload system]],
  - prepare a 5-minute presentation and defend it (**at the [[courses:b4m36osw:seminars|next lab]]**).

The deadline to deliver a checkpoint is the last Sunday before the tutorial at which it is defended. If you submit your checkpoint after the deadline, you will lose 5 points for each commenced week of delay. This penalty applies to the final grade only (i.e. it is not taken into account when determining whether you passed or failed).

===== Description =====

The basic goal of the semestral project is to **combine data, information and knowledge to answer non-trivial questions**. Students may choose from the following questions or propose their own. For inspiration, look at the projects from the last semester at the bottom of the page.
A sample topic is **Protected sites**, with the question "In which protected areas is it allowed to move off the marked paths in a certain way (MTB, hiking, ski touring etc.)?". The expected output is not only the answer, but also a functional framework for obtaining answers automatically (e.g. it should be easy to change a parameter of the question, such as the given municipality). Note that the questions are not specified in their temporal or spatial extent. In the first checkpoint, based on the data sources, define the question specifically, including its spatial and temporal extent. The scope may vary a lot -- e.g. a comparison of the overlap of protected areas between two neighboring countries, or changes in legislation for protected areas over the past century.

===== Semestral work step by step =====

The semestral work consists of three checkpoints. Every checkpoint builds directly on the material covered in the tutorials. Every checkpoint **must contain** a document describing the working process and outputs.

==== Create conceptual model ====

The **first checkpoint** takes place after the first two tutorials. Its goal is to define the problem and create a conceptual model of the knowledge needed to solve it. An essential part of the task is to unambiguously define all the terms used in the conceptual model. Depending on the nature of the problem, go through legislative and/or technical documents and scientific papers, and define each concept in a specific meaning. Based on the model, look for available data covering the problem. Then create a model of the datasets -- with objects, their attributes and the relations between objects. Try to connect the concepts of the datasets to the conceptual model.
The expected output of the first checkpoint is:
  * a definition of the research question and the environment in which we seek the answers,
  * an E-R, UML or other diagram showing the knowledge needed to get the answer,
  * a list of datasets to be used, including source, description and a model in the form of a UML or E-R diagram, showing connections to the conceptual model.

More details about the checkpoint are on the [[ :courses:b4m36osw:cp1 | Checkpoint 1 specification page ]].

==== Formalizing models and data ====

The **second checkpoint** is the most complex one; if it is done right, the third checkpoint amounts only to answering the question. It consists of two parts -- the formalization of models as ontologies and the transformation of data into RDF corresponding to the formalized models.

In the first part, you will create a domain ontology of the conceptual model and a set of ontologies describing the dataset models. The expected output is:
  * an ontology of the conceptual model, containing unambiguously defined concepts (with definitions, sources etc., using SKOS, RDF(S) or another high-level ontology) and the links between them,
  * ontologies of the dataset models, each describing the content of a specific dataset. Deliver one ontology per dataset used.

The other part of the checkpoint is to set up a pipeline for transforming the source data into RDF and then, naturally, to run it. The expected output of this part of the second checkpoint is:
  * an RDFization pipeline (program, OpenRefine configuration files, SPARQL queries etc.), including a foolproof tutorial on how to run it,
  * the RDFized datasets corresponding to the dataset models (if they are too big, upload the data to GraphDB and include the repository name and contexts).

A detailed description of the checkpoint is on its [[cp2 | specification page]].

==== Integration of models and data ====

The **third checkpoint** combines and integrates the data through the integration of models. Finally, you shall answer the question defined in the first checkpoint.
Answering the question requires two steps:
  * creating a mapping between the formalized dataset models and the conceptual model, using UFO, OWL, and possibly SHACL rules to validate inputs etc.,
  * designing non-trivial SPARQL queries that demonstrate the integration of the datasets by answering the question.

A detailed description can be found on the [[cp3 | specification page of the last checkpoint]].

===== Some Related Data Sources =====

==== Generic Ontologies ====

  * Protégé Ontology Library -- http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library
  * dbpedia.org
  * Time Ontologies
    * Time Ontology -- https://www.w3.org/TR/owl-time
  * Spatial Ontologies
    * WGS84 Ontology -- https://www.w3.org/2003/01/geo
    * GeoSPARQL Ontology -- http://www.opengeospatial.org/standards/geosparql

==== Generic Dataset Sources ====

  * DataHub -- http://datahub.io
  * European Data Portal -- https://www.europeandataportal.eu/en/homepage
  * NKOD -- https://data.gov.cz/datov%C3%A9-sady
  * Prague Open Data -- http://opendata.praha.eu/
  * Brno Open Data -- https://kod.brno.cz/
  * Ostrava Open Data -- https://opendata.ostrava.cz/
  * Pilsen Open Data -- https://opendata.plzen.eu/

==== Environmental, statistical and health Datasets and Applications ====

  * NOAA Climate Data Online -- https://www.ncdc.noaa.gov/cdo-web/datasets
  * Národní katalog otevřených dat (NKOD) -- https://opendata.gov.cz/nastroj:narodni-katalog-otevrenych-dat
  * European Environment Agency -- https://www.eea.europa.eu/data-and-maps/data
  * European Union Open Data Portal -- https://data.europa.eu/euodp/en/data/
  * European Climate Assessment and Dataset -- https://www.ecad.eu/dailydata/index.php
  * World Health Organization -- https://www.who.int/gho/database/en/
  * Some useful economic data for the UK -- https://www.economicsnetwork.ac.uk/data_sets
  * Economic data of the Czech Statistical Office -- https://www.czso.cz/csu/czso/aktualniinformace
  * and many more...

If you cannot find the data you need, contact the lecturer.
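
To illustrate the RDFization step of the second checkpoint, here is a minimal sketch that converts a small CSV table into N-Triples using only the Python standard library. Everything specific in it is an assumption made up for this example: the namespace ''http://example.org/osw/'', the column names ''id'', ''name'' and ''category'', the class ''ProtectedSite'' and the sample row. A real pipeline would follow your own dataset ontology and may well use OpenRefine or an RDF library instead.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/osw/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters N-Triples forbids inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row):
    """Turn one CSV row (a dict) into a list of N-Triples lines.

    The columns 'id', 'name' and 'category' are assumed for this sketch.
    """
    site_id = quote(row["id"], safe="")
    category = quote(row["category"], safe="")
    label = escape_literal(row["name"])
    subject = f"<{BASE}site/{site_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{BASE}ProtectedSite> .",
        f'{subject} <{RDFS_LABEL}> "{label}" .',
        f"{subject} <{BASE}category> <{BASE}category/{category}> .",
    ]


def csv_to_ntriples(csv_text):
    """Convert CSV text with a header row into an N-Triples document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(row))
    return "\n".join(lines) + "\n"


# Made-up sample input, not taken from any real dataset.
SAMPLE = "id,name,category\n1,Example Site,national-park\n"

if __name__ == "__main__":
    print(csv_to_ntriples(SAMPLE), end="")
```

The resulting file can be loaded into a triple store such as GraphDB and queried with SPARQL. Generating plain N-Triples keeps the sketch free of dependencies; for larger datasets or richer datatypes, a dedicated tool is likely the more robust choice.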
===== Sample projects from previous years =====

The semestral project changed in the winter semester 2021/22. Therefore, the sample projects will not help you much.

== 2017 ==

Archives {{ :courses:osw:osw2017-semestral-work-example-1.zip | semestral-work-example-2017-1.zip}} and {{ :courses:osw:osw2017-semestral-work-example-2.zip | semestral-work-example-2017-2.zip}}.

== 2018 ==

Archive {{ :courses:osw:osw2018-semestral-work-example.zip | semestral-work-example-2018.zip}}.

== 2019 ==

Archive {{ :courses:osw:osw2019-semestral-work-example.zip | semestral-work-example-2019.zip}}.

Do not use them as templates for your semestral work, since the rules in previous years were different.

== 2021 ==

Documents {{ :courses/b4m36osw/semestral-work-example-2021.pdf | semestral-work-example-2021.pdf}} and {{ :courses/b4m36osw/semestral-work-example-2021-2.pdf | semestral-work-example-2021-2.pdf}}.