Deadline 15. 11. 2020
Create a semi-automated data pipeline that transforms the data sources from Checkpoint 0 into RDF. Each data source keeps its own, separate data schema. A crucial part of this checkpoint is the data quality check and, where necessary, data cleaning.
Before setting up the pipeline, double-check that the data contain all the information you need and that they contain it in full. Also try to assess the credibility of the data. Who is the originator? Are the data authoritative (reference) data? Are they up to date? Answering these (and other) questions will tell you a lot about how usable the data are for solving your problem.
Briefly describe the results of the quality check in the delivery.
If the data do not pass the check, try to clean them or find other data of higher quality.
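For a tabular source, even a few lines of scripting can surface most quality problems. The following is a minimal sketch only, assuming a hypothetical CSV file data/source.csv and the pandas library; the column names are placeholders, not part of the assignment.

```python
# Minimal data-quality check for one tabular source.
# The file name and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("data/source.csv")

print("rows:", len(df))
print("duplicate rows:", int(df.duplicated().sum()))
print("missing values per column:")
print(df.isna().sum())

# Example of a simple domain check on an assumed 'year' column.
if "year" in df.columns:
    out_of_range = df[(df["year"] < 1900) | (df["year"] > 2020)]
    print("records with a suspicious year:", len(out_of_range))
```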
The data pipeline is a process that extracts an RDF dataset from each data source. There are multiple tools designed for this purpose. Alternatively, it is possible to write a simple script in any programming language you are familiar with (see the conversion sketch below). However, in most cases the following two alternatives should be sufficient:
The resulting RDF datasets should contain all data relevant to the integration task in Checkpoint 2. This means that all data to be used during the integration must be produced in this step at the latest.
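If you go the scripting route, the conversion itself can be quite short. The sketch below uses Python and rdflib and assumes the same hypothetical CSV source as above; the namespace, file names, and property names are illustrative only, and each source should keep its own schema as required by this checkpoint.

```python
# Minimal CSV-to-RDF conversion sketch (hypothetical source, namespace, and properties).
import csv
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("https://example.org/source-a/")   # assumed per-source namespace
g = Graph()
g.bind("ex", EX)

with open("data/source.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        subject = EX[f"record/{row['id']}"]                   # assumed 'id' column
        g.add((subject, RDF.type, EX.Record))
        g.add((subject, EX.name, Literal(row["name"])))       # assumed 'name' column
        g.add((subject, EX.year, Literal(row["year"], datatype=XSD.gYear)))

g.serialize(destination="source-a.ttl", format="turtle")      # one RDF dataset per source
```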
For every RDF dataset, create and deliver a UML diagram of its schema. Use the schemas to check whether you really have all the data you need for the integration; a small query sketch supporting this check is given at the end of this section. In the delivery, try to answer the following questions and justify the answers using the UML schemas (I will definitely ask about this during the defense):
Correct examples:
Incorrect example:
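Aside from the examples above, one way to cross-check a UML schema against the generated data is to query which classes and properties actually occur in the dataset. The sketch below assumes rdflib and the hypothetical source-a.ttl file from the conversion sketch; it is a sanity check, not a required deliverable.

```python
# List the classes and properties that actually occur in a generated dataset
# (assumes the hypothetical source-a.ttl produced by the conversion sketch above).
from rdflib import Graph

g = Graph()
g.parse("source-a.ttl", format="turtle")

classes = {row.c for row in g.query("SELECT DISTINCT ?c WHERE { ?s a ?c }")}
properties = {row.p for row in g.query("SELECT DISTINCT ?p WHERE { ?s ?p ?o }")}

print("classes used:", *sorted(classes), sep="\n  ")
print("properties used:", *sorted(properties), sep="\n  ")
```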