Create a semi-automated data pipeline based on the data sources from checkpoint 0.
Deliverable
data pipeline (Scrapy, SPARQL, etc.) for reconstructing the data set;
the data set;
UML diagrams of the data set schema
Details
the data pipeline should transform the data sources into RDF (i.e. the target data set)
choose any tools you like (e.g. any programming language you are familiar with) to create the data pipeline; in most cases, however, one of the following two alternatives should suffice:
GraphDB (OpenRefine+SPARQL) for processing CSV files, triplifying them, and manipulating the resulting RDF (sketches of CSV triplification and of loading the result into GraphDB follow this list)
Scrapy + GraphDB (OpenRefine+SPARQL) for scraping web pages, triplifying the extracted data, and manipulating the resulting RDF (a Scrapy spider sketch also follows this list)
the resulting data set should contain all data relevant to the integration task, unified in a single format (RDF)
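As a rough illustration of the triplification step outside OpenRefine, the Python sketch below converts rows of a CSV file into RDF with rdflib and writes Turtle that GraphDB can import. The file name persons.csv, its columns (id, name, birth_year), and the http://example.org/ namespace are assumptions for illustration only; adapt them to your checkpoint 0 sources.

```python
# Minimal sketch of CSV-to-RDF triplification with rdflib.
# The file name (persons.csv), its columns (id, name, birth_year), and the
# http://example.org/ namespace are illustrative assumptions, not part of the task.
import csv

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, XSD

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)
g.bind("foaf", FOAF)

with open("persons.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        person = URIRef(EX[f"person/{row['id']}"])
        g.add((person, RDF.type, FOAF.Person))
        g.add((person, FOAF.name, Literal(row["name"])))
        g.add((person, EX.birthYear, Literal(row["birth_year"], datatype=XSD.gYear)))

# Serialize the triplified data as Turtle; GraphDB can import this file directly.
g.serialize(destination="persons.ttl", format="turtle")
```

The same mapping could equally be expressed as an OpenRefine RDF skeleton or a SPARQL CONSTRUCT query; rdflib is used here only to keep the example self-contained.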
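For the Scrapy route, a spider along the following lines could collect raw items from a source site before triplification. The start URL and CSS selectors are hypothetical placeholders and must be adapted to the actual pages being scraped.

```python
# Minimal Scrapy spider sketch; the start URL and CSS selectors are
# hypothetical placeholders and must be adapted to the real data source.
import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    start_urls = ["https://example.org/catalogue/"]  # assumed source page

    def parse(self, response):
        # Yield one plain item per listed entry; triplification happens later.
        for entry in response.css("article.product"):
            yield {
                "title": entry.css("h3 a::attr(title)").get(),
                "price": entry.css("p.price::text").get(),
                "url": response.urljoin(entry.css("h3 a::attr(href)").get()),
            }
        # Follow pagination if the site has it.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Run it with, for example, `scrapy runspider book_spider.py -o books.json`, then triplify the resulting JSON with the same kind of mapping as in the CSV sketch above.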
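Finally, the serialized RDF can be pushed into a GraphDB repository either through the Workbench import UI or programmatically over its RDF4J-style REST interface. The sketch below assumes a local GraphDB instance on port 7200 and a repository named checkpoint1; both are assumptions about your setup.

```python
# Sketch of loading the Turtle output into a local GraphDB repository.
# The host, port, and repository id ("checkpoint1") are assumptions that
# depend on your GraphDB installation.
import requests

GRAPHDB_STATEMENTS = "http://localhost:7200/repositories/checkpoint1/statements"

with open("persons.ttl", "rb") as f:
    resp = requests.post(
        GRAPHDB_STATEMENTS,
        data=f,
        headers={"Content-Type": "text/turtle"},
    )
resp.raise_for_status()  # GraphDB answers 204 No Content on success
```

Once the data is in the repository, the remaining manipulation (cleaning, linking, renaming properties) can be done with SPARQL UPDATE queries in the GraphDB Workbench.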