Create a semi-automated data pipeline that transforms the data sources from Checkpoint 0 into RDF. Each data source will keep its own separate data schema.
Deliverable
source code for the data pipeline (Scrapy, SPARQL, s-pipes, OpenRefine, etc.) used to create RDF datasets from the data sources
the RDF datasets (outputs of the data pipeline) obtained from the data sources defined in Checkpoint 0
a short description (a 1-2 page extension of the Checkpoint 0 report) of the data pipeline, its benefits and limitations, together with a UML class diagram depicting the schema of each dataset.
Details
the data pipeline should extract an RDF dataset from each data source
choose any tools you like (e.g. any programming language you are familiar with) to create the data pipeline. However, in most cases one of the following two alternatives should be sufficient:
GraphDB (OpenRefine+SPARQL) for processing CSV files, triplifying them, and manipulating the resulting RDF
Scrapy + GraphDB (OpenRefine+SPARQL) for scraping web pages, triplifying them, and manipulating the resulting RDF
the resulting RDF datasets should contain all relevant data for the integration task in Checkpoint 2
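To make the triplification step concrete, here is a minimal sketch in plain Python (stdlib only) as an alternative to the recommended GraphDB/OpenRefine route. The base IRI, the class name `City`, and the CSV columns are hypothetical placeholders, not part of the assignment; substitute the schema of one of your Checkpoint 0 data sources.

```python
# Minimal CSV-to-RDF triplification sketch: each CSV row becomes one typed
# resource with two properties, serialized as N-Triples lines.
# All IRIs and column names below are illustrative placeholders.
import csv
import io

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"

# Stand-in for a real CSV file from Checkpoint 0.
CSV_DATA = """id,name,population
prague,Prague,1300000
brno,Brno,380000
"""

def triplify(csv_text, base="http://example.org/checkpoint1/"):
    """Map each CSV row to one resource and emit its triples as N-Triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        s = f"<{base}{row['id']}>"
        name = row["name"]
        pop = row["population"]
        triples.append(f"{s} {RDF_TYPE} <{base}City> .")
        triples.append(f'{s} <{base}name> "{name}" .')
        triples.append(f'{s} <{base}population> "{pop}"^^{XSD_INT} .')
    return triples

if __name__ == "__main__":
    print("\n".join(triplify(CSV_DATA)))
```

In the GraphDB/OpenRefine workflow the same row-to-resource mapping would typically be expressed declaratively, e.g. as a SPARQL CONSTRUCT query over the imported tabular data, rather than in procedural code.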