
Create a semi-automated data pipeline based on the data sources from checkpoint 0.


  • a data pipeline (Scrapy, SPARQL, etc.) for reconstructing the data set;
  • the data set;
  • UML diagrams of the data set schema.


  • the data pipeline should transform the data sources into RDF (i.e. the target data set)
  • choose any tools you like (e.g. any programming language you are familiar with) to create the data pipeline. However, in most cases one of the following two alternatives should be sufficient:
    • GraphDB (OpenRefine+SPARQL) for processing CSV files, triplifying them, and manipulating resulting RDF
    • Scrapy + GraphDB (OpenRefine+SPARQL) for scraping web pages, triplifying them, and manipulating the resulting RDF
  • the resulting data set should contain all relevant data for the integration task unified in format (RDF)
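If you prefer a programming language over the GraphDB/OpenRefine route, the triplification step can be sketched in plain Python. The following is a minimal, illustrative example only: the `http://example.org/` namespace, the FOAF property, and the sample CSV are assumptions standing in for your actual checkpoint 0 sources, and it emits N-Triples (one RDF serialization) without any external library.

```python
# Minimal sketch of the CSV -> RDF triplification step.
# The namespace, property, and sample data below are illustrative
# assumptions, not part of the assignment; replace them with the
# vocabulary chosen for your own data set.
import csv
import io

EX = "http://example.org/"              # hypothetical target namespace
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

CSV_DATA = """id,name
1,Alice
2,Bob
"""

def triplify(csv_text: str) -> list[str]:
    """Turn each CSV row into one RDF triple, serialized as N-Triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}person/{row['id']}>"
        triples.append(f'{subject} <{FOAF_NAME}> "{row["name"]}" .')
    return triples

for triple in triplify(CSV_DATA):
    print(triple)
```

The resulting N-Triples file can then be loaded into GraphDB and refined further with SPARQL; a real pipeline would add type triples, IRI minting rules, and escaping for literal values.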
courses/osw/cp1.txt · Last modified: 2017/11/22 23:02 by blaskmir