
csv2rdf4lod automation data root

Tim L edited this page Dec 19, 2013 · 24 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

What is first

Understanding the data root and the Directory Conventions is important during "Conversion process phase: name" and "Conversion process phase: retrieve".

What we'll cover

Let's get to it

A "data root" is a directory that contains your data and organizes it according to the directory conventions. csv2rdf4lod-automation expects the data root to follow these conventions whenever it invokes any of its commands. The conventions are also very useful when sharing data with other people, since they will already be familiar with how things are organized.

Data roots can be anywhere in your file system, as long as the directory itself is named source/. For example, here are paths to three different data roots:

wherever_you_want/source/
some_other_location/source/
yet_another_project/data/source/

Data roots contain structures following the directory conventions, which follow the "source, dataset, version" pattern. The following paths show where data would be retrieved for datasets within different data roots (SS1, DD1, VV1, etc. are generic names for your sources, datasets, and versions):

       wherever_you_want/source/SS1/DD1/version/VV1/source/a.csv
     some_other_location/source/SS1/DD2/version/VV1/source/b.csv
yet_another_project/data/source/SS2/DD1/version/VV1/source/c.csv

For example,

wherever_you_want/source/whitehouse-gov/visitor-records/version/1510/source/their.csv
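
To make the "source, dataset, version" pattern concrete, here is a minimal Python sketch that splits a retrieved file's path into its parts. It is not part of csv2rdf4lod-automation; the names `RetrievedFile` and `parse_retrieved_path` are hypothetical, and the sketch assumes the layout `<data root>/<source>/<dataset>/version/<version>/source/<file>` shown in the paths above.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class RetrievedFile(NamedTuple):
    """Parts of a path that follows the source/dataset/version pattern."""
    data_root: str  # e.g. "wherever_you_want/source"
    source: str     # e.g. "whitehouse-gov"
    dataset: str    # e.g. "visitor-records"
    version: str    # e.g. "1510"
    filename: str   # e.g. "their.csv"


def parse_retrieved_path(path: str) -> Optional[RetrievedFile]:
    """Split <root>/<S>/<D>/version/<V>/source/<file> into its parts.

    Hypothetical helper: returns None when the path does not match
    the layout described on this page.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part != "source":
            continue
        rest = parts[i + 1:]
        # Expect: <S>, <D>, "version", <V>, "source", <file...>
        if len(rest) >= 6 and rest[2] == "version" and rest[4] == "source":
            return RetrievedFile(
                data_root=str(PurePosixPath(*parts[: i + 1])),
                source=rest[0],
                dataset=rest[1],
                version=rest[3],
                filename=str(PurePosixPath(*rest[5:])),
            )
    return None


if __name__ == "__main__":
    example = ("wherever_you_want/source/whitehouse-gov/"
               "visitor-records/version/1510/source/their.csv")
    print(parse_retrieved_path(example))
```

The first `source` component that is followed by a matching layout is treated as the data root, so the second `source/` directory inside the version is not mistaken for it.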

[This page](List of SPARQL endpoints containing datasets produced by csv2rdf4lod) lists some real-world data roots that are being used for different projects.

Data roots, everywhere!

At some point, you'll have data roots all over the place. We need a good way to quickly find all of them.
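
One way to locate data roots is to walk a directory tree and collect every directory named source/. The Python sketch below is an illustration only, not a csv2rdf4lod-automation command; `find_data_roots` is a hypothetical name. It assumes that the topmost source/ directory on any branch is the data root, and it does not descend into a data root once found, so the source/ directories that sit inside each version are not reported.

```python
import os
from typing import List


def find_data_roots(start: str) -> List[str]:
    """Return paths of directories named 'source' found under `start`.

    Heuristic (an assumption of this sketch): the first 'source'
    directory met on the way down is a data root, and anything below
    it belongs to that data root.
    """
    roots = []
    for dirpath, dirnames, _filenames in os.walk(start):
        if "source" in dirnames:
            roots.append(os.path.join(dirpath, "source"))
            # Prune in place so os.walk does not descend into the data
            # root and report the per-version source/ directories.
            dirnames.remove("source")
    return sorted(roots)


if __name__ == "__main__":
    for root in find_data_roots(os.path.expanduser("~")):
        print(root)
```

Any directory that happens to be named source/ will be reported, including ones unrelated to csv2rdf4lod, so the results may need a manual check.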

What is next
