csv2rdf4lod automation data root
It is important to know the data root and the Directory Conventions during two phases of the conversion process: "Conversion process phase: name" and "Conversion process phase: retrieve".
A "data root" is a directory that contains your data and organizes it according to the directory conventions. csv2rdf4lod-automation expects the data root to follow these conventions whenever it invokes any of its commands. The conventions are also very useful when sharing data with other people, since they will already be familiar with how things are organized!
Data roots can be anywhere in your file directories, as long as they are named source/. For example, here are paths to three different data roots:
wherever_you_want/source/
some_other_location/source/
yet_another_project/data/source/
Data roots contain structures following the directory conventions, which follow the "source, dataset, version" pattern. The following paths show where data would be retrieved for datasets within different data roots (SS1, DD1, VV1, etc. are generic names for your sources, datasets, and versions):
wherever_you_want/source/SS1/DD1/version/VV1/source/a.csv
some_other_location/source/SS1/DD2/version/VV1/source/b.csv
yet_another_project/data/source/SS2/DD1/version/VV1/source/c.csv
For example,
wherever_you_want/source/whitehouse-gov/visitor-records/version/1510/source/their.csv
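To make the "source, dataset, version" pattern concrete, here is a minimal Python sketch that splits such a path into its parts. It is not part of csv2rdf4lod-automation; the names `DatasetLocation` and `parse_retrieved_path` are hypothetical, and the sketch assumes forward-slash paths with the retrieved file sitting directly inside the version's source/ directory.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class DatasetLocation(NamedTuple):
    """Hypothetical record for the pieces of a retrieved-file path."""
    data_root: str
    source_id: str
    dataset_id: str
    version_id: str
    filename: str


def parse_retrieved_path(path: str) -> Optional[DatasetLocation]:
    """Split <root>/source/SS/DD/version/VV/source/<file> into its parts.

    Returns None when the path does not match that shape.
    """
    parts = PurePosixPath(path).parts
    # Look for a "source" directory followed by exactly
    # SS / DD / "version" / VV / "source" / <file>.
    for i, part in enumerate(parts):
        tail = parts[i + 1:]
        if (part == "source" and len(tail) == 6
                and tail[2] == "version" and tail[4] == "source"):
            return DatasetLocation(
                data_root=str(PurePosixPath(*parts[:i + 1])),
                source_id=tail[0],
                dataset_id=tail[1],
                version_id=tail[3],
                filename=tail[5],
            )
    return None


if __name__ == "__main__":
    print(parse_retrieved_path(
        "wherever_you_want/source/whitehouse-gov/visitor-records"
        "/version/1510/source/their.csv"
    ))
```

For the whitehouse-gov path above, this reports the data root wherever_you_want/source, the source whitehouse-gov, the dataset visitor-records, and the version 1510.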
[This page](List of SPARQL endpoints containing datasets produced by csv2rdf4lod) lists some real-world data roots that are being used for different projects.
At some point, you'll have data roots all over the place. We need a good way to quickly find all of them.
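One possible way to find them is to walk a directory tree and report every directory named source/. The sketch below is an illustration only, not a csv2rdf4lod-automation command; `find_data_roots` is a hypothetical name. It assumes that the starting directory is not itself a data root and that no data root is nested inside another one. Because each data root also contains source/ directories inside its versions, the walk stops descending as soon as it finds a data root.

```python
import os
from typing import List


def find_data_roots(start: str) -> List[str]:
    """Return the outermost directories named 'source' under `start`."""
    roots = []
    for dirpath, dirnames, _filenames in os.walk(start):
        if "source" in dirnames:
            roots.append(os.path.join(dirpath, "source"))
            # Prune the walk here: removing the entry from dirnames stops
            # os.walk from descending into the data root, so the nested
            # .../version/VV1/source/ directories are never reported.
            dirnames.remove("source")
    return sorted(roots)


if __name__ == "__main__":
    # Search from the current directory and print one data root per line.
    for root in find_data_roots("."):
        print(root)
```

By default `os.walk` does not follow symbolic links to directories, so data roots reachable only through symlinks would be missed by this sketch.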