Conversion trigger
The conversion trigger is a metaphor for the shell script that is created once and executed one or more times to invoke the conversion process. The conversion trigger sits within a conversion cockpit, a directory dedicated to converting data files retrieved from a particular source organization. Once created, the trigger automatically produces the verbatim conversion and the enhancement 1 conversion. To create additional enhancement conversions, use the `-e` parameter, e.g. `./convert-mydataset.sh -e 2`.
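The file names in the transcripts on this page follow a per-layer pattern: the source `manual/test.csv` is converted into `automatic/test.csv.e1.ttl` using the parameters in `manual/test.csv.e1.params.ttl`. The sketch below uses a hypothetical helper, `layer_paths`, to map a source file and a layer to those paths. It assumes that the verbatim layer writes `automatic/<file>.raw.ttl` (inferred from the `automatic/*.raw.ttl` pattern) and that layers created with `-e 2` and higher follow the `e1` naming. It illustrates the convention and is not code taken from csv2rdf4lod.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of csv2rdf4lod): print the paths a layer
# would use for one source file in the conversion cockpit.
# The naming pattern is inferred from the transcripts and is an assumption.
layer_paths() {
  local source_file=$1   # e.g. test.csv, stored under manual/
  local layer=$2         # "raw" for the verbatim layer, or e1, e2, ...
  if [ "$layer" = "raw" ]; then
    # The verbatim layer reads no enhancement parameters.
    echo "automatic/${source_file}.raw.ttl"
  else
    # Enhancement layers: output first, then the parameters file it reads.
    echo "automatic/${source_file}.${layer}.ttl"
    echo "manual/${source_file}.${layer}.params.ttl"
  fi
}

layer_paths test.csv raw   # prints automatic/test.csv.raw.ttl
layer_paths test.csv e2    # prints automatic/test.csv.e2.ttl and manual/test.csv.e2.params.ttl
```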
This section outlines the files that are used to produce a conversion.
If you don't care about saving time and just want "the latest conversion" to regenerate, pass the `--force` argument to the conversion trigger:
```
./convert-DATASET.sh --force
```
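A trigger that accepts both `-e N` and `--force` needs only a small argument loop. The sketch below is an illustrative stand-in, not the actual contents of a generated `convert-DATASET.sh`. The function `parse_trigger_args` and the globals `enhancement` and `force` are hypothetical names. Defaulting to layer 1 mirrors the statement that enhancement 1 conversions are produced automatically.

```shell
#!/usr/bin/env bash
# Hypothetical argument handling for a conversion trigger (not csv2rdf4lod code).
# Sets two globals: enhancement (layer number, default 1) and force (true/false).
parse_trigger_args() {
  enhancement=1
  force=false
  while [ $# -gt 0 ]; do
    case "$1" in
      -e)
        # -e must be followed by a positive integer layer number.
        if [ $# -lt 2 ] || ! [[ "$2" =~ ^[1-9][0-9]*$ ]]; then
          echo "usage: $0 [-e N] [--force]" >&2
          return 1
        fi
        enhancement=$2
        shift 2
        ;;
      --force)
        force=true
        shift
        ;;
      *)
        echo "unrecognized argument: $1" >&2
        return 1
        ;;
    esac
  done
}

parse_trigger_args -e 2 --force
echo "enhancement=$enhancement force=$force"   # prints enhancement=2 force=true
```

Using a `while`/`case` loop instead of `getopts` keeps the long option `--force` simple to handle alongside the short `-e`.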
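For example, without `--force` the trigger skips an enhancement conversion whose output is already newer than its parameters: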
```
bash-3.2$ ./convert-cell-based-range.sh
--------------------------------------------------------------------------------
test.csv
E1 output automatic/test.csv.e1.ttl newer than enhancement parameters manual/test.csv.e1.params.ttl; skipping.
convert.sh done
convert-aggregate.sh not publishing b/c $CSV2RDF4LOD_PUBLISH=false.
===========================================================================================
```
The trigger skipped the E1 conversion to avoid redundant processing. You can make it run anyway by adding the `--force` argument:
```
bash-3.2$ ./convert-cell-based-range.sh --force
--------------------------------------------------------------------------------
test.csv
E1 output automatic/test.csv.e1.ttl is newer than enhancement parameters manual/test.csv.e1.params.ttl, but you --force'd so we'll do it anyway.
2 rows in manual/test.csv
convert-cell-based-range.sh overriding conversion:base_uri in parameters file with http://localhost
E1 CONVERSION
========== edu.rpi.tw.data.csv.CSVtoRDF version 2013-June-22 initiated:
fileName: manual/test.csv
...
Generated 48 triples in 0 min. ( 25945.9 triples/min )
...
INFO: writing metadata to separate file from data automatic/test.csv.e1.void.ttl
========== edu.rpi.tw.data.csv.CSVtoRDF complete. ==========
...
```
To save time, the conversion trigger avoids rerunning conversions that already exist. So, if you already have the verbatim conversion at `automatic/*.raw.ttl` and want to regenerate it, remove those files and rerun the conversion trigger:
```
rm automatic/*.raw.ttl
./convert-DATASET.sh
```
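The skip decisions described above can be approximated with the shell's `-nt` ("newer than") file test. This is a reconstruction from the log messages, not the trigger's source. The hypothetical `should_convert` function converts the verbatim layer only when its output is missing, skips an enhancement layer whose output is newer than its parameters file, and always converts when forced.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the trigger's skip logic (an assumption,
# inferred from the transcripts). Arguments: output file, parameters file
# ("" for the verbatim layer), force flag. Returns 0 to convert, 1 to skip.
should_convert() {
  local output=$1 params=$2 force=$3
  if [ "$force" = true ]; then
    return 0                      # --force always reruns
  fi
  if [ -z "$params" ]; then
    [ ! -e "$output" ]            # verbatim: convert only if output is missing
    return
  fi
  # Enhancement: convert unless the output is newer than its parameters.
  ! [ "$output" -nt "$params" ]
}

demo=$(mktemp -d)
touch -t 202001010000 "$demo/test.csv.e1.params.ttl"   # parameters edited long ago
touch "$demo/test.csv.e1.ttl"                            # output written just now
if should_convert "$demo/test.csv.e1.ttl" "$demo/test.csv.e1.params.ttl" false; then
  echo convert
else
  echo skip   # this branch runs: the output is newer than its parameters
fi
```

Under this model, editing a `manual/*.e1.params.ttl` file makes it newer than its output, so the next run of the trigger regenerates that layer without needing `--force`.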
- The `surrogate` variable is an old name for `base_uri`. It was originally named surrogate because csv2rdf4lod is designed to aggregate data from third-party source organizations and republish RDF representations to third-party consumers. `base_uri` is the web host from which csv2rdf4lod users will provide their data; it acts as a surrogate for the third-party source organizations that have not adopted linked data publishing principles. The value of `base_uri` is drawn from the shell environment variable `CSV2RDF4LOD_BASE_URI`.
- The environment variables `subjectDiscriminator`, `commentCharacter`, `cellDelimiter`, `header`, `dataStart`, `repeatAboveIfEmptyCol`, `onlyIfCol`, `interpretAsNull`, and `dataEnd` reflect a limited subset of the RDF-encoded enhancement parameters. THESE ARE ONLY FOR BOOTSTRAPPING. They are used only when the `manual/*.e1.params.ttl` files do not exist, in which case these variables populate the newly created RDF-encoded enhancement parameters. If you want to feed information about subject discriminators, comment characters, cell delimiters, etc., you can put it in the conversion trigger, but it is highly recommended to specify it directly in the `manual/*.e1.params.ttl` files.
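The `base_uri` and bootstrapping notes above can be sketched as two small checks. Both functions, `resolve_base_uri` and `params_source`, are hypothetical illustrations rather than csv2rdf4lod code. The first reads `CSV2RDF4LOD_BASE_URI` and fails if it is unset, consistent with the transcript line that overrides `conversion:base_uri` with `http://localhost`. The second reports whether the trigger's bootstrap variables would be consulted, which happens only when no `manual/<file>.e1.params.ttl` exists yet.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: base_uri comes from the CSV2RDF4LOD_BASE_URI
# environment variable; the trigger's bootstrap variables matter only
# when no e1 parameters file exists yet.
resolve_base_uri() {
  if [ -z "${CSV2RDF4LOD_BASE_URI:-}" ]; then
    echo "CSV2RDF4LOD_BASE_URI is not set" >&2
    return 1
  fi
  echo "$CSV2RDF4LOD_BASE_URI"
}

# Prints "bootstrap" if <cockpit>/manual/<file>.e1.params.ttl is absent
# (the trigger's variables would seed it), otherwise "use-existing".
params_source() {
  local cockpit=$1 source_file=$2
  if [ -e "$cockpit/manual/${source_file}.e1.params.ttl" ]; then
    echo use-existing
  else
    echo bootstrap
  fi
}

CSV2RDF4LOD_BASE_URI=http://localhost resolve_base_uri   # prints http://localhost
params_source . test.csv
```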
- Conversion process phase: create conversion trigger
- Conversion process phase: pull conversion trigger
- Conversion cockpit - the directory location of the conversion trigger.
- `$CSV2RDF4LOD_HOME/bin/cr-create-convert-sh.sh` - a script that creates a conversion trigger when given a list of files to convert.