
Conversion trigger

Tim L edited this page Jun 22, 2013 · 22 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

The conversion trigger is a metaphor for the shell script that is created once and executed one or more times to invoke the conversion process. The trigger sits within a conversion cockpit, a directory dedicated to converting data files retrieved from a particular source organization. Once created, the trigger produces the verbatim conversion and the enhancement 1 conversion automatically. To create additional enhancement conversions, pass the -e parameter, e.g. ./convert-mydataset.sh -e 2.

Cheat Sheet

This section outlines the files that are used to produce a conversion.

--force

If saving time doesn't matter and you just want "the latest conversion" regenerated, pass the --force argument to the conversion trigger:

./convert-DATASET.sh --force

For example,

bash-3.2$ ./convert-cell-based-range.sh
--------------------------------------------------------------------------------
test.csv
E1 output automatic/test.csv.e1.ttl newer than enhancement parameters manual/test.csv.e1.params.ttl; skipping.
   convert.sh done
convert-aggregate.sh not publishing b/c $CSV2RDF4LOD_PUBLISH=false.
===========================================================================================

The run above skips the E1 conversion because its output is newer than its enhancement parameters, which avoids redundant processing. To run the conversion anyway, add the --force argument:

bash-3.2$ ./convert-cell-based-range.sh --force
--------------------------------------------------------------------------------
test.csv
E1 output automatic/test.csv.e1.ttl is newer than enhancement parameters manual/test.csv.e1.params.ttl, but you --force'd so we'll do it anyway.
2 rows in manual/test.csv
convert-cell-based-range.sh overriding conversion:base_uri in parameters file with http://localhost
E1 CONVERSION
========== edu.rpi.tw.data.csv.CSVtoRDF version 2013-June-22 initiated:
fileName:                     manual/test.csv
...
Generated 48 triples in 0 min. ( 25945.9 triples/min )
...
INFO: writing metadata to separate file from data automatic/test.csv.e1.void.ttl
========== edu.rpi.tw.data.csv.CSVtoRDF complete. ==========
...
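
The skip-or-force behavior in the two runs above can be sketched with bash's file-age test. This is a minimal sketch of the idea, not csv2rdf4lod's actual code: the function name should_convert and its argument order are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of csv2rdf4lod.
# Exit status 0 means "convert", 1 means "skip".
should_convert() {
  local output="$1" params="$2" force="$3"
  # --force always wins, even when the output is up to date.
  if [ "$force" = "--force" ]; then
    return 0
  fi
  # Nothing has been converted yet.
  if [ ! -e "$output" ]; then
    return 0
  fi
  # -nt is true when the first file was modified more recently than the second.
  if [ "$params" -nt "$output" ]; then
    return 0
  fi
  return 1
}
```

With the file names from the example, `should_convert automatic/test.csv.e1.ttl manual/test.csv.e1.params.ttl` would report "skip" while the output is newer than the parameters, and passing `--force` as a third argument would report "convert".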

Rerunning the verbatim conversion

To save time, the conversion trigger avoids rerunning conversions that already exist.

So, if you already have the verbatim conversion at automatic/*.raw.ttl and want to regenerate it, remove those files and rerun the conversion trigger:

rm automatic/*.raw.ttl
./convert-DATASET.sh
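
If you do this often, the two commands can be wrapped in a small function. The following is a sketch only: rerun_verbatim is a hypothetical name, and it assumes you call it from inside the conversion cockpit, where automatic/ lives.

```shell
#!/bin/bash
# Hypothetical wrapper, NOT part of csv2rdf4lod.
# Usage (from the conversion cockpit): rerun_verbatim ./convert-DATASET.sh
rerun_verbatim() {
  local trigger="$1"
  # -f keeps rm from failing when no *.raw.ttl files exist yet.
  rm -f automatic/*.raw.ttl
  # Run the trigger; with the raw output gone, it has nothing to skip.
  "$trigger"
}
```

Only files matching automatic/*.raw.ttl are removed; other files in automatic/ are left alone.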

Notes

  • The surrogate variable is an old name for base_uri. It was originally named surrogate because csv2rdf4lod is designed to aggregate data from third-party source organizations and republish RDF representations to third-party consumers. base_uri is the web host from which csv2rdf4lod users will serve their data; it acts as a surrogate for the third-party source organizations that have not adopted linked data publishing principles. The value of base_uri is drawn from the shell environment variable CSV2RDF4LOD_BASE_URI.
  • The environment variables subjectDiscriminator, commentCharacter, cellDelimiter, header, dataStart, repeatAboveIfEmptyCol, onlyIfCol, interpretAsNull, and dataEnd reflect a limited subset of the RDF-encoded enhancement parameters. THESE ARE ONLY FOR BOOTSTRAPPING. They are used only when manual/*.e1.params.ttl does not exist, at which point they populate the RDF-encoded enhancement parameters. If you want to supply information about subject discriminators, comment characters, cell delimiters, etc., you can put it in the conversion trigger, but it is highly recommended to specify it directly in the manual/*.e1.params.ttl files.
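
The bootstrapping rule in the second note can be sketched as follows. This is an illustration of the rule only, not the code csv2rdf4lod runs: report_bootstrap is a hypothetical helper, and the three variable values are made-up examples.

```shell
#!/bin/bash
# Illustrative only, NOT the csv2rdf4lod implementation.

# Example values (assumptions for this sketch).
cellDelimiter=','
header=1
dataStart=2

# Hypothetical helper: print the trigger variables that would seed the given
# parameters file, or say that they are ignored because the file exists.
report_bootstrap() {
  local params_file="$1" name
  if [ -e "$params_file" ]; then
    echo "$params_file exists; trigger variables ignored"
    return 0
  fi
  for name in subjectDiscriminator commentCharacter cellDelimiter header \
              dataStart repeatAboveIfEmptyCol onlyIfCol interpretAsNull dataEnd; do
    # ${!name:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $name, or the empty string if it is unset.
    if [ -n "${!name:-}" ]; then
      echo "$name=${!name}"
    fi
  done
}
```

Calling `report_bootstrap manual/test.csv.e1.params.ttl` before the parameters file exists lists the variables that are set; once the file exists, the same call reports that the variables are ignored, which is why edits belong in the parameters file itself.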
