The Extractor, Transformer and Loader (ETL) module for OrientDB moves data to and from OrientDB databases using ETL processes.
- Configuration: The ETL module uses a configuration file, written in JSON.
- Extractor: Pulls data from the source database.
- Transformers: Convert the data in the pipeline from its source format to one accessible to the target database.
- Loader: Loads the data into the target database.
The ETL module receives a backup file from another database, converts the fields into an accessible format, and loads them into OrientDB.
EXTRACTOR => TRANSFORMERS[] => LOADER
For example, consider the process for a CSV file. Using the ETL module, OrientDB loads the file, applies whatever changes it needs, then stores the record as a document into the current OrientDB database.
+-----------+-----------------------+-----------+
|           |              PIPELINE             |
+ EXTRACTOR +-----------------------+-----------+
|           |     TRANSFORMERS      |  LOADER   |
+-----------+-----------------------+-----------+
|   FILE   ==>  CSV->FIELD->MERGE  ==> OrientDB |
+-----------+-----------------------+-----------+
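As a rough illustration, the CSV pipeline in the diagram might be configured along the following lines. This is a sketch under assumptions, not a tested configuration: the file path `/tmp/people.csv`, the field names `name` and `id`, and the lookup target `Person.id` are hypothetical, and the section and component names (`source`, `row`, `csv`, `field`, `merge`, `orientdb`) follow the 2.0-era layout in which CSV parsing is a transformer. Check them against the component reference for your version.

```json
{
  "source": { "file": { "path": "/tmp/people.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "," } },
    { "field": { "fieldName": "name", "expression": "name.trim()" } },
    { "merge": { "joinFieldName": "id", "lookup": "Person.id" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/tmp/mydb",
      "dbType": "document"
    }
  }
}
```

The `transformers` array is ordered, so each record passes through CSV parsing, then the field transformation, then the merge, before it reaches the loader.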
You can modify this pipeline so that the transformation and loading phases run in parallel by setting the configuration variable "parallel" to true.
{"parallel": true}
Beginning with version 2.0, OrientDB bundles the ETL module with the official release.
To use the ETL module, run the oetl.sh script with the configuration file given as an argument.
$ $ORIENTDB_HOME/bin/oetl.sh config-dbpedia.json
When you run the ETL module, you pass it a JSON configuration file. The configuration variables defined in that file are resolved at run time, as the module starts up.
You can also define the values for these variables through command-line options. For example, you could set the database URL to ${databaseURL} in the configuration file, then pass the value as an argument on the command line:
$ $ORIENTDB_HOME/bin/oetl.sh config-dbpedia.json \
-databaseURL=plocal:/tmp/mydb
When the ETL module initializes, it takes the value plocal:/tmp/mydb from the command line and substitutes it for ${databaseURL} in the configuration file.
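For reference, the matching part of the configuration file could look roughly like this. It is a minimal fragment that assumes the variable is used for the loader's database URL; the `dbType` value is illustrative, and the rest of the file (source, extractor, transformers) is omitted.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "${databaseURL}",
      "dbType": "document"
    }
  }
}
```

Keeping the URL out of the file this way lets the same configuration be reused against different databases by changing only the command-line argument.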
Examples: