Avro Output Plugin

The Avro Output Plugin for Pentaho Data Integration allows you to output Avro files using Kettle. Avro files are commonly used in Hadoop allowing for schema evolution and truly separating the write schema from the read schema.

A big thank you to my employer Inquidia Consulting for allowing me to open source this plugin.

System Requirements

-Pentaho Data Integration 6.0 or above (Plugin Version 2.2.0 and above) -Pentaho Data Integration 5.x or above (Plugin Version 2.1.x and below)

Installation

Using Pentaho Marketplace

In the Pentaho Marketplace find the Avro Output plugin and click Install
Restart Spoon

Manual Install

Place the AvroOutputPlugin folder in the ${DI_HOME}/plugins/steps directory
Restart Spoon

Usage

Schema Requirements

Arrays are not supported by this step. It is currently not possible to output an Avro array using the Avro Output Plugin. All other Avro types are supported including complex records.

File Tab

Output Type
- Binary file - Outputs to an Avro file using binary encoding
- Binary message - Outputs the data as an Avro binary encoded message on the stream
- JSON message - Outputs the data as an Avro JSON encoded message on the stream
Filename - The name of the file to output
Output Field - The field on the stream to output the encoded message
Automatically create schema? - Should the step automatically create the schema for the output records?
Write schema to file? - Should the step persist the automatically created schema to a file?
Avro namespace - The namespace for the automatically created schema.
Avro record name - The record name for the automatically created schema.
Avro documentation - The documentation for the automatically created schema.
Schema filename - The name of the Avro schema file to use when writing.
Create parent folder? - Create the parent folder if it does not exist.
Include stepnr in filename? - Should the step number be included in the filename? Used for starting multiple copies of the step.
Include partition nr in filename? - Used for partitioned transformations.
Include date in filname? - Include the current date in the filename in yyyyMMdd format.
Include time in filename? - Include the current time in the filename in HHmmss format.
Specify date format? - Specify your own format for including the date time in the filename.
Date time format - The date time format to use.

Fields Tab

Name - The name of the field on the stream
Avro Path - The dot delimited path to where the field will be stored in the Avro file. (If this is empty the stream name will be used. If the schema file exists and is valid, the drop down will automatically populate with the fields from the schema.)
Avro Type - The type used to store the field in Avro. Since Avro supports unions of multiple types you must select a type. (If the schema file exists and is valid the drop down will automatically limit to types that are available for the Avro Path selected.)
Nullable? - Should the field be nullable in the Avro schema. Only used if "automatically create avro schema" is checked.
Get Fields button - Gets the list of input fields, and tries to map them to an Avro field by an exact name match.
Update Types button - Based on the Avro Path for the field, will make a best guess effort for the Avro Type that should be used.

Building from Source

The Avro Output Plugin is built using Ant and requires both Ivy and Maven to also be installed.

Edit the build.properties file and set the Pentaho version you wish to build the plugin for.
Run "ant clean-all resolve resolve-default dist" to build the plugin.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
build-res		build-res
package-res		package-res
src		src
.classpath		.classpath
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
assembly.xml		assembly.xml
avro-1.6.2.jar		avro-1.6.2.jar
avro.png		avro.png
build.properties		build.properties
build.xml		build.xml
ivy.xml		ivy.xml
ivysettings.xml		ivysettings.xml
package-ivy.xml		package-ivy.xml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Avro Output Plugin

System Requirements

Installation

Usage

Building from Source

About

Releases 10

Packages

Contributors 3

Languages

License

cdeptula/AvroOutputPlugin

Folders and files

Latest commit

History

Repository files navigation

Avro Output Plugin

System Requirements

Installation

Usage

Building from Source

About

Resources

License

Stars

Watchers

Forks

Releases 10

Packages 0

Contributors 3

Languages

Packages