kafka-spark-scala

A Scala source-to-image application skeleton for using Apache Spark and Kafka on OpenShift.

This application simply reads messages from a Kafka topic and then writes those messages back out to a second topic. It does this using Spark's streaming utilities for Kafka.
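As a rough illustration of that read-then-write loop, here is a minimal sketch using Spark Structured Streaming. It assumes the spark-sql and spark-sql-kafka-0-10 dependencies are on the classpath (the latter matches the --packages value in the Quickstart command). Only the KAFKA_BROKERS, KAFKA_IN_TOPIC and KAFKA_OUT_TOPIC names come from this README. The object and helper names (KafkaPassThrough, sourceOptions, sinkOptions) and the checkpoint path are hypothetical, and this is not the project's actual Main class.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical pass-through job; only the KAFKA_* variable names come from the README.
object KafkaPassThrough {
  // Options for the Kafka source: broker list and the topic to subscribe to.
  def sourceOptions(brokers: String, topic: String): Map[String, String] =
    Map("kafka.bootstrap.servers" -> brokers, "subscribe" -> topic)

  // Options for the Kafka sink; Spark needs a checkpoint location for this query,
  // supplied here as an option (the path is an illustrative assumption).
  def sinkOptions(brokers: String, topic: String, checkpoint: String): Map[String, String] =
    Map("kafka.bootstrap.servers" -> brokers, "topic" -> topic, "checkpointLocation" -> checkpoint)

  def main(args: Array[String]): Unit = {
    // sys.env(...) throws NoSuchElementException if a variable is unset.
    val brokers  = sys.env("KAFKA_BROKERS")
    val inTopic  = sys.env("KAFKA_IN_TOPIC")
    val outTopic = sys.env("KAFKA_OUT_TOPIC")

    val spark = SparkSession.builder().appName("kafka-spark-scala").getOrCreate()

    // Read the input topic as an unbounded stream, keeping only the binary key and value.
    val records = spark.readStream
      .format("kafka")
      .options(sourceOptions(brokers, inTopic))
      .load()
      .select("key", "value")

    // Write each record, unchanged, to the output topic.
    val query = records.writeStream
      .format("kafka")
      .options(sinkOptions(brokers, outTopic, "/tmp/kafka-spark-scala-checkpoint"))
      .start()

    // Block until the streaming query stops or fails.
    query.awaitTermination()
  }
}
```

Selecting only key and value is a deliberate choice in this sketch: the Kafka source also emits columns such as topic and partition, and leaving them in could influence how the sink routes records.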

Prerequisites

  • OpenShift - this application is designed for use on OpenShift; you can find great documentation and starter guides on the OpenShift website.

  • Apache Kafka - because this application requires a Kafka broker to read from and write to, you will need a broker deployed and a source of messages. The Strimzi project provides some great documentation and manifests for running Kafka on OpenShift.

Helpful tools

To accelerate work with Kafka, here are a few helpful applications:

  • Emitter - a skeleton that publishes text messages to a Kafka topic.

  • Listener - a skeleton that logs all messages from a Kafka topic.

Quickstart

As this project uses Spark, it is easiest to deploy on OpenShift with the RADanalytics tooling. Because this is a source-to-image application, a Spark cluster must be available. The shortest path to making that connection is to use the Spark clusters spawned automatically by the Oshinko project's source-to-image utilities. Please see the Oshinko documentation for more information about this process.

  1. See the radanalytics.io Get Started page for instructions on installing that tooling.

  2. Launch the skeleton with the following command:

    oc new-app --template oshinko-scala-spark-build-dc \
        -p APPLICATION_NAME=skeleton \
        -p GIT_URI=https://gitlab.com/bones-brigade/kafka-spark-scala \
        -p APP_MAIN_CLASS=org.bonesbrigade.skeletons.kafkasparkopenshift.Main \
        -p SPARK_OPTIONS='--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 --conf spark.jars.ivy=/tmp/.ivy2' \
        -e KAFKA_BROKERS=kafka:9092 \
        -e KAFKA_IN_TOPIC=topic1 \
        -e KAFKA_OUT_TOPIC=topic2

In this example, our application will subscribe to messages on the Kafka topic topic1, and it will publish messages on the topic topic2 using the broker at kafka:9092.
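The three -e flags end up as environment variables inside the application container. As a sketch, assuming the application reads them once at startup, a small validating loader could report every missing value at once instead of failing on the first lookup. KafkaSettings and fromEnv are hypothetical names, not part of this project.

```scala
// Hypothetical settings holder; only the three KAFKA_* names come from the README.
final case class KafkaSettings(brokers: String, inTopic: String, outTopic: String)

object KafkaSettings {
  private val Required = Seq("KAFKA_BROKERS", "KAFKA_IN_TOPIC", "KAFKA_OUT_TOPIC")

  // Collect every missing or blank variable so the error message lists them all.
  def fromEnv(env: Map[String, String]): Either[String, KafkaSettings] = {
    val missing = Required.filter(name => env.get(name).forall(_.trim.isEmpty))
    if (missing.nonEmpty) Left(s"missing environment variables: ${missing.mkString(", ")}")
    else Right(KafkaSettings(env("KAFKA_BROKERS"), env("KAFKA_IN_TOPIC"), env("KAFKA_OUT_TOPIC")))
  }
}
```

In the application this would be called as KafkaSettings.fromEnv(sys.env), and with the command above it would yield KafkaSettings("kafka:9092", "topic1", "topic2").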