A simple wrapper around the Apache Spark spark-submit command.
sparklight is a library for submitting Spark jobs either locally or to a cluster.
It is designed to make setting up and submitting Spark jobs easy by hiding the complexity of the spark-submit command-line interface.
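To illustrate what wrapping spark-submit involves, here is a minimal sketch using only the Python standard library. It is not sparklight's actual API: the function names `build_spark_submit_command` and `submit`, their parameters, and the `spark.executor.memory` setting are hypothetical choices made for this example. Only the `--master` and `--conf` flags are taken from spark-submit itself.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def build_spark_submit_command(
    script: str,
    master: str = "local[*]",
    conf: Optional[Dict[str, str]] = None,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper, not sparklight's API)."""
    command = ["spark-submit", "--master", master]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in (conf or {}).items():
        command += ["--conf", f"{key}={value}"]
    # The application script comes after all spark-submit options,
    # followed by any arguments meant for the script itself.
    command.append(script)
    command += list(app_args or [])
    return command


def submit(command: List[str], dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return its exit code."""
    if dry_run:
        print(shlex.join(command))
        return 0
    return subprocess.run(command, check=False).returncode


command = build_spark_submit_command(
    "examples/cars_submit.py",
    conf={"spark.executor.memory": "2g"},  # example value, chosen for illustration
)
submit(command)  # dry run: prints the command instead of launching Spark
```

Building the command as a list rather than a single string avoids shell-quoting problems when arguments contain spaces or brackets. Passing `dry_run=False` would run the command and requires spark-submit to be on the PATH.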
- Apache Spark
- pyspark
Clone this repository, then run setup.sh:
git clone [email protected]:dsmiff/sparklight.git
./setup.sh
- examples/cars_submit.py: Submits a simple Spark job that performs a groupBy on the cars.csv dataset
- Submit to cluster functionality
- HDFS interface
- DAG jobs