- Create a work directory and make it the current directory, for example:
  ```shell
  mkdir ~/workspace
  cd ~/workspace
  ```
- Create the application configuration (application.conf); a sketch is shown below.
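  The actual configuration content is not included in this excerpt. Purely as a sketch, assuming the framework reads HOCON and that pipeline variables can be given defaults here, it might look like the following; every key name below is an assumption, not the framework's documented schema:
  ```hocon
  # Hypothetical application.conf -- key names are assumptions.
  # Spark settings the job may apply at runtime.
  spark {
    sql.shuffle.partitions = 8
  }
  # A variable referenced by the pipeline; the spark-submit step below
  # overrides it with --var application.process_date=...
  application {
    process_date = "20200921"
  }
  ```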
- Prepare the test data:
  ```shell
  mkdir -p data/users
  mkdir -p data/train
  ```
  Then place the users and train datasets under these two directories.
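  The test files themselves are not shown in this excerpt. Purely for illustration, hypothetical inputs could look like this (the column names are assumptions, chosen to match the SQL sketch in the next step):
  ```text
  # data/users/users.csv (hypothetical)
  user_id,gender,age
  u001,F,29
  u002,M,35

  # data/train/train.csv (hypothetical)
  user_id,item_id,rating,event_date
  u001,i100,4,20200921
  u002,i101,5,20200921
  ```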
- Create the SQL statement for transforming users & train:
  ```shell
  mkdir scripts
  ```
  Then create a scripts/transform-user-train.sql file; a sketch is shown below.
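  The actual SQL is not included in this excerpt. A minimal sketch follows, assuming the read steps register their inputs as views named users and train, that variables are substituted with ${...} syntax, and that the columns match the hypothetical sample data above:
  ```sql
  -- Hypothetical transformation: view, column, and variable names are
  -- assumptions, not the tutorial's actual schema.
  select
    u.user_id,
    u.gender,
    u.age,
    t.item_id,
    t.rating
  from users u
    join train t on u.user_id = t.user_id
  where t.event_date = '${application.process_date}'
  ```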
- Create the pipeline definition (pipeline_fileRead-fileWrite.xml); a sketch is shown below.
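  The pipeline body is omitted in this excerpt. Going by the file name pipeline_fileRead-fileWrite.xml and the steps above, a read, transform, write flow is sketched below; the element layout, actor class names (FileReader, SqlTransformer, FileWriter under com.qwshen.etl), and property names are all assumptions to be checked against the framework's documentation:
  ```xml
  <!-- Hypothetical pipeline sketch: element, actor, and property names
       are assumptions, not the framework's verified schema. -->
  <pipeline-def name="user-train">
    <jobs>
      <job name="transform-user-train">
        <action name="read-users">
          <actor type="com.qwshen.etl.source.FileReader">
            <properties>
              <format>csv</format>
              <fileUri>data/users</fileUri>
            </properties>
          </actor>
          <output-view name="users" />
        </action>
        <action name="read-train">
          <actor type="com.qwshen.etl.source.FileReader">
            <properties>
              <format>csv</format>
              <fileUri>data/train</fileUri>
            </properties>
          </actor>
          <output-view name="train" />
        </action>
        <action name="transform">
          <actor type="com.qwshen.etl.transform.SqlTransformer">
            <properties>
              <sqlFile>scripts/transform-user-train.sql</sqlFile>
            </properties>
          </actor>
          <output-view name="features" />
        </action>
        <action name="write-features">
          <actor type="com.qwshen.etl.sink.FileWriter">
            <properties>
              <format>csv</format>
              <fileUri>data/features</fileUri>
            </properties>
          </actor>
        </action>
      </job>
    </jobs>
  </pipeline-def>
  ```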
- Compile the project & copy the jar file (spark-etl-framework-xxx.jar) to the current directory; a sketch of this step is shown below.
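  The exact build commands are not given in this excerpt. Assuming a Maven build (substitute the equivalent sbt tasks if the project builds with sbt), the step might look like:
  ```shell
  # Hypothetical build step: the build tool and artifact path are assumptions.
  mvn clean package
  cp target/spark-etl-framework-*.jar ~/workspace/
  ```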
- Submit the job:
  ```shell
  spark-submit --master local --deploy-mode client \
    --name user-train \
    --conf spark.executor.memory=8g --conf spark.driver.memory=4g \
    --class com.qwshen.Launcher spark-etl-framework-xxx.jar \
    --pipeline-def ./pipeline_fileRead-fileWrite.xml \
    --application-conf ./application.conf \
    --var application.process_date=20200921
  ```
- Check & review the result in data/features, for example:
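  Assuming the pipeline wrote plain-text/CSV part files (an assumption carried over from the pipeline sketch above), a quick check from the shell:
  ```shell
  ls -l data/features       # Spark writes one part-* file per partition
  head data/features/part-*
  ```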