- Create a work directory and change into it, such as:

  ```shell
  mkdir ~/workspace
  cd ~/workspace
  ```
- Create the application configuration, application.conf, in the work directory (a sketch follows below).
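  The configuration contents referred to here aren't reproduced on this page. The sketch below is a placeholder: every key in it is an assumption except application.process_date, which the submit command at the end overrides via --var; consult the spark-etl-framework documentation for the real keys.

  ```shell
  # Hypothetical sketch only -- the actual keys are defined by the
  # spark-etl-framework project, not by this walk-through.
  cat > application.conf <<'EOF'
  application {
    # overridden at submit time via --var application.process_date=...
    process_date = "20200921"
  }
  EOF
  ```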

- Prepare the test data:

  ```shell
  mkdir -p data/users
  mkdir -p data/train
  ```

  - Create a users.csv file in data/users (sample sketched below).
  - Create a train.txt file in data/train (sample sketched below).
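  The original sample files aren't reproduced here. The rows below are invented placeholders: the column layout of users.csv and the record format of train.txt are both assumptions.

  ```shell
  # Placeholder data -- invented for illustration; the column names and
  # record layout are assumptions, not the original samples.
  cat > data/users/users.csv <<'EOF'
  user_id,gender,age
  u001,F,28
  u002,M,35
  EOF

  cat > data/train/train.txt <<'EOF'
  u001,item-100,1
  u002,item-205,0
  EOF
  ```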
- Create the SQL statement for transforming users & train:

  ```shell
  mkdir scripts
  ```

  Create a transform-user-train.sql file in scripts (sketched below).
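  The original SQL isn't reproduced here. The statement below is only a guess at its shape: it assumes the pipeline registers the two inputs as views named users and train, and every column name is invented.

  ```shell
  # Hypothetical SQL -- the view names (users, train) and all columns
  # are assumptions; replace with the project's actual statement.
  cat > scripts/transform-user-train.sql <<'EOF'
  select u.user_id, u.gender, u.age, t.item_id, t.label
    from users u
    join train t on u.user_id = t.user_id
  EOF
  ```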

- Create the pipeline definition, pipeline_fileRead-fileWrite.xml (a skeleton follows below).
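  The pipeline definition isn't reproduced here either. The skeleton below only marks the read-transform-write stages as comments; even the root element name is an assumption, so consult the spark-etl-framework documentation for the actual schema.

  ```shell
  # Skeleton only -- the element names are assumptions, not the
  # framework's actual schema.
  cat > pipeline_fileRead-fileWrite.xml <<'EOF'
  <pipeline-def>
    <!-- 1. read data/users/users.csv and data/train/train.txt as views -->
    <!-- 2. transform them with scripts/transform-user-train.sql -->
    <!-- 3. write the result to data/features -->
  </pipeline-def>
  EOF
  ```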

- Compile the project and copy the jar file (spark-etl-framework-xxx.jar) to the current directory, e.g. as sketched below.
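  Assuming a Maven build (an assumption; use whichever build tool the project ships with), and keeping the placeholder xxx for the actual version:

  ```shell
  # The build tool is an assumption; substitute the real version number
  # for the xxx placeholder in the jar name.
  mvn clean package
  cp target/spark-etl-framework-xxx.jar ~/workspace/
  ```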

- Submit the job:

  ```shell
  # --pipeline-def and --application-conf point at the files created above;
  # --var supplies a runtime value for application.process_date.
  spark-submit --master local --deploy-mode client \
    --name user-train --conf spark.executor.memory=8g --conf spark.driver.memory=4g \
    --class com.qwshen.Launcher spark-etl-framework-xxx.jar \
    --pipeline-def ./pipeline_fileRead-fileWrite.xml --application-conf ./application.conf \
    --var application.process_date=20200921
  ```
- Check and review the results in data/features (a quick check is sketched below).
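  One way to inspect the output, assuming the pipeline writes plain-text part files under data/features (the actual format depends on the pipeline's writer):

  ```shell
  # The output format is an assumption; adjust if the writer emits
  # parquet or another binary format.
  ls -l data/features
  head data/features/part-*
  ```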