
nigiva/epita-spark-project


SPARK Project - Restore peace to Peaceland

First part: Theory and architecture of the solution

In this part, we answer a set of questions and then propose an architecture for the project. 👉 Subject - 1st part 👉 Report - 1st part 👉 Slides - 1st defense

Second part: The POC

In this part, we propose an implementation of the architecture adapted to the project. 👉 Subject - 2nd part 👉 Slides - 2nd defense

Collaborators of the project

| Name | Email | GitHub account |
|---|---|---|
| Erwan Goudard | [email protected] | Grouane |
| Adrien Merat | [email protected] | Timelessprod |
| Corentin Duchêne | [email protected] | Nigiva |
| Henri Jamet | [email protected] | hjamet |

How to launch the project

If you get a Connection refused error, or if you are using WSL, restart your SSH service: sudo service ssh restart. If you get a warning while running the consumer or the producer, just press y and then Enter.
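In a single-node setup, Hadoop's start-dfs.sh and start-yarn.sh typically launch their daemons over SSH to localhost, which is why a stopped SSH service shows up as Connection refused. The following Python sketch checks whether anything accepts TCP connections on the SSH port before you start Hadoop. It assumes sshd listens on the default port 22; is_port_open is a hypothetical helper, not part of the repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: sshd uses the default port 22; change it if your setup differs.
    if is_port_open("localhost", 22):
        print("SSH is reachable on localhost:22")
    else:
        print("SSH is not reachable, try: sudo service ssh restart")
```

If the check fails, restarting the service as described above should be enough in most WSL setups.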

  1. Install ZooKeeper and Kafka
  2. Install Hadoop, HDFS and Spark
  3. Install Pyenv and Poetry

⚠️ Make sure the binaries and shell scripts of the packages above are on your PATH so that they can be run from anywhere on your computer.
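To verify this before launching anything, a small Python sketch can look each command up on the PATH with shutil.which. The REQUIRED_COMMANDS list is an assumption drawn from the commands used in the steps below, not a file shipped with the project; missing_commands is a hypothetical helper name.

```python
import shutil

# Commands invoked directly in this README (our own reading of the steps, adjust as needed).
REQUIRED_COMMANDS = [
    "zookeeper-server-start.sh",
    "kafka-server-start.sh",
    "kafka-topics.sh",
    "start-dfs.sh",
    "start-yarn.sh",
    "stop-dfs.sh",
    "stop-yarn.sh",
    "hdfs",
    "sbt",
    "pyenv",
    "poetry",
]


def missing_commands(commands: list[str]) -> list[str]:
    """Return the commands that shutil.which cannot find on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required commands are on PATH")
```

Any name it reports usually means the corresponding bin/ or sbin/ directory still needs to be added to PATH in your shell profile.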

  1. Open a terminal and launch the ZooKeeper server: zookeeper-server-start.sh config/zookeeper.properties (let it run)

  2. In another terminal, launch the Kafka server: kafka-server-start.sh config/server.properties (let it run). Once it is running, create the Kafka topic from another terminal: kafka-topics.sh --create --topic "drone-report" --bootstrap-server localhost:9092

  3. In another terminal, start the HDFS and YARN services: start-dfs.sh && start-yarn.sh

  4. In another terminal, launch the monitoring website:

    • cd website/
    • poetry shell (we use Python 3.9.7 via Pyenv)
    • flask run
  5. In another terminal, launch the streaming consumer: cd consumer && sbt run (let it run)

  6. In another terminal, launch the streaming producer (which fakes the data sent by the drones by reading a JSON file). Depending on the scenario you want to run, execute one of the commands below:

    • For happy citizens: cd producer && sbt "run ../json/s1.json" (let it run)
    • For choleric citizens: cd producer && sbt "run ../json/s2.json" (let it run)
    • For anxious citizens: cd producer && sbt "run ../json/s3.json" (let it run)
  7. To check that data is correctly written to HDFS, run hdfs dfs -ls /drone-reports. It should list the files containing the data.
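To run the same check from a script, for instance to confirm that the files keep growing while the producer runs, here is a Python sketch that wraps the command with subprocess. It assumes the usual eight-column hdfs dfs -ls output (permissions, replication, owner, group, size, date, time, path); parse_hdfs_ls and list_hdfs_files are hypothetical helper names, not part of the project.

```python
import subprocess


def parse_hdfs_ls(output: str) -> list[tuple[str, int]]:
    """Extract (path, size in bytes) pairs from `hdfs dfs -ls` output.

    Entry lines have 8 whitespace-separated fields:
    permissions, replication, owner, group, size, date, time, path.
    The "Found N items" header and any other non-entry line are skipped.
    """
    entries = []
    for line in output.splitlines():
        # maxsplit=7 keeps a path containing spaces in one field
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        entries.append((fields[7], int(fields[4])))
    return entries


def list_hdfs_files(path: str = "/drone-reports") -> list[tuple[str, int]]:
    """Run `hdfs dfs -ls <path>` and return the parsed entries (raises if the command fails)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path], capture_output=True, text=True, check=True
    )
    return parse_hdfs_ls(result.stdout)


if __name__ == "__main__":
    files = list_hdfs_files()
    total = sum(size for _, size in files)
    print(f"{len(files)} entries, {total} bytes in total")
```

Running it twice a few seconds apart while the producer is active should show the total increasing if the consumer is writing as expected.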

To stop the application, press Ctrl+C in each open terminal to kill the running processes, then run stop-dfs.sh && stop-yarn.sh
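To confirm that nothing is left running afterwards (a leftover Kafka broker or ZooKeeper process keeps its port busy and makes the next launch fail), you can inspect the output of jps, which ships with the JDK. This Python sketch assumes the default main-class names reported by the Hadoop, Kafka and ZooKeeper launch scripts; PROJECT_DAEMONS and leftover_daemons are hypothetical names.

```python
import subprocess

# Main-class names jps typically reports for the daemons started in this README (assumption).
PROJECT_DAEMONS = {
    "QuorumPeerMain": "ZooKeeper",
    "Kafka": "Kafka broker",
    "NameNode": "HDFS NameNode",
    "DataNode": "HDFS DataNode",
    "SecondaryNameNode": "HDFS SecondaryNameNode",
    "ResourceManager": "YARN ResourceManager",
    "NodeManager": "YARN NodeManager",
}


def leftover_daemons(jps_output: str) -> list[str]:
    """Return readable names of the project daemons found in `jps` output."""
    found = []
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)  # each line is "<pid> <MainClass>"
        if len(parts) == 2 and parts[1] in PROJECT_DAEMONS:
            found.append(PROJECT_DAEMONS[parts[1]])
    return found


if __name__ == "__main__":
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    remaining = leftover_daemons(jps.stdout)
    print("Still running:", ", ".join(remaining) if remaining else "nothing")
```

Anything it reports can be stopped with its own stop script (for example kafka-server-stop.sh or zookeeper-server-stop.sh, which ship with Kafka) or by killing the listed PID.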