Spark Streaming for World Domination (and other projects)

PyGotham 2017

Ask not what you can do for real-time data streams, but what they can do for you. This talk gives an overview of Apache Spark and PySpark (Spark’s Python API), with an emphasis on Spark’s Streaming API.

Setup

This demo uses Jupyter docker-stacks (https://github.com/jupyter/docker-stacks).

# clone this repo
git clone git@github.com:wsuen/PyGotham_Spark_Streaming_demo.git
cd PyGotham_Spark_Streaming_demo

# build the Docker image
docker build -t <img_name> .

# launch the container
docker run --name <container_name> -p 4040:4040 -p 8888:8888 <img_name>

# explore!

Running notebooks

Sign up for developer credentials for Twitter's Streaming API on apps.twitter.com. Store your credentials in the config file (bin/config.example). You can also edit TweetRead.py to filter tweets by keyword, location, and other parameters.

# with the container running, start streaming data
docker exec <container_name> python3 bin/TweetRead.py

This starts the Twitter firehose and sends the tweet messages to port 5555. At this point, you're ready to start building a streaming app using the included notebook.
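With TweetRead.py writing to port 5555, the consuming side of a streaming app might look roughly like the sketch below. It is not the code from the included notebook. It assumes the DStream API in pyspark.streaming (the Spark Streaming API of the Spark 2.x era), and that TweetRead.py sends one tweet's text per line and listens on localhost:5555 inside the container. The helper names extract_hashtags and main, and the 10-second batch interval, are hypothetical choices made for illustration.

```python
# Sketch only: assumes the DStream API (pyspark.streaming) and that
# bin/TweetRead.py serves newline-delimited tweet text on localhost:5555.
# extract_hashtags, main, and the 10-second batch interval are hypothetical.
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(line):
    """Return the lower-cased hashtags found in one line of tweet text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(line)]


def main(host="localhost", port=5555, batch_seconds=10):
    # Imported inside main() so extract_hashtags works without pyspark.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    # local[2]: the socket receiver occupies one thread, so at least one
    # more is needed to process batches. Reuses an existing context if any.
    conf = SparkConf().setAppName("TweetHashtagCounts").setMaster("local[2]")
    sc = SparkContext.getOrCreate(conf)
    ssc = StreamingContext(sc, batch_seconds)

    # Each record in this DStream is one line received on the socket.
    lines = ssc.socketTextStream(host, port)

    # Count hashtags within each micro-batch.
    counts = (lines.flatMap(extract_hashtags)
                   .map(lambda tag: (tag, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Print the first few (hashtag, count) pairs of every batch.
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()  # blocks until the stream is stopped
```

To try it, call main() in a notebook cell once TweetRead.py is running. Each micro-batch then prints its first few (hashtag, count) pairs, and the job shows up in the Spark UI. Keeping the parsing in a plain function means it can be checked without starting Spark.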

Exploration

Spark UI: localhost:4040 by default

Jupyter notebook server: localhost:8888 by default
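The Spark UI on port 4040 is backed by Spark's monitoring REST API, so the same information can also be read from a script. The sketch below is minimal and assumes a Spark application (for example, the streaming notebook) is running with its UI published on localhost:4040, as in the docker run command above. Only the /api/v1/applications endpoint comes from Spark's documentation. The names parse_app_names and list_spark_apps are hypothetical helpers.

```python
# Sketch only: assumes a running Spark application whose UI is published on
# localhost:4040. parse_app_names and list_spark_apps are hypothetical helpers;
# /api/v1/applications is an endpoint of Spark's monitoring REST API.
import json
from urllib.request import urlopen


def parse_app_names(payload):
    """Return (id, name) pairs from a /api/v1/applications JSON body."""
    return [(app["id"], app["name"]) for app in json.loads(payload)]


def list_spark_apps(base_url="http://localhost:4040"):
    # The UI on port 4040 only exists while a SparkContext is alive.
    with urlopen(base_url + "/api/v1/applications", timeout=5) as resp:
        return parse_app_names(resp.read().decode("utf-8"))
```

Calling list_spark_apps() from the host (thanks to the -p 4040:4040 mapping) or from inside the container should return one entry per running application. If no SparkContext is alive, the connection is refused.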
