Simple pipeline for real-time text classification of tweets streamed from the Twitter API.

A message producer reads tweets matching one or more tracked words and streams them over Kafka, so a consumer can proceed with the text classification. Using Kafka here makes it easy to plug additional steps into this pipeline, or even to feed entirely different pipelines from the same stream.
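To make that pluggability concrete, here is a sketch (not code from this repo) of how an extra step could tap the same stream: Kafka delivers the full topic to every consumer group, so a new step just subscribes under its own `group_id`, and neither the producer nor the existing consumer has to change. The topic name, broker address, and group id below are assumptions.

```python
# Hypothetical sketch: an extra pipeline step subscribing to the same topic.
# Each distinct group_id receives its own full copy of the "tweets" stream.
from kafka import KafkaConsumer

archiver = KafkaConsumer(
    "tweets",                           # topic name is an assumption
    group_id="archiver",                # a new group = an independent step
    bootstrap_servers="localhost:9092", # broker address is an assumption
)

for message in archiver:
    # Store, re-publish, or otherwise process tweets without touching
    # the classifier consumer, which reads the same topic in its own group.
    print(message.value.decode("utf-8"))
```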
Before trying it out, you need to provide a few Twitter API keys; `.env.sample` shows where they go. And as you can see there, I suppose you'd like to use virtualenv too. Forgive my presumption, though.
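For reference, the keys in question are the usual Twitter credentials. A sketch of what `.env.sample` plausibly looks like (the exact variable names are assumptions; the file itself is the source of truth):

```sh
# Hypothetical contents; check the real .env.sample in the repo.
export TWITTER_CONSUMER_KEY=your-consumer-key
export TWITTER_CONSUMER_SECRET=your-consumer-secret
export TWITTER_ACCESS_TOKEN=your-access-token
export TWITTER_ACCESS_TOKEN_SECRET=your-access-token-secret
export KAFKA_BROKER=localhost:9092
```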
So our prerequisites here are:
- Python (https://www.python.org/)
- Virtualenv (https://virtualenv.pypa.io/)
- Apache Kafka (https://kafka.apache.org/)
- Twitter API (https://developer.twitter.com/)
As far as Kafka goes, if you have Docker installed, no worries, I've got your back, a.k.a. `docker-compose.yml`.
When you are good to go, with `python` and `virtualenv` installed and those Twitter keys on hand:

- Open a terminal, `git clone` this repo wherever you like, and `cd` into it
- Rename `.env.sample` to just `.env`
- Add those Twitter keys to `.env`
- Set `KAFKA_BROKER` in `.env` as you like, or leave it as is
- Run `virtualenv .venv` to create a new virtual environment
- Run `source .venv/bin/activate` to load this new virtual environment
- Run `pip install -r requirements.txt` to install all dependencies
Done.
The first thing to do is train the model; this is a one-time kind of thing. So open a terminal and fire:
```sh
$ source .env
$ python trainer.py
```
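`trainer.py`'s internals aren't reproduced here, but a minimal text-classification trainer along these lines would do the job. Everything below, the toy dataset, the model choice, and the `model.pkl` output path, is an assumption for illustration:

```python
# Hypothetical sketch of a one-time training step; the real trainer.py may differ.
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled examples; a real trainer would load a proper dataset.
texts = [
    "I love this new framework",
    "great release, very stable",
    "this bug ruined my whole day",
    "worst documentation I have ever read",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF features into a linear classifier: a simple, common baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Persist the fitted model so the consumer can load it later.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```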
Now, if you don't already have Kafka up and running, you can use the provided `docker-compose.yml`:

```sh
$ export DOCKERHOST=`docker-machine ip`
$ docker-compose up -d
```

(If you run Docker natively rather than through `docker-machine`, set `DOCKERHOST` to whatever address your containers should advertise, e.g. `localhost`.)
Once it's ready, you can start the consumer in one terminal:

```sh
$ source .env
$ python consumer.py
```
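For a picture of what the consuming side typically involves, here is a sketch, not the repo's actual `consumer.py`; the topic name, the `KAFKA_BROKER` usage, and the `model.pkl` path are assumptions:

```python
# Hypothetical sketch of the consuming side; the real consumer.py may differ.
import os
import pickle

from kafka import KafkaConsumer

# Load the model persisted by the training step (path is an assumption).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

consumer = KafkaConsumer(
    "tweets",                                      # topic name is an assumption
    bootstrap_servers=os.environ["KAFKA_BROKER"],  # set via .env
    value_deserializer=lambda v: v.decode("utf-8"),
)

# Classify each tweet as it arrives from the producer.
for message in consumer:
    label = model.predict([message.value])[0]
    print(f"{label}: {message.value}")
```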
And finally, start the producer in another one:

```sh
$ source .env
$ python producer.py "Java" "PHP" "JavaScript"
```
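And the producing side, for context: a sketch assuming `tweepy` 3.x for the streaming API and `kafka-python` for publishing. The real `producer.py` may be organized differently; the credential names and the topic are assumptions.

```python
# Hypothetical sketch of the producing side; the real producer.py may differ.
import os
import sys

import tweepy
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=os.environ["KAFKA_BROKER"])

class TweetForwarder(tweepy.StreamListener):
    def on_status(self, status):
        # Publish each matching tweet's text onto the "tweets" topic.
        producer.send("tweets", status.text.encode("utf-8"))

auth = tweepy.OAuthHandler(
    os.environ["TWITTER_CONSUMER_KEY"],
    os.environ["TWITTER_CONSUMER_SECRET"],
)
auth.set_access_token(
    os.environ["TWITTER_ACCESS_TOKEN"],
    os.environ["TWITTER_ACCESS_TOKEN_SECRET"],
)

# Track the words given on the command line, e.g. "Java" "PHP" "JavaScript".
stream = tweepy.Stream(auth=auth, listener=TweetForwarder())
stream.filter(track=sys.argv[1:])
```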
Now you can stalk them all... ho ho ho