# Dejavu

Dejavu is a stack meant for PyFlink quickstarts. It includes Docker images for starting a local development environment, together with automation scripts and some examples.

## Project Structure

- `docker`: Docker images for PyFlink and the data generator, plus related scripts for building and pushing them
- `.devcontainer`: Configuration files for the VS Code Dev Container
- `app`: PyFlink examples with some sample data
- `VERSION`: Read throughout the automation scripts to determine which version of PyFlink to use (at the time of writing, the latest 1.13.2)
- `docker-compose.yml`: Docker Compose file to bring up the development stack
- `Makefile`: Created with the intention of running local tests. It doesn't play any role for now, except that you can run `make install` to create a virtual env so your IDE doesn't complain about missing packages 😛

## Installation

Before starting work on this project, the following prerequisites are needed:

- Docker
- Docker Compose
- (Optional) Make, via `sudo apt-get install make`. This is only meant for running tests and is not needed at the moment.

**NOTE**: The development stack has been tested on Ubuntu 20.04 and macOS, but since it only depends on Docker, it should work fine on Windows too.

## Development

To start the local development stack:

- Build the Docker images with `./docker/pyflink/build.sh` and `./docker/data_generator/build.sh`
- Bring up the containers with `docker-compose up --remove-orphans -d`
- (Optional) If you are using VS Code, the stack also ships with a `devcontainer.json` file, so you can attach VS Code to the `jobmanager` container with the mounted workspace and use it as a full-featured development environment. See the VS Code Dev Containers documentation for more.

There are a few examples provided.

1. Example `bank_stream_table_api.py` (a sketch of the rough shape of such a job follows this list). To run the example:

   ```sh
   docker-compose exec jobmanager ./bin/flink run -py /opt/app/bank_stream_table_api.py
   ```

   Check the result in the output path; it should look something like this:

   ```sh
   $ docker-compose exec taskmanager find /tmp/output
   /tmp/output
   /tmp/output/2021-08-31--04
   /tmp/output/2021-08-31--04/.part-0-0.inprogress.47949645-ba93-40c7-a18e-d88ba1845cf2

   $ docker-compose exec taskmanager cat /tmp/output/2021-08-31--04/.part-0-0.inprogress.47949645-ba93-40c7-a18e-d88ba1845cf2
   +I[2017-12-20 00:00:00.0, 2018-01-19 00:00:00.0, 2018-01-18 23:59:59.999, 409000611074, 1.3515737E7]
   +I[2018-01-19 00:00:00.0, 2018-02-18 00:00:00.0, 2018-02-17 23:59:59.999, 409000611074, 1.2706551E7]
   +I[2018-02-18 00:00:00.0, 2018-03-20 00:00:00.0, 2018-03-19 23:59:59.999, 409000611074, 1.5782355E7]
   ```
2. Example `bank_stream_sql.py` does the same thing as `bank_stream_table_api.py`, but with SQL via the Table API's `execute_sql` (see the second sketch after this list). To run the example:

   ```sh
   docker-compose exec jobmanager ./bin/flink run -py /opt/app/bank_stream_sql.py
   ```

   Since this example uses `.print()`, the output should look something like this:

   ```sh
   $ docker-compose exec jobmanager ./bin/flink run -py /opt/app/bank_stream_sql.py
   Job has been submitted with JobID 3cdb1ad83553e88d788666f4f0dce83c
   +----+-------------------------+-------------------------+--------------------------------+--------------------------------+
   | op |            window_start |              window_end |                     account_no |                          total |
   +----+-------------------------+-------------------------+--------------------------------+--------------------------------+
   | +I | 2017-12-20 00:00:00.000 | 2018-01-19 00:00:00.000 |                   409000611074 |                    1.3515737E7 |
   | +I | 2018-01-19 00:00:00.000 | 2018-02-18 00:00:00.000 |                   409000611074 |                    1.2706551E7 |
   | +I | 2018-02-18 00:00:00.000 | 2018-03-20 00:00:00.000 |                   409000611074 |                    1.5782355E7 |
   | +I | 2018-03-20 00:00:00.000 | 2018-04-19 00:00:00.000 |                   409000611074 |                    1.3791593E7 |
   +----+-------------------------+-------------------------+--------------------------------+--------------------------------+
   18 rows in set
   ```
3. To perform the same operation as the two examples above in plain SQL, we can open a shell in the `jobmanager` container with

   ```sh
   docker exec -it jobmanager /bin/bash
   ```

   and then spin up the SQL client that ships with Flink:

   ```sh
   ./bin/sql-client.sh
   ```

   Refer to `bank_stream_queries.sql` for the queries to run. The result should look like this:

   *(Screenshot: SQL Client)*

4. Example `kafka_transaction_stream.py` generates some dummy data, streams it into a Kafka topic, and then uses Flink to read that topic as a source, perform some operations, and stream the results back to another Kafka topic as a sink (the third sketch after this list shows roughly how that wiring looks).

   ```sh
   $ docker-compose exec jobmanager ./bin/flink run -py /opt/app/kafka_transaction_stream.py
   Job has been submitted with JobID a86fc9ca3025bf196829e561cbe1665d
   ```

   Once Flink indicates that the job has been submitted, run the following to check the result:

   ```sh
   docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic temp_result
   ```

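For orientation, here is a minimal sketch of the rough shape of a Table API job like `bank_stream_table_api.py`. It is not the actual app code: the schema, path, and format are assumptions, and the 30-day tumbling window is only inferred from the window boundaries in the sample output above.

```python
# Hypothetical sketch, not the actual bank_stream_table_api.py.
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col, lit
from pyflink.table.window import Tumble

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().build())

# A filesystem source over the mounted sample data (hypothetical schema/path).
t_env.execute_sql("""
    CREATE TABLE transactions (
        account_no BIGINT,
        amount     DOUBLE,
        ts         TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'filesystem',
        'path' = '/opt/app/data',
        'format' = 'csv'
    )
""")

# Sum amounts per account over 30-day event-time windows; the real job
# then writes a result of this shape to a filesystem sink under /tmp/output.
result = (
    t_env.from_path('transactions')
    .window(Tumble.over(lit(30).days).on(col('ts')).alias('w'))
    .group_by(col('w'), col('account_no'))
    .select(col('w').start, col('w').end,
            col('account_no'), col('amount').sum.alias('total'))
)
```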
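
`bank_stream_sql.py` swaps the expression DSL for SQL. Below is a hypothetical sketch of the same aggregation, reusing the `transactions` table from the sketch above; `TableResult.print()` is what renders the bordered table shown in example 2:

```python
# Hypothetical sketch, not the actual bank_stream_sql.py; assumes the
# transactions table defined in the previous sketch.
t_env.execute_sql("""
    SELECT
        TUMBLE_START(ts, INTERVAL '30' DAY) AS window_start,
        TUMBLE_END(ts, INTERVAL '30' DAY)   AS window_end,
        account_no,
        SUM(amount) AS total
    FROM transactions
    GROUP BY TUMBLE(ts, INTERVAL '30' DAY), account_no
""").print()
```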
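
And for example 4, this is roughly how a Kafka source and sink can be wired up with SQL DDL in PyFlink. Everything except the `temp_result` topic and the `kafka:9092` bootstrap server (both visible in the consumer command above) is an assumption:

```python
# Hypothetical sketch, not the actual kafka_transaction_stream.py.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().build())

# Source topic (hypothetical name, schema, and format).
t_env.execute_sql("""
    CREATE TABLE transactions (
        account_no BIGINT,
        amount     DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'transactions',
        'properties.bootstrap.servers' = 'kafka:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink topic; temp_result is what the console consumer above reads back.
t_env.execute_sql("""
    CREATE TABLE temp_result (
        account_no BIGINT,
        amount     DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'temp_result',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")

# An append-only transformation, so the plain kafka connector can sink it.
t_env.execute_sql("""
    INSERT INTO temp_result
    SELECT account_no, amount FROM transactions WHERE amount > 1000
""").wait()
```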

## Test

Test suites for this stack have not been developed yet (a PR is more than welcome 😄). The plan was to have PyFlink installed in a Python virtual env so that tests can run locally on the host. A `Makefile` has been scaffolded for this purpose.
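
A minimal sketch of what such a test might look like, assuming pytest and a local PyFlink install; the job runs in-process in batch mode, so no cluster is needed, and all names here are made up:

```python
# Hypothetical test sketch; no test suite exists in the repo yet.
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

def test_sum_per_account():
    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_batch_mode().build())
    table = t_env.from_elements(
        [(1, 100.0), (1, 50.0), (2, 10.0)], ['account_no', 'amount'])
    result = (table.group_by(col('account_no'))
                   .select(col('account_no'),
                           col('amount').sum.alias('total')))
    # Collect the bounded result and check the per-account totals.
    rows = {row[0]: row[1] for row in result.execute().collect()}
    assert rows == {1: 150.0, 2: 10.0}
```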

## Deployment

There isn't a deployment suite for this stack yet. Refer to Ideas & Future Development below.

## Ideas & Future Development

- **Deployment**: Something like Apache Ansible would be great to have and is easy to set up. The expected result would be an automated deployment suite for deploying the stack to a cloud provider or a remote host.

- **More examples**. Some ideas are:

  - Create a Kafka container to quick-start some interactions between Flink and Kafka
  - Demo the functionality of Python UDFs (a minimal sketch is below) and add more examples with the DataStream API
  - Perhaps extend the stack to a full-suite solution with databases and dashboards (PostgreSQL, Apache Superset, etc.). This would couple nicely with the idea of deployment automation.
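
As a taste of the Python UDF idea, a hypothetical scalar UDF might look like this (the function and its registration are made up for illustration):

```python
# Hypothetical sketch of a Python scalar UDF; not part of the repo.
from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(result_type=DataTypes.STRING())
def mask_account(account_no: str) -> str:
    # Keep only the last four characters of an account number.
    return '*' * (len(account_no) - 4) + account_no[-4:]

# Registered on a TableEnvironment with:
# t_env.create_temporary_function('mask_account', mask_account)
```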

Have fun! 🎉