
lapiceroazul4/workshop03_kafka


What's this project about?

This workshop simulates streaming data transmission with Kafka and Python and applies a regression model to the stream: for each record that arrives at the Kafka consumer, a prediction is made.
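The per-record flow can be sketched as follows. This is a minimal sketch, not the repository's code. It assumes three things: each Kafka message value is a UTF-8 JSON object of numeric features, the model follows the scikit-learn convention of a `predict` method taking a 2-D list of rows, and the `kafka-python` package is the client. The helper names `decode_record`, `predict_stream` and `run_consumer` are hypothetical. The feature names must match the columns the model was trained on, and this README does not list them. `predict_stream` accepts any iterable of raw values, so the logic can be exercised without a running broker.

```python
import json
from typing import Iterable, Iterator, Sequence, Tuple


def decode_record(raw: bytes) -> dict:
    """Decode one Kafka message value, assumed to be UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


def predict_stream(
    raw_values: Iterable[bytes], model, feature_names: Sequence[str]
) -> Iterator[Tuple[dict, float]]:
    """Yield (record, prediction) for each message value, one record at a time.

    `model` is assumed to follow the scikit-learn convention: predict() takes a
    2-D list of rows and returns one value per row.
    """
    for raw in raw_values:
        record = decode_record(raw)
        # Build a single-row batch with a fixed column order matching training.
        row = [[record[name] for name in feature_names]]
        yield record, float(model.predict(row)[0])


def run_consumer(model_path: str, feature_names: Sequence[str]) -> None:
    """Hypothetical wiring: consume the topic and print one prediction per record."""
    import pickle

    from kafka import KafkaConsumer  # kafka-python client (assumed dependency)

    with open(model_path, "rb") as fh:
        model = pickle.load(fh)
    # Broker address is an assumption; it depends on the docker-compose.yml port mapping.
    consumer = KafkaConsumer("kafka_workshop", bootstrap_servers="localhost:9092")
    values = (message.value for message in consumer)
    for record, prediction in predict_stream(values, model, feature_names):
        print(record, "->", prediction)
```

Separating decoding and prediction from the consumer keeps the loop testable. The Kafka-specific part only turns messages into raw byte values.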

Folder Structure

workshop03_kafka/
├── 0_src/                      # Source scripts
│   ├── __init__.py             # Initialization file for the module
│   ├── Database.py             # Database handling script
│   ├── df_test.py              # DataFrame testing script
│   ├── kafka_test.py           # Kafka testing script
│   ├── modelo_regresion.pkl    # Pickled regression model
│   └── transform.py            # Data transformation script
├── 1_edas/                     # Exploratory Data Analysis
│   └── eda.ipynb               # EDA Jupyter notebook
├── .gitignore                  # Ignored files for Git
├── docker-compose.yml          # Docker Compose configuration
├── main.py                     # Main execution script
├── README.md                   # Project description and guide
└── requirements.txt            # Project dependencies
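The tree includes a pickled regression model, 0_src/modelo_regresion.pkl. The sketch below shows one way such a file could be loaded and sanity-checked with the standard `pickle` module. It assumes the file was written with `pickle`; a file saved with `joblib` would need `joblib.load` instead. It also assumes the object exposes a `predict` method. The function name `load_model` is hypothetical. Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Union


def load_model(path: Union[str, Path]):
    """Unpickle a model file and check that it exposes a predict() method."""
    with open(path, "rb") as fh:
        model = pickle.load(fh)
    # scikit-learn style regressors provide predict(); fail early if it is missing.
    if not callable(getattr(model, "predict", None)):
        raise TypeError(f"object loaded from {path} has no predict() method")
    return model
```

For example, load_model("0_src/modelo_regresion.pkl") would return the unpickled object, ready for model.predict(...) calls.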

Prerequisites

Before getting started with this project, make sure you have the following components installed or ready:

Environment Setup

Here are the steps to set up your development environment:

  1. Create a virtual environment: run the following command to create a virtual environment called venv:

    python -m venv venv
    
  2. Activate your venv: from the project folder, run the following command to activate the environment:

    source venv/bin/activate
    

On Windows there is no 'bin' folder; the activation script lives in the 'Scripts' folder, so run venv\Scripts\activate instead.
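You can confirm that activation worked before installing anything. Inside a virtual environment created with the standard `venv` module, `sys.prefix` points at the environment, while `sys.base_prefix` still points at the base interpreter. The helper name `in_virtualenv` is only for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv sets sys.prefix to the environment directory, while base_prefix keeps
    # the original installation, so the two differ only inside an environment.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```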

  3. Install dependencies: once the venv is active, run the following command to install the necessary dependencies:

    pip install -r requirements.txt
    
  4. Create pg_config: create a JSON file called "pg_config" with the following content, replacing the placeholder values with your own database credentials and server address:

    {
     "user" : "myuser",
     "passwd" : "mypass",
     "server" : "XXX.XX.XX.XX",
     "database" : "demo_db"
    }  
    
  5. Run Docker Compose: from the project folder, start the services with the first command, then use the second to confirm that the containers are running:

  • docker-compose up
  • docker ps

Open another terminal and enter the Kafka container with:

  • docker exec -it kafka-test bash

Inside the container, create the topic used by the project:

  • kafka-topics --bootstrap-server kafka-test:9092 --create --topic kafka_workshop
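Once the kafka_workshop topic exists, records have to be published to it before the consumer can score them. The following is a minimal producer sketch, and the repository's own producer code may differ. It assumes the `kafka-python` client and JSON-encoded message values. It also assumes the broker is reachable from the host at localhost:9092, which depends on the port mapping in docker-compose.yml and is not shown here. From inside the Docker network the address would be kafka-test:9092, as in the topic command. The helper names `encode_record` and `publish_rows` are hypothetical.

```python
import json
from typing import Iterable, Mapping


def encode_record(record: Mapping) -> bytes:
    """Serialize one row as UTF-8 JSON bytes with a stable key order."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish_rows(
    rows: Iterable[Mapping],
    topic: str = "kafka_workshop",
    bootstrap: str = "localhost:9092",  # assumed host-side address
) -> int:
    """Send each row to the topic as its own message; return how many were sent."""
    from kafka import KafkaProducer  # kafka-python client (assumed dependency)

    producer = KafkaProducer(bootstrap_servers=bootstrap, value_serializer=encode_record)
    sent = 0
    for row in rows:
        producer.send(topic, value=row)  # one message per record simulates a stream
        sent += 1
    producer.flush()  # block until all buffered messages are delivered
    producer.close()
    return sent
```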
  6. Run main.py: at this point everything is set up, so from the project folder, with the venv active, run:

    python main.py
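The pg_config file described in the setup steps uses the keys user, passwd, server and database. The sketch below reads it and maps those keys onto the keyword names that `psycopg2.connect` accepts (`user`, `password`, `host`, `dbname`). Several parts are assumptions: the file name pg_config.json, the helper name `load_pg_config`, and the choice of psycopg2 as the driver. Database.py is not shown here, so the project may connect differently.

```python
import json
from pathlib import Path

# Keys used in pg_config -> keyword names accepted by psycopg2.connect().
KEY_MAP = {"user": "user", "passwd": "password", "server": "host", "database": "dbname"}


def load_pg_config(path: str = "pg_config.json") -> dict:
    """Read the JSON config and return psycopg2-style connection keyword arguments."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in KEY_MAP if key not in raw]
    if missing:
        raise KeyError(f"pg_config is missing: {', '.join(missing)}")
    return {KEY_MAP[key]: raw[key] for key in KEY_MAP}


# Hypothetical usage (requires psycopg2 and a reachable server):
#   import psycopg2
#   conn = psycopg2.connect(**load_pg_config())
```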
    
    

Contact

If you have any questions or suggestions, feel free to contact me at [[email protected]].
