Kafka & Spark Streaming Product Recommendation Engine

This code demonstrates a Spark Streaming and Kafka implementation using a real-life e-commerce product recommendation example: for any specific item_id (as listed in the data dir) it returns the best recommended items.

Disclaimer: This is not meant to be the best tutorial for learning and implementing a recommendation engine. The recommendations here come from item-based collaborative filtering written in plain Python, with no machine-learning libraries.
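
The collaborative filtering itself needs nothing beyond the standard library. The sketch below is illustrative only (it is not the repo's actual code): it scores items by the cosine similarity of the sets of users who interacted with them, which is the general idea behind item-based CF on the user-item data.

```python
# Minimal sketch of item-based collaborative filtering with no ML libraries.
# Names here are illustrative; the repo's script may structure this differently.
from collections import defaultdict
from math import sqrt

def build_item_vectors(interactions):
    """interactions: iterable of (user_id, item_id) pairs from user-item.csv."""
    item_users = defaultdict(set)
    for user_id, item_id in interactions:
        item_users[item_id].add(user_id)
    return item_users

def cosine_similarity(users_a, users_b):
    """Cosine similarity of two items represented as sets of users."""
    overlap = len(users_a & users_b)
    if overlap == 0:
        return 0.0
    return overlap / (sqrt(len(users_a)) * sqrt(len(users_b)))

def recommend(item_id, item_users, top_n=5):
    """Return the top_n items most similar to item_id."""
    target = item_users.get(item_id, set())
    scores = {
        other: cosine_similarity(target, users)
        for other, users in item_users.items()
        if other != item_id
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```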

Data Files

  • Create a /data folder with the two files below (a loading sketch follows this list).
  • item-data.csv - Item Id and Item Name, e.g.:
    19444 | Radhe gold aata 25 kg
  • user-item.csv - User Id and Item Id mapped to a timestamp, forming the matrix for collaborative filtering, e.g.:
    49721 | 19853 | 2020-09-18T20:02:20+05:30
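
Both files are pipe-delimited, as the sample rows show. A minimal loader might look like this (paths and column order are taken from the samples above; adjust if the actual files differ):

```python
# Sketch of loading the two pipe-delimited data files; assumes the layout
# shown in the sample rows above.
import csv

def load_items(path="data/item-data.csv"):
    """Map item_id -> item_name."""
    with open(path, newline="") as f:
        return {row[0].strip(): row[1].strip()
                for row in csv.reader(f, delimiter="|") if len(row) >= 2}

def load_interactions(path="data/user-item.csv"):
    """Yield (user_id, item_id, timestamp) tuples."""
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="|"):
            if len(row) >= 3:
                yield row[0].strip(), row[1].strip(), row[2].strip()
```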

Dependencies

  • Zookeeper
  • Kafka
  • Spark: 2.4
  • Spark Streaming Kafka package: the spark-streaming-kafka-0-8-assembly_2.11-2.4.7.jar used in the spark-submit command below (included as part of the code)
  • Python: 3.7 (required for Spark 2.4)

How to run?

All these commands (except the one-off topic registrations) go in separate console windows on the terminal.

  • Zookeeper: zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
    (ZooKeeper coordinates the Kafka broker and keeps the show running)

  • Kafka Server: kafka-server-start /usr/local/etc/kafka/server.properties
    (The Kafka broker relays data between the producer and consumers)

  • Register 1st Topic: kafka-topics --create --zookeeper localhost:2181 --topic idpushtopic --partitions 1 --replication-factor 1
    (idpushtopic is the topic that receives the product id to generate recommendations for. Run this once only.)

  • Register 2nd Topic: kafka-topics --create --zookeeper localhost:2181 --topic prodRecommSend --partitions 1 --replication-factor 1
    (prodRecommSend is the topic to which the stream pushes the recommendations. Run this once only.)

  • Producer: kafka-console-producer --broker-list localhost:9092 --topic idpushtopic
    (Open the producer in one window; it pushes the product id to the Spark stream)

  • Spark Streaming: spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.4.7.jar kafka-product-recom.py localhost:2181 idpushtopic
    (The stream fetches the id, runs the product recommendation, then acts as a producer and pushes the result to the second topic; a skeleton of this script follows this list)

  • Consumer: kafka-console-consumer --bootstrap-server localhost:9092 --topic prodRecommSend --from-beginning
    (Open the consumer in one window; it receives the recommended product list pushed by the Spark stream)
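
For orientation, here is a rough skeleton of what kafka-product-recom.py is doing, inferred from the commands above; the actual script may differ. It assumes the recommend(), build_item_vectors(), and load_interactions() helpers from the earlier sketches, and that the kafka-python package is used to publish results.

```python
# Skeleton of the streaming job: read item ids from idpushtopic, compute
# recommendations, and publish them to prodRecommSend. Illustrative only.
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from kafka import KafkaProducer  # assumption: kafka-python for the output side

def push_recommendations(rdd):
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for item_id in rdd.collect():
        recs = recommend(item_id, item_vectors)  # item-based CF, see sketch above
        producer.send("prodRecommSend", str(recs).encode("utf-8"))
    producer.flush()

if __name__ == "__main__":
    zk_quorum, topic = sys.argv[1], sys.argv[2]  # e.g. localhost:2181 idpushtopic

    # Build the item vectors once, using the helpers sketched earlier.
    item_vectors = build_item_vectors(
        (user, item) for user, item, _ in load_interactions())

    sc = SparkContext(appName="ProductRecommendation")
    ssc = StreamingContext(sc, batchDuration=5)

    # Each Kafka message is a (key, value) pair; the value is the pushed item id.
    stream = KafkaUtils.createStream(ssc, zk_quorum, "recom-group", {topic: 1})
    ids = stream.map(lambda kv: kv[1].strip())
    ids.foreachRDD(push_recommendations)

    ssc.start()
    ssc.awaitTermination()
```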

How to test?

  • Start the producer in one window and push an item id, e.g. 19444 (a scripted alternative is sketched after these steps).

  • You should see the recommendation job run in the Spark Streaming window.

  • Simultaneously, the recommended items appear in the consumer window.

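If you prefer to script the test instead of typing into the console producer, something like the following would push the same test id (this assumes the kafka-python package is installed; it is not part of the commands above):

```python
# Push a test item id to idpushtopic programmatically (assumes kafka-python).
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("idpushtopic", b"19444")  # same test id as above
producer.flush()
```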

Maintainer

  • Vaibhav Magon
