Skip to content

Streamlit web app to generate datasets with OpenAI's GPT models

License

Notifications You must be signed in to change notification settings

knjk04/dataset-generator

Repository files navigation

Dataset generator screenshot

Conventional commits badge

A Streamlit web app that generates datasets using GPT models.

Features:

  • Choose between GPT 3.5 Turbo and text-davinci-003
  • Export dataset to CSV

Note: the "text-davinci-002", "davinci" and "curie" models will not be supported as they don't perform as well for this use case

Running locally:

Prerequisites:

  1. In the root of the project, build the images: docker-compose build
  2. Run the services: docker-compose up
  3. Go to http://localhost:8501/ to access the frontend.

Configure development environment:

  1. Run pip install -r requirements-dev.txt
  2. Install pre-commit hook: pre-commit install
  3. (Optional) run hook: pre-commit run --all-files

The backend and frontend directories also contain requirements that need to be installed if running locally without Docker.

PyCharm: Mark the src directory as sources root: PyCharm sources root

To do this, go to Settings > Project > Project Structure. Then, click on the src folder. Finally, click on the blue Sources button.

Disclaimer

The quality of the datasets generated depend on the responses by OpenAI GPT models. Consequently, they may not be factually correct. Please corroborate any data generated with factual sources.