Commit

Merge pull request #31 from microsoft/py-packaging
Py packaging
Chenglong-MS authored Oct 11, 2024
2 parents 6b8ba2c + e623830 commit 6af6886
Showing 45 changed files with 1,456 additions and 578 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -16,7 +16,7 @@
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install -r /workspaces/data-formulator/requirements.txt --verbose && yarn install && yarn build"
"postCreateCommand": "python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install https://github.com/user-attachments/files/17319752/data_formulator-0.1.0.tar.gz --verbose && data_formulator"

// Configure tool-specific properties.
// "customizations": {},
34 changes: 0 additions & 34 deletions .github/workflows/build.yml

This file was deleted.

62 changes: 62 additions & 0 deletions .github/workflows/python-build.yml
@@ -0,0 +1,62 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: build

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
    - uses: actions/checkout@v4
    - name: Set Node.js 20
      uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'yarn'
    - name: Set up Python 3.12
      uses: actions/setup-python@v5
      with:
        python-version: 3.12
    - name: Install node dependencies
      run: yarn install
    - name: Install python dependencies
      run: |
        python -m pip install --upgrade pip
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
        python -m pip install build
    - name: Build frontend
      run: yarn build
    - name: Build python artifact
      run: python -m build
    - name: Archive production artifacts
      uses: actions/upload-artifact@v4
      with:
        name: release-dist
        path: dist

  pypi-publish:
    runs-on: ubuntu-latest
    needs:
    - build
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') # only publish when push with tag
    environment:
      name: pypi
      url: https://pypi.org/p/data-formulator
    permissions:
      id-token: write
    steps:
    - name: Retrieve release distributions
      uses: actions/download-artifact@v4
      with:
        name: release-dist
        path: dist/
    - name: Publish package distributions to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
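
Reviewer note: the `if:` gate on the `pypi-publish` job above can be reasoned about with a small Python sketch. This is only an illustration of the condition, not code from the repository; `should_publish` is a hypothetical name, and the tag `v0.1.0` is an assumed example. GitHub's expression comparisons ignore case, so the sketch folds case before comparing.

```python
def should_publish(event_name: str, ref: str) -> bool:
    """Approximate the workflow expression
    github.event_name == 'push' && startsWith(github.ref, 'refs/tags').

    Case is folded because GitHub Actions string comparisons ignore case.
    """
    is_push = event_name.casefold() == "push"
    is_tag_ref = ref.casefold().startswith("refs/tags")
    return is_push and is_tag_ref


# A tag push would publish; branch pushes and pull requests would not.
print(should_publish("push", "refs/tags/v0.1.0"))
print(should_publish("push", "refs/heads/main"))
print(should_publish("pull_request", "refs/pull/31/merge"))
```

Under this reading, ordinary pushes and pull requests against `main` only run the `build` job, and publishing happens only for pushed tags.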
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@


*openai-keys.env
**/*.ipynb_checkpoints/

.DS_Store
3 changes: 1 addition & 2 deletions CODESPACES.md
@@ -15,12 +15,11 @@ You will need a GitHub account and to be logged in to use Codespaces.
### Step 2: Run the app
The codespace is a VS Code development environment in the cloud. Once the Codespace is created, start Data Formulator with the following steps:

* Press **F5** to run. Or if you prefer, click the **Run and Debug** tab on the left, and the **Start Debugging** button.
* When a toast about port forwarding appears, click the **Open in Browser** button.
* You will see the Data Formulator app!

<kbd>
<img width="528" alt="image" src="https://github.com/user-attachments/assets/e62bebda-8daf-4587-94d4-fede48de382b">
<img width="528" alt="image" src="https://github.com/user-attachments/assets/cb9e2123-4a42-4926-8b59-5bafb9be25fa">
</kbd>


54 changes: 42 additions & 12 deletions DEVELOPMENT.md
@@ -9,15 +9,15 @@ How to set up your local machine.
## Backend (Python)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
```
```bash
python -m venv venv
.\venv\Scripts\activate
```

- **Install Dependencies**
```bash
pip install -r requirements.txt
```
```bash
pip install -r requirements.txt
```

- **Run**
- **Windows**
@@ -33,9 +33,10 @@ pip install -r requirements.txt
## Frontend (TypeScript)

- **Install NPM packages**
```bash
yarn
```

```bash
yarn
```

- **Development mode**

@@ -46,14 +47,43 @@ yarn
Open [http://localhost:3000](http://localhost:3000) to view it in the browser.
The page will reload if you make edits. You will also see any lint errors in the console.

- **Build for Production**
## Build for Production

- **Build the frontend and then the backend**

Compile the TypeScript files and bundle the project:
```bash
yarn build
```
This builds the app for production to the `dist` folder.
This builds the app for production to the `py-src/data_formulator/dist` folder.

Then build the Python package:

```bash
pip install build
python -m build
```
This creates a Python wheel in the `dist/` folder, named `data_formulator-<version>-py3-none-any.whl`.

- **Test the artifact**

You can then install the built wheel (testing in a virtual environment is recommended):
```bash
# replace <version> with the actual build version.
pip install dist/data_formulator-<version>-py3-none-any.whl
```

Once installed, you can run Data Formulator with:
```bash
data_formulator
```
or
```bash
python -m data_formulator
```

Open [http://localhost:5000](http://localhost:5000) to view it in the browser.
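
The `<version>` placeholder in the wheel name has to be filled in by hand. As a minimal sketch, assuming the wheel follows the `data_formulator-<version>-py3-none-any.whl` pattern described above, a short script could locate the newest wheel in `dist/` and assemble the matching install command. The helper names `find_wheel` and `pip_install_command` are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def find_wheel(dist_dir="dist", package="data_formulator"):
    """Return the most recently modified wheel for `package`, or None."""
    wheels = sorted(
        Path(dist_dir).glob(f"{package}-*-py3-none-any.whl"),
        key=lambda path: path.stat().st_mtime,
    )
    return wheels[-1] if wheels else None


def pip_install_command(wheel):
    """Build the pip invocation as an argument list for subprocess.run."""
    # sys.executable keeps the install inside the active virtual environment.
    return [sys.executable, "-m", "pip", "install", str(wheel)]
```

Calling `find_wheel()` after `python -m build` and passing the result to `subprocess.run(pip_install_command(wheel), check=True)` would install the artifact without typing the version.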


## Usage
See the [Usage section on the README.md page](README.md#usage).
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include py-src/data_formulator/dist/*
include py-src/data_formulator/dist/assets/*
70 changes: 48 additions & 22 deletions README.md
@@ -6,75 +6,101 @@

[![arxiv](https://img.shields.io/badge/Paper-arXiv:2408.16119-b31b1b.svg)](https://arxiv.org/abs/2408.16119)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://youtu.be/3ndlwt0Wi3c)&ensp;
[![build](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml/badge.svg)](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml)

</div>

Transform data and create rich visualizations iteratively with AI 🪄. Try Data Formulator now in GitHub Codespaces!

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)


<kbd>
<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot.png"></a>
</kbd>


## News 🔥🔥🔥

- [10-11-2024] Data Formulator python package released!
- You can now install Data Formulator using Python and run it locally, easily. [[check it out]](#get-started).
- Our Codespace configuration is also updated for fast start up ⚡️. [[try it now!]](https://codespaces.new/microsoft/data-formulator?quickstart=1)
- New experimental feature: load an image or messy text and ask AI to parse and clean it for you(!). [[demo]](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717)

- [10-01-2024] Initial release of Data Formulator, check out our [[blog]](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) and [[video]](https://youtu.be/3ndlwt0Wi3c)!



## Overview

**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.

To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need proficiency in data transformation and visualization tools, and they also spend effort managing the iteration history. This can be challenging!
Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.

Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines user interface interactions (UI) with natural language (NL) inputs. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.
## Get Started

Check out these cool Data Formulator features that can help you create impressive visualizations!
* Using the **blended UI and NL inputs** to describe the chart.
* Utilizing **data threads** to navigate the history and reuse previous results to create new ones instead of starting from scratch every time.
Play with Data Formulator with one of the following options:

## Get Started
- **Option 1: Install via Python PIP**

Use Python PIP for an easy setup experience, running locally (recommended: install it in a virtual environment).

```bash
# install data_formulator
pip install data_formulator

Choose one of the following options to set up Data Formulator:
# start data_formulator
data_formulator

# alternatively, you can run data formulator with this command
python -m data_formulator
```

- **Option 1: Codespaces**
Data Formulator will be automatically opened in the browser at [http://localhost:5000](http://localhost:5000).

- **Option 2: Codespaces (5 minutes)**

Use Codespaces for an easy setup experience, as everything is preconfigured to get you up and running quickly. For more details, see [CODESPACES.md](CODESPACES.md).
You can also run Data Formulator in Codespaces, where everything is preconfigured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 2: Local Installation**
- **Option 3: Working in the developer mode**

Opt for a local installation if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
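
Option 1 states that the app is served at `http://localhost:5000` after `data_formulator` starts. As a minimal sketch, assuming that default address, a script that launches the app in the background could poll until the server answers before doing anything else. `wait_for_server` is a hypothetical helper written for illustration; it is not part of the `data_formulator` package.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:5000", timeout=30.0, interval=0.5):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server responded, just not with a success status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and retry.
            time.sleep(interval)
    return False
```

The function returns `True` as soon as anything responds at the address and `False` if the timeout elapses first, so a caller can decide whether to open the browser or report a startup failure.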


## Using Data Formulator

Once you’ve completed the setup using any of the options above, follow these steps to start using Data Formulator:

### The basics of data visualization
* Provide OpenAI keys and select a model (GPT-4o suggested) and choose a dataset
* Choose a visualization type
* Drag and drop data fields to the encoding shelf to create visualization

* Provide OpenAI keys and select a model (GPT-4o suggested) and choose a dataset.
* Choose a chart type, and then drag-and-drop data fields to chart properties (x, y, color, ...) to specify visual encodings.

https://github.com/user-attachments/assets/0fbea012-1d2d-46c3-a923-b1fc5eb5e5b8


### Create visualization beyond the initial dataset (powered by 🤖)
* Add new field names in the encoding shelf, describe the chart intent
* Click the **Formulate** button
* Inspect the code behind the concept
* Follow up the chart to create new ones
* You can type names of **fields that do not exist in current data** in the encoding shelf:
- this tells Data Formulator that you want to create visualizations that require computation or transformation from existing data,
- you can optionally provide a natural language prompt to clarify your intent (not necessary when field names are self-explanatory).
* Click the **Formulate** button.
- Data Formulator will transform data and instantiate the visualization based on the encoding and prompt.
* Inspect the data, chart and code.
* To create a new chart based on existing ones, follow up in natural language:
- provide a follow-up prompt (e.g., *"show only top 5!"*),
- you may also update visual encodings for the new chart.

https://github.com/user-attachments/assets/160c69d2-f42d-435c-9ff3-b1229b5bddba

https://github.com/user-attachments/assets/c93b3e84-8ca8-49ae-80ea-f91ceef34acb

Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel.

## Developers
## Developers' Guide

Follow the [developers' instructions](DEVELOPMENT.md) to build your new data analysis tools on top of Data Formulator.


## Research Papers
* [Data Formulator 2: Iteratively Creating Rich Visualizations with AI](https://arxiv.org/abs/2408.16119)

2 changes: 1 addition & 1 deletion local_server.bat
@@ -2,7 +2,7 @@
:: Licensed under the MIT License.

@echo off
set FLASK_APP=app.py
set FLASK_APP=py-src/data_formulator/app.py
set FLASK_RUN_PORT=5000
set FLASK_RUN_HOST=0.0.0.0
flask run
2 changes: 1 addition & 1 deletion local_server.sh
@@ -1,4 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

env FLASK_APP=app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
env FLASK_APP=py-src/data_formulator/app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
3 changes: 3 additions & 0 deletions package.json
@@ -14,6 +14,7 @@
"ag-grid-enterprise": "^32.0.2",
"ag-grid-react": "^32.0.2",
"d3": "^7.3.0",
"dompurify": "^3.1.7",
"localforage": "^1.10.0",
"lodash": "^4.17.21",
"markdown-to-jsx": "^7.1.8",
@@ -24,6 +25,7 @@
"react": "^18.2.0",
"react-animate-height": "^3.0.4",
"react-animate-on-change": "^2.2.0",
"react-diff-viewer": "^3.1.1",
"react-dnd": "^16.0.1",
"react-dnd-html5-backend": "^16.0.1",
"react-dom": "^18.2.0",
@@ -37,6 +39,7 @@
"redux": "^4.2.0",
"redux-persist": "^6.0.0",
"typescript": "^4.9.5",
"validator": "^13.12.0",
"vega": "^5.23.0",
"vega-embed": "^6.21.0",
"vega-lite": "^5.5.0",
5 changes: 5 additions & 0 deletions py-src/data_formulator/__init__.py
@@ -0,0 +1,5 @@
from .app import run_app

__all__ = [
    "run_app",
]
4 changes: 4 additions & 0 deletions py-src/data_formulator/__main__.py
@@ -0,0 +1,4 @@
from .app import run_app

if __name__ == "__main__":
    run_app()