mlinspect-SQL

mlinspect-SQL is an SQL extension of the mlinspect framework that transpiles Python library functions to SQL so they can be executed within a database system.

Run mlinspect locally

Prerequisite: Python 3.8

1. Clone this repository
2. Set up the environment

    cd mlinspect
    python -m venv venv
    source venv/bin/activate

3. If you want to use the visualisation functions we provide, install graphviz, which cannot be installed via pip

    Linux: apt-get install graphviz
    macOS: brew install graphviz

4. Install pip dependencies

    pip install -e ".[dev]"

5. To ensure everything works, you can run the tests (without graphviz, the visualisation test will fail)

    python setup.py test
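Before running the tests, it can help to confirm that the environment matches the setup steps above. The following is a minimal sketch using only the Python standard library, not part of mlinspect: the function name `environment_report` is hypothetical, and it assumes that graphviz is detectable through its `dot` executable on the PATH.

```python
import shutil
import sys


def environment_report():
    """Collect a few facts about the current environment (hypothetical helper)."""
    return {
        # Interpreter version as "major.minor", e.g. "3.8"
        "python_version": "%d.%d" % sys.version_info[:2],
        # The README names Python 3.8 as the prerequisite
        "matches_python_3_8": sys.version_info[:2] == (3, 8),
        # Inside a venv, sys.prefix differs from sys.base_prefix
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Assumption: graphviz is installed if its "dot" binary is on the PATH
        "graphviz_dot_found": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `graphviz_dot_found` is False, the visualisation test is the one to expect a failure from.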

How to use the SQL backend

We prepared two examples: the first demonstrates the execution of machine learning pipelines only; the second demonstrates a full end-to-end machine learning pipeline and compares the performance of different backends.

To run the latter, you need a PostgreSQL database system running in the background (on port 5432) with a user luca with password password who is allowed to copy from CSV files and has access to the respective database.

create user luca;
alter role luca with password 'password';
grant pg_read_server_files to luca;
create database healthcare_benchmark;
grant all privileges on database healthcare_benchmark to luca;

To also run the benchmarks in Umbra, you need an Umbra server running at port 5433.
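To check that both servers are listening before starting a benchmark, a plain TCP probe is enough. This is a minimal sketch using only the standard library: the ports come from the text above, while the host `localhost`, the `SERVERS` table, and the helper name `is_port_open` are assumptions for illustration. It only tests that a port accepts connections, not that the user or database is set up correctly.

```python
import socket

# Ports as stated above; "localhost" is an assumption about where the servers run.
SERVERS = {
    "PostgreSQL": ("localhost", 5432),
    "Umbra": ("localhost", 5433),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVERS.items():
        state = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name} at {host}:{port} is {state}")
```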

For more information on which functions are supported when execution is outsourced to a DBMS, please see here.

How to use mlinspect

mlinspect makes it easy to analyze your pipeline and automatically check for common issues.

from mlinspect import PipelineInspector
from mlinspect.inspections import MaterializeFirstOutputRows
from mlinspect.checks import NoBiasIntroducedFor

IPYNB_PATH = ...  # path to the Jupyter notebook (.ipynb) containing the pipeline

inspector_result = PipelineInspector\
        .on_pipeline_from_ipynb_file(IPYNB_PATH)\
        .add_required_inspection(MaterializeFirstOutputRows(5))\
        .add_check(NoBiasIntroducedFor(['race']))\
        .execute()

extracted_dag = inspector_result.dag
dag_node_to_inspection_results = inspector_result.dag_node_to_inspection_results
check_to_check_results = inspector_result.check_to_check_results
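Once the results are available, a common next step is to print a compact overview of which checks passed. The sketch below assumes that `check_to_check_results` is a dict whose values expose a `status` attribute holding an enum member; that is an assumption about the result objects, not something this README states. The helper `summarize_checks`, the `StandInStatus` enum, and the demo data are hypothetical stand-ins so the example runs without mlinspect installed; the statuses shown are made up and are not real results.

```python
from enum import Enum
from types import SimpleNamespace


class StandInStatus(Enum):
    """Hypothetical stand-in for the status type of a check result."""
    SUCCESS = "Success"
    FAILURE = "Failure"


def summarize_checks(check_to_check_results):
    """Map each check (as a string) to the name of its result status."""
    summary = {}
    for check, result in check_to_check_results.items():
        status = getattr(result, "status", None)
        # Enum members expose .name; fall back to str() for anything else
        summary[str(check)] = getattr(status, "name", str(status))
    return summary


# Made-up demo data: keys mimic checks, values mimic result objects
demo_results = {
    "NoBiasIntroducedFor(['race'])": SimpleNamespace(status=StandInStatus.FAILURE),
    "hypothetical_other_check": SimpleNamespace(status=StandInStatus.SUCCESS),
}

if __name__ == "__main__":
    for check_name, status_name in summarize_checks(demo_results).items():
        print(f"{check_name}: {status_name}")
```

With a real run, `summarize_checks(inspector_result.check_to_check_results)` would be the call to try, provided the assumption about the `status` attribute holds.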

With execution outsourced to a Database Management System (DBMS):

from mlinspect.to_sql.dbms_connectors.postgresql_connector import PostgresqlConnector
from mlinspect import PipelineInspector
from mlinspect.inspections import MaterializeFirstOutputRows
from mlinspect.checks import NoBiasIntroducedFor

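dbms_connector = PostgresqlConnector(...)  # connection settings for your database

IPYNB_PATH = ...  # path to the Jupyter notebook (.ipynb) containing the pipeline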

inspector_result = PipelineInspector\
        .on_pipeline_from_ipynb_file(IPYNB_PATH)\
        .add_required_inspection(MaterializeFirstOutputRows(5))\
        .add_check(NoBiasIntroducedFor(['race']))\
        .execute_in_sql(dbms_connector=dbms_connector, mode="VIEW", materialize=True)

extracted_dag = inspector_result.dag
dag_node_to_inspection_results = inspector_result.dag_node_to_inspection_results
check_to_check_results = inspector_result.check_to_check_results

