AHaryanto/azure-automl-mlops

Introduction

(Figure: the data science lifecycle)

Azure Machine Learning's automated ML capability helps you discover high-performing models without reimplementing every candidate approach yourself. Combined with Azure Machine Learning pipelines, it lets you build deployable workflows that quickly find the algorithm that works best for your data, while putting you on the road to MLOps and model lifecycle operationalization. This project shows how to efficiently join a data preparation step to an automated ML step.
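
A minimal sketch of how such a pipeline can be wired together with the Azure ML SDK v1, assuming a classification task, a placeholder label column named target, and the compute cluster named in config.json (the repo's training_main.py is the authoritative version):

    from azureml.core import Workspace
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep, AutoMLStep
    from azureml.train.automl import AutoMLConfig

    ws = Workspace.from_config()
    compute = ws.compute_targets["my_compute_cluster"]  # placeholder name

    # The data prep step writes its output where the AutoML step can read it.
    prepped_path = OutputFileDatasetConfig(name="prepped")

    prep_step = PythonScriptStep(
        name="data_prep",
        script_name="transform.py",
        arguments=["--output", prepped_path],
        compute_target=compute,
        source_directory=".")

    # Consume the prep step's output as tabular training data.
    prepped_data = prepped_path.read_delimited_files()

    automl_config = AutoMLConfig(
        task="classification",          # placeholder task type
        training_data=prepped_data,
        label_column_name="target",     # placeholder label column
        compute_target=compute,
        primary_metric="AUC_weighted",
        experiment_timeout_hours=1)

    automl_step = AutoMLStep(name="automl", automl_config=automl_config)

    pipeline = Pipeline(workspace=ws, steps=[prep_step, automl_step])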

Machine Learning Pipelines

  • Training pipeline:

    (Figure: training pipeline diagram)

  • Scoring pipeline:

    (Figure: scoring pipeline diagram)

Prerequisites

Getting Started

  1. Set up a local conda environment.

    Note: replace myenv with your environment name.

    • Create a new environment from an env.yaml file:

      > conda env create --name myenv --file env.yaml

      OR:

    • Update an existing environment:

      > conda env update --name myenv --file env.yaml

  2. Configure your workspace settings in config.json.

    {
      "subscription_id": "my_subscription_id",
      "resource_group": "my_resource_group",
      "workspace_name": "my_workspace_name",
      "tenant_id": "my_tenant_id",
      "compute_name": "my_compute_cluster",
      "aks_cluster_name": "my_inference_cluster",
      "aks_endpoint_name": "my_endpoint",
      "location": "westus2",
      ...
    }
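
    In Python, Workspace.from_config reads the subscription_id, resource_group, and workspace_name keys from this file; a minimal sketch for loading the remaining project-specific keys (assuming the elided entries make the file valid JSON):

    import json
    from azureml.core import Workspace

    # Picks up subscription_id, resource_group, and workspace_name.
    ws = Workspace.from_config(path="config.json")

    # The remaining project-specific keys can be read directly.
    with open("config.json") as fh:
        cfg = json.load(fh)
    compute_name = cfg["compute_name"]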
    
  3. Register the initial training dataset in transform.py.

    from azureml.core import Dataset

    # ws: Workspace handle; "workspaceblobstore" is its default blob datastore.
    workspaceblobstore = ws.get_default_datastore()

    training_tabular_dataset = Dataset.Tabular.register_pandas_dataframe(
            dataframe=train_df,
            target=workspaceblobstore,
            name=args.dataset_name,
            show_progress=True)
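
    Once registered, later steps can fetch the dataset by name; for example (the dataset name here is a placeholder):

    from azureml.core import Dataset

    # Fetch the latest registered version and materialize it as a dataframe.
    training_ds = Dataset.get_by_name(ws, name="my_training_dataset")
    df = training_ds.to_pandas_dataframe()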
    
  4. Run your automated ML training pipeline.

    > cd project_directory
    > python training_main.py
    
  5. Run your scoring pipeline.

    > cd project_directory
    > python scoring_main.py
    
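Under the hood, training_main.py presumably submits the assembled pipeline as an experiment run; a hedged sketch of submitting and publishing such a pipeline (the experiment and pipeline names are placeholders):

    from azureml.core import Experiment

    # One-off run of the assembled pipeline.
    run = Experiment(ws, "automl-training").submit(pipeline)
    run.wait_for_completion(show_output=True)

    # Optionally publish it as a reusable REST endpoint for MLOps.
    published = pipeline.publish(
        name="automl-training-pipeline",
        description="Data prep + AutoML training")
    print(published.endpoint)
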

Reports

  1. Install Power BI Desktop.

    To download Power BI Desktop, go to the Power BI Desktop download page and select Download Free, or select See download or language options for other download choices.

    Note: Power BI requires that you use a work or school email address. You can't sign up or purchase using email addresses provided by consumer email services or telecommunication providers. This includes outlook.com, hotmail.com, gmail.com, and others. If you don't have a work or school account, learn about alternate ways to sign up.

  2. Open model_explanation_report.pbit and connect to data.

  3. Summarize the effects of all the features.

    To get an overview of which features are most important for a model, we can plot the SHAP values of every feature for every sample. The plot below sorts features by the sum of SHAP value magnitudes over all samples, and uses SHAP values to show the distribution of the impacts each feature has on the model output. The color represents the feature value (red high, blue low).

    (Figure: SHAP summary plot)
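
    The repo's report is built in Power BI, but the same plot can be reproduced with the open-source shap package; a rough Python sketch, assuming a fitted tree-based model and a feature dataframe X:

    import shap

    # Explain the model's predictions on the training features.
    explainer = shap.TreeExplainer(model)   # model: fitted tree-based model
    shap_values = explainer.shap_values(X)  # X: feature dataframe

    # Beeswarm summary plot: features sorted by total SHAP magnitude,
    # colored by feature value (red = high, blue = low).
    shap.summary_plot(shap_values, X)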

For detailed information on SHAP analysis, see Advanced Model Explanation.

Deploy to Azure

You can deploy a model as a real-time web service to several kinds of compute targets, including local compute, an Azure Machine Learning compute instance, Azure Container Instances (ACI), an Azure Kubernetes Service (AKS) cluster, Azure Functions, or an Internet of Things (IoT) module. Azure Machine Learning uses containers as its deployment mechanism, packaging the model and the code that uses it into an image that can be deployed to a container on your chosen compute target. This project deploys the model to an AKS cluster.

  1. Create a new AKS cluster and deploy a registered model to it in deploy.py.

    from azureml.core.compute import AksCompute, ComputeTarget
    from azureml.core.model import Model

    # Provision a new AKS cluster in the workspace. f.ws is the project's
    # Workspace handle; provisioning_config, inference_config, aks_config,
    # and model are built earlier in deploy.py.
    aks_target = ComputeTarget.create(
        workspace=f.ws,
        name=aks_cluster_name,
        provisioning_configuration=provisioning_config)
    aks_target.wait_for_completion(show_output=True)

    # Deploy the registered model as a real-time web service on the cluster.
    aks_service = Model.deploy(
        workspace=f.ws,
        name=real_time_endpoint_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=aks_config,
        deployment_target=aks_target,
        overwrite=True)
    aks_service.wait_for_deployment(show_output=True)
    
  2. Call your remote webservice and consume a real-time inferencing service in consume.py.

    y_pred = aks_service.run(input_data=data_json)
    
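    The exact shape of data_json is defined by the project's scoring script; a common pattern (assumed here) is a JSON body with a "data" key holding feature rows:

    import json

    # Hypothetical payload: one feature row per list entry; the scoring
    # (entry) script determines the actual schema.
    data_json = json.dumps({"data": test_df.values.tolist()})
    y_pred = aks_service.run(input_data=data_json)
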

Promote to Production

  1. Sign up for Azure DevOps. Azure DevOps gives you an integrated set of services and tools to manage your software projects, from planning and development through testing and deployment. Azure DevOps is free for open-source projects and small projects with up to five users. For larger teams, purchase a plan based on the number of users.

    (Figure: CI/CD workflow)

  2. Configure your production workspace settings in prod_pipelines.yaml.

    variables:
      service_connection: "my_prod_service_connection"
      subscription_id: "my_prod_subscription_id"
      resource_group: "my_prod_resource_group"
      workspace_name: "my_prod_workspace_name"
    ...
    
  3. Create an Azure Pipeline from prod_pipelines.yaml.

  4. Run the pipeline.

    (Figure: Azure Pipeline run)

Getting Help

This project is under active development by Alvin Haryanto.

The Power BI report was developed by William Harding.

If you have questions, comments, or just want to have a good old-fashioned chat about MLOps with Azure Machine Learning, please reach out to me at [email protected] or linkedin.com/in/alvinharyanto.
