Azure Purview - Containerized app for declaring custom sensitivity labels to assets as glossary terms

A containerized Python Flask app that exposes an API for interacting with Azure Purview to implement business logic.

Overview

Currently, three top-level functionalities are implemented:

Create a list of glossary terms to track Custom/organization specific Sensitivity Labels (using a minified JSON)

💡 Today, Purview only offers automatic labelling via Microsoft 365 Sensitivity Labels. Using Glossary Terms instead offers a workaround for organizations that do not use M365 labels, and lets us programmatically query Purview's REST API to check each asset's declared state (as declared by Data Teams).
Create an entire asset chain for an Azure SQL Database, and apply glossary terms to serve as Custom Data Classifications (using a minified JSON)

💡 The core value add here is that the Asset Columns have their declared state available at provisioning time, which allows us to monitor for classification drift using the methods demonstrated here. This requires the Asset to be present in Purview with the Custom labels before the first scan runs; without this capability, we cannot track the initial state.
Trigger Scan to establish end-to-end asset relationships and have Purview apply Classifications
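
The drift monitoring mentioned above can be pictured as a comparison between the declared column labels (from the onboarding payload) and the labels read back from Purview later on. The following is a minimal sketch under that assumption, not the repository's implementation: `find_classification_drift` and the `observed` dictionary are hypothetical, and fetching the observed state from Purview's REST API is left out.

```python
from typing import Dict, List, Optional, Tuple


def find_classification_drift(
    declared: Dict[str, str],
    observed: Dict[str, Optional[str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (column, declared_label, observed_label) for every mismatch.

    A column missing from `observed` is reported with an observed label of None.
    """
    drift = []
    for column, declared_label in sorted(declared.items()):
        observed_label = observed.get(column)
        if observed_label != declared_label:
            drift.append((column, declared_label, observed_label))
    return drift


# Declared state, as a data team would submit it at provisioning time.
declared = {
    "Salary": "Contoso_IC_Confidential",
    "SSN": "Contoso_IC_Sensitive",
}
# Hypothetical state read back later: SSN was relabelled, Salary is unchanged.
observed = {
    "Salary": "Contoso_IC_Confidential",
    "SSN": "Contoso_IC_Internal",
}

drift = find_classification_drift(declared, observed)
for column, want, got in drift:
    print(f"{column}: declared {want}, observed {got}")
```

Keeping the comparison a pure function means it can be tested without a Purview account; only the code that reads the observed labels needs network access.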

Pre-reqs

Azure SQL DB Data Source has been registered with Purview (one-time activity)
A Scan has been created on the Data Source, but not run (one-time activity):

Note that this could have been done using an API call as well if required.

We start with no Assets in this particular demo, but other assets can exist (assuming no conflict):
We start with no Glossary Terms in this particular demo, but other Terms can exist (assuming no conflict):

Run container on Docker Desktop

Clone this repo, then run the following to build and start the container locally on Docker Desktop:

# Build container from Dockerfile
docker build -t purview-asset-ingestor .

# Start container by injecting environment variables
# (the trailing backticks are PowerShell line continuations; use \ instead in bash)
docker run `
  -e "PURVIEW_NAME=<your--purview--account>" `
  -e "AZURE_CLIENT_ID=<your--client--id>" `
  -e "AZURE_CLIENT_SECRET=<your--client--secret>" `
  -e "AZURE_TENANT_ID=<your--azure--tenant--id>" `
  -p 5000:5000 `
  --rm -it purview-asset-ingestor

And the container can be called via Postman at http://127.0.0.1:5000 as a GET request:
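
For a scripted check instead of Postman, the same GET request can be sketched with Python's standard library. This is only an illustration: the helper names `build_root_request` and `is_app_reachable` are hypothetical, and the sketch assumes the root URL answers with a 2xx status when the app is up.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # address published by the docker run example


def build_root_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the app's root URL."""
    return urllib.request.Request(base_url.rstrip("/") + "/", method="GET")


def is_app_reachable(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if the root URL answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(build_root_request(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False
```

Calling `is_app_reachable()` after `docker run` gives a quick yes/no answer on whether the container is serving requests.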

Run container on Kubernetes

Use the deployment.yaml file to create a Kubernetes deployment:

# Create namespace, secret, deployment and external service
kubectl create namespace purview
kubectl apply -f "secret-sample.yaml"
kubectl apply -f "deployment.yaml"
kubectl expose deployment purview-asset-ingestor --type=LoadBalancer --name=purview-asset-ingestor-service -n purview

# Tail logs (replace the pod name with your own, e.g. from: kubectl get pods -n purview)
kubectl logs purview-asset-ingestor-6c7d49b4bf-x4mrl -n purview --follow
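
The contents of secret-sample.yaml are not shown here. Assuming it is a Kubernetes Secret carrying the same four settings as the Docker example, the values under its `data` field need to be base64-encoded. A minimal sketch of that encoding step follows; `encode_secret_data` is a hypothetical helper, and the key names are an assumption carried over from the `docker run` command.

```python
import base64
from typing import Dict

# Setting names taken from the docker run example; whether secret-sample.yaml
# uses these exact keys is an assumption.
REQUIRED_SETTINGS = (
    "PURVIEW_NAME",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
)


def encode_secret_data(settings: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode each value, as the `data` field of a Kubernetes Secret expects."""
    missing = [name for name in REQUIRED_SETTINGS if not settings.get(name)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    return {
        name: base64.b64encode(settings[name].encode("utf-8")).decode("ascii")
        for name in REQUIRED_SETTINGS
    }


# Placeholder values only; never commit real credentials.
encoded = encode_secret_data(
    {
        "PURVIEW_NAME": "my-purview",
        "AZURE_CLIENT_ID": "client-id-placeholder",
        "AZURE_CLIENT_SECRET": "client-secret-placeholder",
        "AZURE_TENANT_ID": "tenant-id-placeholder",
    }
)
print(sorted(encoded))  # print only the key names, not the encoded values
```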

Demonstration

Step 1: Create a list of glossary terms to track Custom/organization specific Classification Labels (using a minified JSON)

The following minified JSON payload represents our Organization's Custom Classification Labels:

[
  {
    "longDescription": "Passwords, access code, security questions or similar.",
    "name": "Contoso_IC_Restricted"
  },
  {
    "longDescription": "Sensitive Personal Info, Material Business Information",
    "name": "Contoso_IC_Sensitive"
  },
  {
    "longDescription": "Financial, Personal, Business, Product, Project or Proprietary Information.",
    "name": "Contoso_IC_Confidential"
  },
  {
    "longDescription": "Internal phone directory, employee IDs, HR Policies, Client info not combined with PII",
    "name": "Contoso_IC_Internal"
  },
  {
    "longDescription": "Published public information that can be found on the internet.",
    "name": "Contoso_IC_Public"
  }
]

We perform a POST request to http://127.0.0.1:5000/api/glossary/terms using Postman with the above JSON in the Body:
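
The same request can be sketched in Python instead of Postman. The example below validates the term list, minifies it, and builds the POST request without sending it. The helper names are hypothetical, and the `Content-Type: application/json` header is an assumption about what the API expects.

```python
import json
import urllib.request
from typing import Dict, List

TERMS_URL = "http://127.0.0.1:5000/api/glossary/terms"  # endpoint from the step above


def minify_terms(terms: List[Dict[str, str]]) -> bytes:
    """Validate the term list and serialize it as compact (minified) UTF-8 JSON."""
    for index, term in enumerate(terms):
        for key in ("name", "longDescription"):
            if not term.get(key):
                raise ValueError(f"term {index} is missing '{key}'")
    # separators=(",", ":") drops the whitespace json.dumps adds by default
    return json.dumps(terms, separators=(",", ":")).encode("utf-8")


def build_terms_request(
    terms: List[Dict[str, str]], url: str = TERMS_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the minified payload."""
    return urllib.request.Request(
        url,
        data=minify_terms(terms),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


terms = [
    {
        "longDescription": "Published public information that can be found on the internet.",
        "name": "Contoso_IC_Public",
    },
]
request = build_terms_request(terms)
# urllib.request.urlopen(request) would send it to the running container.
```

Validating before sending catches a term with an empty name or description locally, rather than relying on the error the service returns.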

And we see the Glossary Terms get created within Purview:

Step 2: Create an entire asset chain for an Azure SQL Database, and apply glossary terms to serve as Custom Data Classifications (using a minified JSON)

The following minified JSON payload represents the Azure SQL Database we are looking to onboard, containing the application-specific data schema and declared classifications:

{
  "serverName": "aemigration",
  "collectionId": "aia-purview-new",
  "databaseName": "contosoHR_AE",
  "schemaName": "dbo",
  "table": {
    "name": "Employees",
    "columns": [
      {
        "name": "Salary",
        "data_type": "varbinary",
        "classification": "Contoso_IC_Confidential"
      },
      {
        "name": "EmployeeID",
        "data_type": "int",
        "classification": "Contoso_IC_Internal"
      },
      {
        "name": "LastName",
        "data_type": "nvarchar",
        "classification": "Contoso_IC_Confidential"
      },
      {
        "name": "FirstName",
        "data_type": "nvarchar",
        "classification": "Contoso_IC_Confidential"
      },
      {
        "name": "SSN",
        "data_type": "varbinary",
        "classification": "Contoso_IC_Sensitive"
      }
    ]
  }
}

We perform a POST request to http://127.0.0.1:5000/api/assets using Postman with the above JSON in the Body:
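
Because each column's `classification` has to match a glossary term created in Step 1, it can help to check the payload before posting it. The following is a minimal sketch of such a pre-flight check; `undeclared_classifications` is a hypothetical helper and not part of the repository, and the payload is shortened to two columns, one of which carries a deliberately wrong label.

```python
from typing import Any, Dict, Iterable, List

# Term names created in Step 1.
KNOWN_TERMS = {
    "Contoso_IC_Restricted",
    "Contoso_IC_Sensitive",
    "Contoso_IC_Confidential",
    "Contoso_IC_Internal",
    "Contoso_IC_Public",
}


def undeclared_classifications(
    asset: Dict[str, Any], known_terms: Iterable[str] = KNOWN_TERMS
) -> List[str]:
    """Return 'column: label' entries whose label is not a known glossary term."""
    known = set(known_terms)
    problems = []
    for column in asset["table"]["columns"]:
        label = column.get("classification")
        if label not in known:
            problems.append(f"{column['name']}: {label}")
    return problems


asset = {
    "serverName": "aemigration",
    "collectionId": "aia-purview-new",
    "databaseName": "contosoHR_AE",
    "schemaName": "dbo",
    "table": {
        "name": "Employees",
        "columns": [
            {"name": "Salary", "data_type": "varbinary",
             "classification": "Contoso_IC_Confidential"},
            {"name": "SSN", "data_type": "varbinary",
             "classification": "Contoso_IC_Secret"},  # deliberately wrong label
        ],
    },
}
print(undeclared_classifications(asset))  # ['SSN: Contoso_IC_Secret']
```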

And we see the Assets get created within Purview (including the Columns and classifications):

Step 3: Trigger Scan to establish end-to-end asset relationships and have Purview apply Classifications

The following JSON payload asks Purview to run a scan against the Data Source we already established in pre-reqs:

{
  "dataSourceName" : "contosoHR",
  "scanName" : "Scan-AE"
}

We perform a POST request to http://127.0.0.1:5000/api/scan using Postman with the above JSON in the Body:
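
How the app forwards this payload to Purview is not shown here. One plausible shape, assuming the preview scanning data-plane API (a `PUT` to a per-run URL on the account's `scan.purview.azure.com` host), is sketched below. The function name, the placeholder account name, the endpoint layout and the API version are all assumptions and should be checked against the current Purview REST reference.

```python
import uuid
from typing import Optional
from urllib.parse import quote

# Assumed API version; check the current Purview scanning REST reference.
API_VERSION = "2018-12-01-preview"


def build_scan_run_url(account: str, payload: dict, run_id: Optional[str] = None) -> str:
    """Build the URL for starting one scan run (assumed to be sent as an HTTP PUT)."""
    for key in ("dataSourceName", "scanName"):
        if not payload.get(key):
            raise ValueError(f"payload is missing '{key}'")
    run_id = run_id or str(uuid.uuid4())  # each run gets its own identifier
    return (
        f"https://{account}.scan.purview.azure.com"
        f"/datasources/{quote(payload['dataSourceName'], safe='')}"
        f"/scans/{quote(payload['scanName'], safe='')}"
        f"/runs/{run_id}?api-version={API_VERSION}"
    )


url = build_scan_run_url(
    "my-purview",  # placeholder account name
    {"dataSourceName": "contosoHR", "scanName": "Scan-AE"},
    run_id="00000000-0000-0000-0000-000000000001",
)
print(url)
```

Generating the run identifier on the caller's side keeps a record of which run was requested, so its status can be looked up later.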

And we see the scan begins on the asset:

Step 4: Observe Assets with Custom Sensitivity labels (i.e. glossary terms) applied per column

Once the Scan is Completed:

We see that the Assets have the Glossary Terms applied, both in the search facet and in the Term layer:

And the Asset is labelled at the column level:

As desired.


mdrakiburrahman/purview-asset-ingestor
