Data access work #7

Open
wants to merge 9 commits into
base: master
5 changes: 4 additions & 1 deletion scripts/README.md
@@ -2,4 +2,7 @@

Common script utilities which can help automate common tasks within a BigQuery data warehouse. Scripts are broken out into directories by language (e.g. bash, python, sql, etc.).

Nothing here yet, check back soon...
| Script | Description |
|--------|-------------|
| [Data Access Sample View](data_access) | Create Data Access audit logs, Stackdriver Sink, and a BigQuery view on the logs |

74 changes: 74 additions & 0 deletions scripts/data_access/README.md
@@ -0,0 +1,74 @@

# Sample Data Access Audit View

## Overview

Google Cloud offers data access logs across many [services](https://cloud.google.com/logging/docs/audit/services).
This sample will walk through [enabling data access logs](https://cloud.google.com/logging/docs/audit/configure-data-access)
in a project, creating a Stackdriver sink, and creating a BigQuery View providing analytical information.

## Audit View

The View analyses READ, WRITE, and ADMIN operations across BigQuery datasets and GCS buckets. The logs
contain object- and table-level information, but the View drops it.

The view is stored in ```data_access.audit_summary```.

There is one data access log entry for every service call. Each entry also carries a
list in protopayload_auditlog.authorizationInfo of the permissions
granted (or denied) as part of that call.

These permissions are listed for [GCS permissions](https://cloud.google.com/storage/docs/access-control/iam-permissions) and
[BigQuery permissions](https://cloud.google.com/bigquery/docs/access-control#bq-permissions).

The columns of the View are the following:

| Column | Description |
| ------ | ----------- |
| hour | Top-of-the-hour the access occurred |
| service | Service (storage or bigquery) |
| actor | Service account or end-user |
| op | Operation (READ, WRITE, or ADMIN) |
| granted | Whether access was permitted |
| entity | Project_ID.GCS_Bucket or Project_ID.BigQuery_Dataset |
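
As a rough sketch of how these columns might be queried, the helper below prints a query that lists denied accesses per actor and entity. The function name `denied_access_query` and the `denied_hours` alias are illustrative and not part of this sample. Because the View already groups rows by hour, `COUNT(*)` here counts distinct hours, not individual log entries.

```shell
# Hypothetical helper: print a query over the view.
# $1 = view name (defaults to the view name used in this README).
denied_access_query() {
  view="${1:-data_access.audit_summary}"
  cat <<SQL
SELECT service, actor, op, entity, COUNT(*) AS denied_hours
FROM \`${view}\`
WHERE NOT granted
GROUP BY service, actor, op, entity
ORDER BY denied_hours DESC
SQL
}

# Print the query; to run it, pipe the output into:
#   bq query --use_legacy_sql=false
denied_access_query
```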

## Instructions

Capture the PROJECT_ID of your default project.

    PROJECT_ID=$(gcloud config get-value core/project)

Enable data access audit logs.

    POLICY_FILE=/tmp/policy_file_${PROJECT_ID}.$$

    # Get existing project policy
    gcloud projects get-iam-policy ${PROJECT_ID} --format=json > ${POLICY_FILE}

    # Set auditConfigs in the policy from data_access_policy.json
    # (this replaces any existing auditConfigs)
    jq --slurpfile audit data_access_policy.json '.auditConfigs=$audit' \
      ${POLICY_FILE} > ${POLICY_FILE}.new

    # Apply the new policy to the project
    if ! gcloud projects set-iam-policy ${PROJECT_ID} ${POLICY_FILE}.new; then
      echo "Failed applying policy"
    fi

    # Clean up both temporary files
    rm -f ${POLICY_FILE} ${POLICY_FILE}.new
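
To make the `jq` step easier to follow, here is a minimal sketch that isolates it in a function and can be tried on throwaway files without touching a real project. The name `merge_audit_config` and the sample file contents are assumptions for illustration. It requires `jq` to be installed.

```shell
# Hypothetical helper: print a policy with .auditConfigs taken from an audit file.
# $1 = policy JSON file, $2 = audit config JSON file (a single JSON object).
merge_audit_config() {
  policy_file="$1"
  audit_file="$2"
  # --slurpfile binds $audit to an ARRAY holding the JSON values in the file,
  # which matches the array shape of auditConfigs in an IAM policy.
  jq --slurpfile audit "$audit_file" '.auditConfigs = $audit' "$policy_file"
}

# Usage with sample files (contents are made up):
#   printf '%s' '{"bindings":[],"etag":"abc"}' > /tmp/policy.json
#   printf '%s' '{"service":"allServices","auditLogConfigs":[{"logType":"DATA_READ"}]}' > /tmp/audit.json
#   merge_audit_config /tmp/policy.json /tmp/audit.json
```

Note that the assignment replaces any `auditConfigs` already present in the policy instead of combining them. Other keys, such as `bindings` and `etag`, pass through unchanged.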

Create your data_access dataset.

    bq mk data_access

Create a data access audit sink. Be sure to grant the BigQuery Data Editor role to the sink's writer service account so that it can write to the dataset.

    gcloud logging sinks create compute_activity \
      bigquery.googleapis.com/projects/${PROJECT_ID}/datasets/data_access \
      --log-filter="logName=\"projects/${PROJECT_ID}/logs/cloudaudit.googleapis.com%2Fdata_access\""
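
The two long arguments to the sink command are plain strings built from the project ID, so they can be sketched as small helpers and inspected before running anything. The function names `sink_destination` and `data_access_filter` and the project ID `my-sample-project` are hypothetical. The commented commands at the end show one possible way to grant the role, assuming a project-level grant is acceptable.

```shell
# Hypothetical helpers that only build strings; they do not call gcloud.
sink_destination() {
  # $1 = project ID, $2 = dataset name
  printf 'bigquery.googleapis.com/projects/%s/datasets/%s\n' "$1" "$2"
}

data_access_filter() {
  # $1 = project ID; %%2F prints a literal %2F (the URL-encoded "/" in the log name)
  printf 'logName="projects/%s/logs/cloudaudit.googleapis.com%%2Fdata_access"\n' "$1"
}

sink_destination my-sample-project data_access
data_access_filter my-sample-project

# After the sink exists, its writer identity could be looked up and granted
# the role roughly like this (project-level grant; narrow it if you prefer):
#   WRITER=$(gcloud logging sinks describe compute_activity --format='value(writerIdentity)')
#   gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
#     --member="${WRITER}" --role=roles/bigquery.dataEditor
```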

Create your data_access.audit_summary VIEW.

    sed -e "s/\${PROJECT_ID}/${PROJECT_ID}/g" ./create_data_access_view.sql | bq query
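
The `sed` expression above replaces the literal placeholder `${PROJECT_ID}` in the SQL file with the value of the shell variable. A small sketch of the same substitution on a single sample line, using a hypothetical wrapper `render_sql` and a made-up project ID:

```shell
# Hypothetical wrapper around the same sed expression.
render_sql() {
  # $1 = project ID; reads the SQL template on stdin.
  # \$ stops the shell from expanding the placeholder, so sed is given
  # the literal text ${PROJECT_ID} as its search pattern.
  sed -e "s/\${PROJECT_ID}/$1/g"
}

printf '%s\n' 'FROM `${PROJECT_ID}.data_access.cloudaudit_googleapis_com_data_access_*` d' \
  | render_sql my-sample-project
```

The output has `my-sample-project` in place of the placeholder, which is the form the `FROM` clause of the view needs before it is passed to `bq query`.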

132 changes: 132 additions & 0 deletions scripts/data_access/create_data_access_view.sql
@@ -0,0 +1,132 @@
#standardSQL

--
-- This is setup for storage.googleapis.com and bigquery.googleapis.com
--
-- This is for examining access to bigquery datasets and buckets.
--
-- Table- and object-level detail is available in the logs (and more!), but
-- it would clutter a dashboard, so it is left out here.
--
CREATE OR REPLACE VIEW data_access.audit_summary AS
WITH
-- Pull out Data Access logs
DataAccess AS (
SELECT
-- Hour truncated
TIMESTAMP_TRUNC(d.timestamp, HOUR) AS hour,
-- Project ID that the access method was called on
d.resource.labels.project_id,
-- Actor
d.protopayload_auditlog.authenticationInfo.principalEmail AS actor,
-- Service prefix of the permission (e.g. storage or bigquery)
SPLIT(i.permission,'.')[SAFE_OFFSET(0)] AS service,
-- Permission used
i.permission AS action,
-- Whether granted or denied
IFNULL(i.granted, FALSE) AS granted,
-- Parts of the resource accessed
SPLIT(i.resource, '/') AS parts
FROM
`${PROJECT_ID}.data_access.cloudaudit_googleapis_com_data_access_*` d
Collaborator

Add AS before alias.

CROSS JOIN d.protopayload_auditlog.authorizationInfo i
Collaborator

Align CROSS JOIN with FROM.

WHERE
i.resource IS NOT NULL AND
Collaborator

Move AND to the next line.

WHERE
  filter 1
  AND filter 2

d.protopayload_auditlog.serviceName IN ('storage.googleapis.com',
'bigquery.googleapis.com')
)
SELECT
hour,
service,
actor,
-- Translate the action into an operation (READ/WRITE/ADMIN)
CASE
WHEN service = 'storage' THEN
Collaborator

Do you think you can determine the operation type just by checking the action value? If so, maybe simplify the code a little by using a single level of CASE statement.

Same for service='bigquery' code below.

CASE
-- See granular permissions here: https://cloud.google.com/storage/docs/access-control/iam-permissions
Collaborator

Please indent CASE statements. Use 2 spaces for all indentation throughout the script.

WHEN action IN ('storage.objects.create',
'storage.objects.delete') THEN
'WRITE'
WHEN action IN ('storage.objects.get') THEN
'READ'
WHEN action IN ('storage.objects.getIamPolicy',
'storage.objects.list',
'storage.objects.setIamPolicy',
'storage.objects.update',
'storage.buckets.create',
'storage.buckets.delete',
'storage.buckets.get',
'storage.buckets.getIamPolicy',
'storage.buckets.list',
'storage.buckets.setIamPolicy',
'storage.buckets.update') THEN
'ADMIN'
ELSE
CONCAT('Unknown storage:', action)
END
-- See granular permissions here: https://cloud.google.com/bigquery/docs/access-control#bq-permissions
WHEN service = 'bigquery' THEN
CASE
WHEN action IN ('bigquery.tables.delete',
Collaborator

Please indent WHEN statements.

'bigquery.datasets.delete',
Collaborator

Align values with previous value. Same for the following WHEN statements.

'bigquery.jobs.update',
'bigquery.routines.delete',
'bigquery.tables.updateData') THEN
'WRITE'
WHEN action IN ('bigquery.tables.getData',
'bigquery.tables.export',
'bigquery.readsessions.create',
'bigquery.connections.use') THEN
'READ'
WHEN action IN ('bigquery.jobs.create',
'bigquery.jobs.listAll',
'bigquery.jobs.list',
'bigquery.jobs.get',
'bigquery.datasets.create',
'bigquery.datasets.get',
'bigquery.datasets.update',
'bigquery.tables.create',
'bigquery.tables.list',
'bigquery.tables.get',
'bigquery.tables.update',
'bigquery.routines.create',
'bigquery.routines.list',
'bigquery.routines.get',
'bigquery.routines.update',
'bigquery.transfers.get',
'bigquery.transfers.update',
'bigquery.savedqueries.create',
'bigquery.savedqueries.get',
'bigquery.savedqueries.list',
'bigquery.savedqueries.update',
'bigquery.savedqueries.delete',
'bigquery.connections.create',
'bigquery.connections.get',
'bigquery.connections.list',
'bigquery.connections.update',
'bigquery.connections.delete') THEN
'ADMIN'
ELSE
CONCAT('Unknown bigquery:', action)
END
ELSE
CONCAT('Unknown service:', service)
END AS op,
granted,
-- Project is of the resource or, if not there,
-- then for the method accessing it (eg for buckets)
CASE
Collaborator

Do you need an ELSE to set a default value here? Otherwise, it will create NULLs.

-- BigQuery project.dataset
WHEN service = 'bigquery' THEN
CONCAT(parts[SAFE_OFFSET(1)], '.', parts[SAFE_OFFSET(3)])
-- GCS project.bucket
WHEN service = 'storage' THEN
CONCAT(project_id, '.', parts[SAFE_OFFSET(3)])
END AS entity
FROM
DataAccess
WHERE
-- Limit to BigQuery dataset / GCS bucket operations
ARRAY_LENGTH(parts) >= 4
GROUP BY
1,2,3,4,5,6;
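
The `entity` logic in the view can be hard to picture from the SQL alone, so here is a rough shell sketch of the same idea: split the resource path on `/` and combine the pieces. The function name `entity_from_resource` and both sample resource paths are assumptions for illustration; real log entries may differ.

```shell
# Hypothetical illustration of how the view derives the entity column.
# $1 = service (bigquery|storage), $2 = project ID from the resource labels,
# $3 = resource path from authorizationInfo.
entity_from_resource() {
  printf '%s\n' "$3" | awk -F/ -v service="$1" -v project="$2" '
    NF < 4 { next }                                   # mirrors ARRAY_LENGTH(parts) >= 4
    service == "bigquery" { print $2 "." $4; next }   # parts[1] and parts[3] (awk fields are 1-based)
    service == "storage"  { print project "." $4 }    # project_id and parts[3]
  '
}

entity_from_resource bigquery my-project "projects/my-project/datasets/sales/tables/orders"
entity_from_resource storage my-project "projects/_/buckets/my-bucket/objects/report.csv"
```

With these sample inputs the two calls print `my-project.sales` and `my-project.my-bucket`. Any other service prints nothing, which corresponds to the NULL that the SQL `CASE` produces without an `ELSE`.
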
8 changes: 8 additions & 0 deletions scripts/data_access/data_access_policy.json
@@ -0,0 +1,8 @@
{
Collaborator

Please add a description of what this file is for.

"service": "allServices",
"auditLogConfigs": [
{ "logType": "ADMIN_READ" },
{ "logType": "DATA_READ" },
{ "logType": "DATA_WRITE" }
]
}