Skip to content

BigQuery Information Schema Looker Block (Marketplace Install version with Refinements)

License

Notifications You must be signed in to change notification settings

looker-open-source/bigquery_information_schema_block

 
 

Repository files navigation

BigQuery Information Schema Looker Block

This repository contains a Looker block for monitoring and optimizing Google BigQuery usage and performance, built off of the Information Schema tables provided natively within BigQuery.

Setup

Manifest

The manifest.lkml file contains a number of parameters to be configured. Review the inline comments in the manifest file for more details.

Required BQ Permissions

Generally, the required permissions (listed below) are available to the BigQuery Resource Admin and BigQuery Admin roles.

This block requires a Service Account with the following BigQuery permissions:

Highlighted Datasets / Explores

  • Jobs
    • The Jobs explore allows users to explore one of the four JOBS_BY_X tables
    • The table contains one row for each job, up to a retention limit of 180 days.
    • This is the richest dataset in this block, and can be used to answer questions including: job volume, slot usage, job timing and performance metrics, data volume, and many attributes that explain performance.
    • The explore also exposes nested fields, such as referenced tables, job stages, and job stage steps.
    • The specific table explored depends on the scope specified in your manifest.lkml file.
    • Depending on the scope, the SQL text of jobs may or may not be available. If it is, dimensions are also provided that extract Looker contextual information, such as history ID or user ID, from the SQL text
    • If you want to explore a JOBS_BY_X other than the one specified in your scope, two hidden explores exist that explicitly override the scope: jobs_in_project and jobs_in_organization
  • Jobs Timeline
    • The Jobs Timeline explore allows users to explore one of the four JOBS_TIMELINE_BY_X tables
    • The table contains one row for each second that each job was active.
    • Although this explore exposes fewer fields & nested detail than the jobs explore, it can be more convenient for aggregating certain metrics based on when queries were running, rather than based on when queries were created.
    • The specific table explored depends on the scope specified in your manifest.lkml file.
    • If you want to explore a JOBS_TIMELINE_BY_X other than the one specified in your scope, two hidden explores exist that explicitly override the scope: jobs_timeline_in_project and jobs_timeline_in_organization
  • Reservations data
  • Cross-table explore
    • Although it exposes fewer details than the dedicated Jobs explore, the "All" explore joins together multiple fact tables in a way that can facilitate certain analyses, such as measuring slots usage and slot capacity co-dimensioned at the project leve.l

Highlighted Dashboards

Pulse dashboard thumbnail preview
  • Pulse - The Pulse dashboard is a self-contained top-level dashboard that you can use to understand current consumption, performance and efficiency, with the use of Week-to-date and week-over-week metrics throughout.
  • Time Window Investigation - The Time Window Investigation dashboard focuses on a specific timerange, removing any comparisons to prior periods, and providing tiles that can help you pick out problematic patterns or outliers to drill into.
  • Job Lookup - Primarily intended to be accessed from the ellipsis menu from any Job ID as a "drill across", the Job Lookup dashboard is a deep-dive into a single job, letting you see both job metrics as well as step-by-step query plans, and information about referenced tables.

Concepts

Slots and Capacity

A key concept for understanding BigQuery usage is slots. BigQuery uses Slots (virtual CPUs) to execute queries in a heavily-distributed parallel architecture. Customers on the flat-rate pricing model explicitly choose how many slots to reserve, also known as Slot Commitments, which can be purchased at the Annual, Monthly, or Flex (60-second) level. Queries run within that capacity, and you pay for that capacity continuously every second it's deployed. For example, if you purchase 2,000 BigQuery slots, your queries in aggregate are limited to using 2,000 virtual CPUs at any given time. You will have this capacity until you delete it, and you will pay for 2,000 slots until you delete them.

Bytes Shuffled to Disk

The percentage of data written to shuffle and spilled to disk can be a good indicator of queries that are overwhelming slot resources and could be further optimized, for example, queries with heavy data skews. Use the Job Lookup Dashboard to drill into individual queries that are spilling heavy volumes of data to disk, viewing their individual stages and identifying opportunities for optimization.

% of Cached Queries

A higher percentage of Cached queries will result in lower on-demand costs and lower resource utilization. In addition to reducing costs, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.

BigQuery Information Schema Data Structure

More information on the Information Schema can be found in Google Cloud documentation.

About

BigQuery Information Schema Looker Block (Marketplace Install version with Refinements)

Resources

License

Security policy

Stars

Watchers

Forks

Packages

No packages published

Languages

  • LookML 100.0%