Skip to content

Latest commit

 

History

History
138 lines (104 loc) · 6.35 KB

README.md

File metadata and controls

138 lines (104 loc) · 6.35 KB

Liquibase-Databricks Connector

Summary

This is the Liquibase Extension for Managing Delta Tables on DatabricksSQL.

Base/Contributed and Foundational Change types should be supported at this stage. Change types such as procedures, triggers, sequences, indexes are not supported. Databricks specific change types that are added are listed below along with their completion status. Databricks tables creates with liquibase are automatically created with the Delta configs / versions that are required for all passing change types including: 'delta.feature.allowColumnDefaults' = 'supported', 'delta.columnMapping.mode' = 'name'

NOTE! ONLY TABLES CREATED WITH UNITY CATALOG ARE SUPPORTED FOR MOST ADVANCED OPERATIONS

This extension utilizes Unity Catalog System tables for many advanced operations such as snapshotting, identifying various constraints (PK/FK/NOT NULL, etc.) If hive_metastore is used, this is not tested and may not provide all the below functionality.

Harness Status:

  1. Add unit tests with liquibase test harness - Cody Davis - DONE
  2. Pass Foundational Test Harness - Cody Davis - DONE 4/1/2023
  3. Pass Contributed Test Harness - Cody Davis - DONE 9/15/2023
  4. Pass Advanced Test Harness - Cody Davis - DONE 9/28/2023

Currently Supported Change Types:

Contributed / Base

  1. createTable/dropTable
  2. addColumn/dropColumn
  3. addPrimaryKey/dropPrimaryKey
  4. addForeignKey/dropForeignKey
  5. addNotNullConstraint/dropNotNullConstraint
  6. createTable/createTableDataTypeText/createTableTimestamp/dropTable
  7. createView/dropView
  8. dropAllForeignKeyConstraints
  9. createView/dropView
  10. setTableRemarks - supported but not returned in snapshot as JDBC Driver not populating it
  11. setColumnRemarks
  12. setViewRemarks (set in TBLPROPERTIES ('comment' = ''))
  13. executeCommand
  14. mergeColumns
  15. modifySql
  16. renameColumn
  17. renameView
  18. sql
  19. sqlFile
  20. Change Data Test: apply delete
  21. Change Data Test: apply insert
  22. Change Data Test: apply loadData
  23. Change Data Test: apply loadDataUpdate
  24. Add/Drop Check Constraints - supported but not returned in snapshot

Advanced

  1. addColumn snapshot
  2. addPrimaryKey snapshot
  3. addForeignKey snapshot
  4. schemaAndCatalogSnapshot snapshot
  5. createTable snapshot
  6. createView snapshot
  7. generateChangelog -
  8. addUniqueConstraint - not supported
  9. createIndex - Not Supported, use changeClusterColumns change type for datbricks to Map to CLUSTER BY ALTER TABLE statements for Delta Tables
  10. alterTableProperties
  11. alterCluster

Databricks Specific:

  1. OPTIMIZE - optimizeTable - optimize with zorderCols options - SUPPORTED in Contributed Harness
  2. CLUSTER BY (DDL) - createClusteredTable - createTable with clusterColumns as additional option for liquid - SUPPORTED in Contributed Harness
  3. ANALYZE TABLE - analyzeTable - change type with compute stats column options - SUPPORTED in Contributed Harness
  4. VACUUM - vacuumTable - change type with retentionHours parameter (default is 168) - SUPPORTED in Contributed Harness
  5. ALTER CLUSTER KEY - alterCluster - change type that will be used until index change types are mapped with CLUSTER BY columns for snapshot purposes

Remaining Required Change Types to Finish in Base/Contributed

  1. (nice to have, not required) createFunction/dropFunction - in Liquibase Pro, should work in Databricks, but change type not accessible from Liquibase Core
  2. (nice to have, not required) addCheckConstraint/dropCheckConstraint - in Liquibase Pro, should work in Databricks, but change type not accessible from Liquibase Core
  3. addDefaultValue (of various types). Databricks/Delta tables support this, but does not get populated by databricks in the JDBC Driver (COLUMN_DEF property always None even with default) The remaining other change types are not relevant to Databricks and have been marked with INVALID TEST

General TO DO:

  1. Add support for Snapshotting complex types like STRUCT/MAP
  2. Add support for snapshotting IDENTITY KEYs
  3. Add TIMESTAMP_NTZ Data Type

Aspirational Roadmap - Databricks Specific Additional Change Types to Add:

  1. COPY INTO
  2. MERGE
  3. RESTORE VERSION AS OF
  4. ANALYZE TABLE - Code Complete - Adding Tests - Cody Davis
  5. CLONE
  6. BLOOM FILTERS - Maybe do not support, CLUSTER BY should be the primary indexing mechanism long term
  7. OPTIMIZE / ZORDER - Code Complete - Adding Tests - Cody Davis
  8. VACUUM - Code Complete - Adding Tests - Cody Davis
  9. SYNC IDENTITY
  10. VOLUMES
  11. GRANT / REVOKE statements

How to use the Liquibase-Databricks Extension

Steps:

  1. Download and install liquibase from here

  2. Download the Databricks Driver from here. Then put this driver jar under the liquibase/lib directory.

  3. Build this project or retrieve the jar from the latest release. Then put this extension jar under the liquibase/lib directory.

  4. IMPORTANT: If using Linux/MaxOS - run the following command in your terminal before continuing (you can add this to the bash/zsh profile): export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED

  5. Edit the connection parameters to your Databricks catlaog/database under the liquibase.properties file. The format will look like this:

url: jdbc:databricks://<workspace_url>:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;ConnCatalog=<catalog>;ConnSchema=<database>; 
username: token
password: <dbx_token>

Where the following parameters are:

  • workspace_url: The url of the host name you are connecting to
  • dbx_token: The token of your user or application service principal authorized for running any needed Liquibase operations.
  • http_path: This is the http_path of a Databricks SQL Warehouse or a Databricks Cluster (DBR). Either cluster type can be used. Best Results are with Serverless SQL Warehouses.
  • catalog: The catalog name you want to connect to (default is main).
  • database: The database / schema name you want to connect to.
    1. Add changes and run your change logs like so:
       liquibase --changeLogFile=changelog.sql update