Releases · delta-io/delta

20 Dec 03:28

v3.3.0rc2

44c6538

Delta Lake 3.3.0 (RC2) Pre-release

We are excited to announce the release of Delta Lake 3.3.0 RC2! Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.

Highlights

Delta Spark

Support for Identity Column to assign unique values for each record inserted into a table.
Support VACUUM LITE to deliver faster VACUUM for periodically run VACUUM commands.
Support for Row Tracking Backfill to alter an existing table to enable Row Tracking. Row Tracking allows engines such as Spark to track row-level lineage in Delta Lake tables.
Support for enhanced table state validation with version checksums and improved Snapshot initialization performance based on this checksum.

Delta UniForm

Support for enabling UniForm Iceberg on existing tables without rewriting the data files using ALTER TABLE.

Delta Kernel

Support for reading Delta tables that have Type Widening enabled.

More detailed release notes on these exciting features as well as the other changes included in this release coming soon!

Artifacts

We have published the artifacts to a staging repository.

Delta Spark artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
Delta UniForm artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13
Delta Kernel artifacts: delta-kernel-api, delta-kernel-defaults

How to use the Delta Spark Release Candidate

Download Spark 3.5 from https://spark.apache.org/downloads.html.

For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:

Spark Submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1181 to the command line arguments.
Example:

spark-submit --packages io.delta:delta-spark_2.12:3.3.0 --repositories https://oss.sonatype.org/content/repositories/iodelta-1181 examples/examples.py
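
Because the staging repository ID changes with every release candidate, it can help to assemble the command from its parts. The following is a minimal sketch, not part of the Delta tooling; the function name `spark_submit_args` and its parameters are hypothetical, and it only builds the argument list shown above.

```python
from typing import List

# Base URL of the staging repositories, taken from the example above.
STAGING_BASE = "https://oss.sonatype.org/content/repositories"


def spark_submit_args(version: str, staging_repo: str, script: str,
                      scala_version: str = "2.12") -> List[str]:
    """Build a spark-submit argument list for a Delta release candidate.

    Hypothetical helper: it only formats strings and does not run Spark.
    """
    package = f"io.delta:delta-spark_{scala_version}:{version}"
    return [
        "spark-submit",
        "--packages", package,
        "--repositories", f"{STAGING_BASE}/{staging_repo}",
        script,
    ]


if __name__ == "__main__":
    args = spark_submit_args("3.3.0", "iodelta-1181", "examples/examples.py")
    print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` if Spark is on the PATH.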

Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.3.0 by just providing the --packages io.delta:delta-spark_2.12:3.3.0 argument.

Spark Shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1181 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1181 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1181</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.3.0</version>
</dependency>

SBT Project

libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1181"

(PySpark) Delta-Spark

Download two artifacts from pre-release: https://github.com/delta-io/delta/releases/tag/v3.3.0rc2
Artifacts to download are:
- delta-spark-3.3.0.tar.gz
- delta_spark-3.3.0-py3-none-any.whl
Keep them in one directory, for example ~/Downloads.
pip install ~/Downloads/delta_spark-3.3.0-py3-none-any.whl
Running pip show delta-spark should print output similar to the following:

Name: delta-spark
Version: 3.3.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: /Users/<user.name>/opt/anaconda3/envs/delta-release-3.3/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
Required-by:
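
The same check can be done from Python. This is a small sketch that uses the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` and the expected version string are assumptions for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "3.3.0"  # version of the wheel installed above
    found = installed_version("delta-spark")
    if found is None:
        print("delta-spark is not installed in this environment")
    elif found != expected:
        print(f"delta-spark {found} is installed, expected {expected}")
    else:
        print(f"delta-spark {found} is installed")
```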

10 Dec 19:40

allisonport-db

v3.3.0rc1

899f918

Delta Lake 3.3.0 (RC1) Pre-release

We are excited to announce the release of Delta Lake 3.3.0 RC1! Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.

Highlights

Delta Spark

Support for Identity Column to assign unique values for each record inserted into a table.
Support VACUUM LITE to deliver faster VACUUM for periodically run VACUUM commands.
Support for Row Tracking Backfill to alter an existing table to enable Row Tracking. Row Tracking allows engines such as Spark to track row-level lineage in Delta Lake tables.
Support for enhanced table state validation with version checksums and improved Snapshot initialization performance based on this checksum.

Delta UniForm

Support for enabling UniForm Iceberg on existing tables without rewriting the data files using ALTER TABLE.

Delta Kernel

Support for reading Delta tables that have Type Widening enabled.

More detailed release notes on these exciting features as well as the other changes included in this release coming soon!

Artifacts

We have published the artifacts to a staging repository.

Delta Spark artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
Delta UniForm artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13
Delta Kernel artifacts: delta-kernel-api, delta-kernel-defaults

How to use the Delta Spark Release Candidate

Download Spark 3.5 from https://spark.apache.org/downloads.html.

For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:

Spark Submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1179 to the command line arguments.
Example:

spark-submit --packages io.delta:delta-spark_2.12:3.3.0 --repositories https://oss.sonatype.org/content/repositories/iodelta-1179 examples/examples.py

Spark Shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1179 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1179 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1179</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.3.0</version>
</dependency>

SBT Project

libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1179"

(PySpark) Delta-Spark

Download two artifacts from pre-release: https://github.com/delta-io/delta/releases/tag/v3.3.0rc1
Artifacts to download are:
- delta-spark-3.3.0.tar.gz
- delta_spark-3.3.0-py3-none-any.whl
Keep them in one directory, for example ~/Downloads.
pip install ~/Downloads/delta_spark-3.3.0-py3-none-any.whl
Running pip show delta-spark should print output similar to the following:

Name: delta-spark
Version: 3.3.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: /Users/allison.portis/opt/anaconda3/envs/delta-release-3.3/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
Required-by:

26 Sep 21:00

vkorukanti

v3.2.1

c8697c5

Delta Lake 3.2.1 Latest

We are excited to announce the release of Delta Lake 3.2.1! This release contains important bug fixes to 3.2.0 and it is recommended that users upgrade to 3.2.1.

Details for each component follow.

Delta Spark

Delta Spark 3.2.1 is built on Apache Spark™ 3.5.3. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.1/index.html
API documentation: https://docs.delta.io/3.2.1/delta-apidoc.html#delta-spark
Artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Support for Apache Spark™ 3.5.3.
Fix MERGE operation not being recorded in QueryExecutionListener when submitted through Scala/Python API.
Support RESTORE on a Delta table with clustering enabled.
Fix replacing a clustered table with a non-clustered table.
Fix an issue when running clustering on a table with a single clustering column.

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.1/delta-uniform.html
Artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

The key changes of this release are:

Added support for enabling UniForm Iceberg on existing Delta tables with ALTER TABLE instead of REORG, which rewrites data files.
Fixed a bug where the UniForm Iceberg conversion transaction converted commits containing only AddFiles without data change.

Delta Sharing Spark

Documentation: https://docs.delta.io/3.2.1/delta-sharing.html
Artifacts: delta-sharing-spark_2.12, delta-sharing-spark_2.13

The key changes of this release are:

Upgrade delta-sharing-client to version 1.1.1 which removes the pre-signed URL address from the error message on access errors.
Fix an issue with DeltaSharingLogFileStatus

Delta Kernel

API documentation: https://docs.delta.io/3.2.1/delta-kernel.html
Artifacts: delta-kernel-api, delta-kernel-defaults

The key changes of this release are:

Fix comparison issues with string values having characters with surrogate pairs. This fixes a corner case with wrong results when comparing characters (e.g. emojis) that have surrogate pairs in UTF-16 representation.
Fix ClassNotFoundException issue when loading LogStores in Kernel default Engine module. This issue happens in some environments where the thread local class loader is not set.
Fix error when querying tables with spaces in the path name. Now you can query tables with paths having any valid path characters.
Fix an issue with writing decimals as binary to Parquet files for certain scale and precision combinations.
Throw a proper exception when an unsupported VOID data type is encountered while reading Delta tables.
Handle long type values in field metadata of columns in schema. Previously Kernel threw a parsing exception; it now handles long types.
Fix an issue where Kernel retries multiple times when the _last_checkpoint file is not found. Kernel now tries just once when a file-not-found exception is thrown.
Support reading Parquet files with legacy map type physical formats. Previously Kernel threw errors; it can now read data from files containing legacy map physical formats.
Support reading Parquet files with legacy 3-level repeated type physical formats.
Write timestamp data to Parquet file as INT64 physical format instead of INT96 physical format. INT96 is a legacy physical format that is deprecated.
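
To make the difference between the two timestamp encodings concrete, here is a rough sketch of converting between them. It is not Kernel's writer code. It assumes the conventional INT96 layout (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte Julian day number) and INT64 values that count microseconds since the Unix epoch. The function names are hypothetical.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_int64_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into microseconds since the epoch."""
    if len(raw) != 12:
        raise ValueError("an INT96 timestamp is exactly 12 bytes")
    # '<qi' = little-endian int64 (nanos of day) followed by int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Sub-microsecond precision is dropped by the integer division.
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def int64_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the epoch as a 12-byte INT96 timestamp."""
    # divmod floors, so timestamps before 1970 still get a non-negative time of day.
    days_since_epoch, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1_000,
                       days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH)
```

A single 8-byte integer is simpler to compare and store than the 12-byte day-plus-nanoseconds pair.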

For more information, refer to:

User guide on step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
Slides explaining the rationale behind Kernel and the API design.
Example Java programs that illustrate how to read Delta tables using the Kernel APIs.
Table and default Engine API Java documentation

Delta Standalone (deprecated in favor of Delta Kernel)

API documentation: https://docs.delta.io/3.2.1/delta-standalone.html
Artifacts: delta-standalone_2.12, delta-standalone_2.13

This release does not update Standalone. Standalone is being deprecated in favor of Delta Kernel, which supports advanced features in Delta tables.

Delta Storage

Artifacts: delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Fix an issue with VACUUM when using the S3DynamoDBLogStore where the LogStore made unnecessary listFrom calls to DynamoDB, causing a ProvisionedThroughputExceededException

Credits

Abhishek Radhakrishnan, Allison Portis, Charlene Lyu, Fred Storage Liu, Jiaheng Tang, Johan Lasperas, Lin Zhou, Marko Ilić, Scott Sandre, Tathagata Das, Tom van Bussel, Venki Korukanti, Wenchen Fan, Zihao Xu

24 Sep 17:06

vkorukanti

v3.2.1rc3

c8697c5

Delta Lake 3.2.1 RC3 Pre-release

We are excited to announce the release of Delta Lake 3.2.1 RC3! This release contains important bug fixes to 3.2.0, and it is recommended that users update to 3.2.1. Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.

Details for each component follow.

Delta Spark

Delta Spark 3.2.1 is built on Apache Spark™ 3.5.3. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.1/index.html
API documentation: https://docs.delta.io/3.2.1/delta-apidoc.html#delta-spark
RC3 artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Support for Apache Spark™ 3.5.3.
Fix MERGE operation not being recorded in QueryExecutionListener when submitted through Scala/Python API.
Support RESTORE on a Delta table with clustering enabled.
Fix replacing a clustered table with a non-clustered table.
Fix an issue when running clustering on a table with a single clustering column.

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.1/delta-uniform.html
RC3 artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

The key changes of this release are:

Added support for enabling UniForm Iceberg on existing Delta tables with ALTER TABLE instead of REORG, which rewrites data files.
Fixed a bug where the UniForm Iceberg conversion transaction converted commits containing only AddFiles without data change.

Delta Sharing Spark

Documentation: https://docs.delta.io/3.2.1/delta-sharing.html
RC3 artifacts: delta-sharing-spark_2.12, delta-sharing-spark_2.13

The key changes of this release are:

Upgrade delta-sharing-client to version 1.1.1 which removes the pre-signed URL address from the error message on access errors.
Fix an issue with DeltaSharingLogFileStatus

Delta Kernel

API documentation: https://docs.delta.io/3.2.1/delta-kernel.html
RC3 artifacts: delta-kernel-api, delta-kernel-defaults

The key changes of this release are:

Fix comparison issues with string values having characters with surrogate pairs. This fixes a corner case with wrong results when comparing characters (e.g. emojis) that have surrogate pairs in UTF-16 representation.
Fix ClassNotFoundException issue when loading LogStores in Kernel default Engine module. This issue happens in some environments where the thread local class loader is not set.
Fix error when querying tables with spaces in the path name. Now you can query tables with paths having any valid path characters.
Fix an issue with writing decimals as binary to Parquet files for certain scale and precision combinations.
Throw a proper exception when an unsupported VOID data type is encountered while reading Delta tables.
Handle long type values in field metadata of columns in schema. Previously Kernel threw a parsing exception; it now handles long types.
Fix an issue where Kernel retries multiple times when the _last_checkpoint file is not found. Kernel now tries just once when a file-not-found exception is thrown.
Support reading Parquet files with legacy map type physical formats. Previously Kernel threw errors; it can now read data from files containing legacy map physical formats.
Support reading Parquet files with legacy 3-level repeated type physical formats.
Write timestamp data to Parquet file as INT64 physical format instead of INT96 physical format. INT96 is a legacy physical format that is deprecated.
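
The surrogate-pair fix in the list above concerns a general ordering pitfall that is easy to reproduce outside Kernel. Comparing strings by UTF-16 code units (as Java's `String.compareTo` does) can order them differently from comparing by Unicode code point or by UTF-8 bytes. The sketch below only illustrates that mismatch under these assumptions; it is not Kernel's comparison code, and the helper names are hypothetical.

```python
def utf16_key(s: str) -> bytes:
    """Sort key equivalent to comparing UTF-16 code units in order."""
    return s.encode("utf-16-be")


def utf8_key(s: str) -> bytes:
    """Sort key equivalent to comparing UTF-8 bytes in order."""
    return s.encode("utf-8")


# U+FF5E fits in one UTF-16 code unit (0xFF5E).
bmp_char = "\uFF5E"
# U+1F600 needs a surrogate pair in UTF-16 (0xD83D 0xDE00).
emoji = "\U0001F600"

# Python compares str values by code point: 0xFF5E < 0x1F600.
by_code_point = sorted([emoji, bmp_char])
# UTF-8 byte order agrees with code point order.
by_utf8 = sorted([emoji, bmp_char], key=utf8_key)
# UTF-16 code unit order disagrees, because 0xD83D < 0xFF5E.
by_utf16 = sorted([emoji, bmp_char], key=utf16_key)

if __name__ == "__main__":
    print(by_code_point == by_utf8)   # the two orders match
    print(by_code_point == by_utf16)  # the UTF-16 order differs
```

Any component that mixes the two orderings, for example when comparing values against min/max statistics, can reach different conclusions for such strings.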

For more information, refer to:

User guide on step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
Slides explaining the rationale behind Kernel and the API design.
Example Java programs that illustrate how to read Delta tables using the Kernel APIs.
Table and default Engine API Java documentation

Delta Standalone (deprecated in favor of Delta Kernel)

API documentation: https://docs.delta.io/3.2.1/delta-standalone.html
RC3 artifacts: delta-standalone_2.12, delta-standalone_2.13

There is no update to Standalone in this release. Standalone is being deprecated in favor of Delta Kernel, which supports advanced features in Delta tables.

Delta Storage

RC3 artifacts: delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Fix an issue with VACUUM when using the S3DynamoDBLogStore where the LogStore made unnecessary listFrom calls to DynamoDB, causing a ProvisionedThroughputExceededException

How to use this Release Candidate [RC only]

Download Spark 3.5 from https://spark.apache.org/downloads.html.

For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:

Spark Submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1168 to the command line arguments.
Example:

spark-submit --packages io.delta:delta-spark_2.12:3.2.1 --repositories https://oss.sonatype.org/content/repositories/iodelta-1168 examples/examples.py

Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.2.1 by just providing the --packages io.delta:delta-spark_2.12:3.2.1 argument.

Spark Shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1168 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1168 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1168</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.2.1</version>
</dependency>

SBT Project

libraryDependencies += "io.delta" %% "delta-spark" % "3.2.1"
resolvers += "Delta" at https://oss.sonatype.org/content/repositories/iodelta-1...

11 Sep 01:20

vkorukanti

v3.2.1rc2

4e71aee

Delta Lake 3.2.1 RC2 Pre-release

We are excited to announce the release of Delta Lake 3.2.1 RC2! This release contains important bug fixes to 3.2.0, and it is recommended that users update to 3.2.1. Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.

Details for each component follow.

Delta Spark

Delta Spark 3.2.1 is built on Apache Spark™ 3.5.2. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.1/index.html
API documentation: https://docs.delta.io/3.2.1/delta-apidoc.html#delta-spark
RC2 artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Support for Apache Spark™ 3.5.2.
Fix MERGE operation not being recorded in QueryExecutionListener when submitted through Scala/Python API.
Support RESTORE on a Delta table with clustering enabled.
Fix replacing a clustered table with a non-clustered table.
Fix an issue when running clustering on a table with a single clustering column.

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.1/delta-uniform.html
RC2 artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

The key changes of this release are:

Added support for enabling UniForm Iceberg on existing Delta tables with ALTER TABLE instead of REORG, which rewrites data files.
Fixed a bug where the UniForm Iceberg conversion transaction converted commits containing only AddFiles without data change.

Delta Sharing Spark

Documentation: https://docs.delta.io/3.2.1/delta-sharing.html
RC2 artifacts: delta-sharing-spark_2.12, delta-sharing-spark_2.13

The key changes of this release are:

Upgrade delta-sharing-client to version 1.1.1 which removes the pre-signed URL address from the error message on access errors.
Fix an issue with DeltaSharingLogFileStatus

Delta Kernel

API documentation: https://docs.delta.io/3.2.1/delta-kernel.html
RC2 artifacts: delta-kernel-api, delta-kernel-defaults

The key changes of this release are:

Fix comparison issues with string values having characters with surrogate pairs. This fixes a corner case with wrong results when comparing characters (e.g. emojis) that have surrogate pairs in UTF-16 representation.
Fix ClassNotFoundException issue when loading LogStores in Kernel default Engine module. This issue happens in some environments where the thread local class loader is not set.
Fix error when querying tables with spaces in the path name. Now you can query tables with paths having any valid path characters.
Fix an issue with writing decimals as binary to Parquet files for certain scale and precision combinations.
Throw a proper exception when an unsupported VOID data type is encountered while reading Delta tables.
Handle long type values in field metadata of columns in schema. Previously Kernel threw a parsing exception; it now handles long types.
Fix an issue where Kernel retries multiple times when the _last_checkpoint file is not found. Kernel now tries just once when a file-not-found exception is thrown.
Support reading Parquet files with legacy map type physical formats. Previously Kernel threw errors; it can now read data from files containing legacy map physical formats.
Support reading Parquet files with legacy 3-level repeated type physical formats.
Write timestamp data to Parquet file as INT64 physical format instead of INT96 physical format. INT96 is a legacy physical format that is deprecated.

For more information, refer to:

User guide on step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
Slides explaining the rationale behind Kernel and the API design.
Example Java programs that illustrate how to read Delta tables using the Kernel APIs.
Table and default Engine API Java documentation

Delta Standalone (deprecated in favor of Delta Kernel)

API documentation: https://docs.delta.io/3.2.1/delta-standalone.html
RC2 artifacts: delta-standalone_2.12, delta-standalone_2.13

There is no update to Standalone in this release. Standalone is being deprecated in favor of Delta Kernel, which supports advanced features in Delta tables.

Delta Storage

RC2 artifacts: delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Fix an issue with VACUUM when using the S3DynamoDBLogStore where the LogStore made unnecessary listFrom calls to DynamoDB, causing a ProvisionedThroughputExceededException

How to use this Release Candidate [RC only]

Download Spark 3.5 from https://spark.apache.org/downloads.html.

For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:

Spark Submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1167 to the command line arguments.
Example:

spark-submit --packages io.delta:delta-spark_2.12:3.2.1 --repositories https://oss.sonatype.org/content/repositories/iodelta-1167 examples/examples.py

Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.2.1 by just providing the --packages io.delta:delta-spark_2.12:3.2.1 argument.

Spark Shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1167 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1167 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1167</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.2.1</version>
</dependency>

SBT Project

libraryDependencies += "io.delta" %% "delta-spark" % "3.2.1"
resolvers += "Delta" at https://oss.sonatype.org/content/repositories/iodelta-1...

04 Sep 16:48

vkorukanti

v3.2.1rc1

8c81984

Delta Lake 3.2.1 RC1 Pre-release

We are excited to announce the release of Delta Lake 3.2.1 RC1! This release contains important bug fixes to 3.2.0, and it is recommended that users update to 3.2.1. Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.

Details for each component follow.

Delta Spark

Delta Spark 3.2.1 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.1/index.html
API documentation: https://docs.delta.io/3.2.1/delta-apidoc.html#delta-spark
RC1 artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Support for Apache Spark 3.5.2.
Support QueryExecutionListener for MERGE queries submitted through Scala API (#3474).
Support RESTORE on a Delta table with clustering enabled.
Fix replacing a clustered table with a non-clustered table.
Fix an issue when running clustering on a table with a single clustering column.

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.1/delta-uniform.html
RC1 artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

The key changes of this release are:

The UniForm Iceberg conversion transaction no longer converts commits that contain only AddFiles without data change.

Delta Sharing Spark

Documentation: https://docs.delta.io/3.2.1/delta-sharing.html
RC1 artifacts: delta-sharing-spark_2.12, delta-sharing-spark_2.13

The key changes of this release are:

Upgrade delta-sharing-client to version 1.1.1 which removes the pre-signed URL address from the error message on access errors.
Fix an issue with DeltaSharingLogFileStatus

Delta Kernel

API documentation: https://docs.delta.io/3.2.1/delta-kernel.html
RC1 artifacts: delta-kernel-api, delta-kernel-defaults

The key changes of this release are:

Fix comparison issues with string values having characters with surrogate pairs. This fixes a corner case when comparing characters (e.g. emojis) that have surrogate pairs in UTF-16 representation.
Fix ClassNotFoundException issue when loading LogStores in Kernel default Engine module. This issue happens in some environments where the thread local class loader is not set.
Fix error when querying tables with spaces in the path name. Now you can query tables with paths having any valid path characters.
Fix an issue with writing decimals as binary to Parquet files for certain scale and precision combinations.
Throw a proper exception when an unsupported VOID data type is encountered while reading Delta tables.
Handle long type values in field metadata of columns in schema. Previously Kernel threw a parsing exception; it now handles long types.
Fix an issue where Kernel retries multiple times when the _last_checkpoint file is not found. Kernel now tries just once when a file-not-found exception is thrown.
Support reading Parquet files with legacy map type physical formats. Previously Kernel threw errors; it can now read data from files containing legacy map physical formats.
Support reading Parquet files with legacy 3-level repeated type physical formats.
Write timestamp data to Parquet file as INT64 physical format instead of INT96 physical format. INT96 is a legacy physical format that is deprecated.

For more information, refer to:

User guide on step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
Slides explaining the rationale behind Kernel and the API design.
Example Java programs that illustrate how to read Delta tables using the Kernel APIs.
Table and default Engine API Java documentation

Delta Standalone (deprecated in favor of Delta Kernel)

API documentation: https://docs.delta.io/3.2.1/delta-standalone.html
RC1 artifacts: delta-standalone_2.12, delta-standalone_2.13

There is no update to Standalone in this release. Standalone is being deprecated in favor of Delta Kernel, which supports advanced features in Delta tables.

Delta Storage

RC1 artifacts: delta-storage, delta-storage-s3-dynamodb

The key changes of this release are:

Fix an issue with VACUUM when using the S3DynamoDBLogStore where the LogStore made unnecessary listFrom calls to DynamoDB, causing a ProvisionedThroughputExceededException

How to use this Release Candidate [RC only]

Download Spark 3.5 from https://spark.apache.org/downloads.html.

For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:

Spark Submit

Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1166 to the command line arguments.
Example:

spark-submit --packages io.delta:delta-spark_2.12:3.2.1 --repositories https://oss.sonatype.org/content/repositories/iodelta-1166 examples/examples.py

Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.2.1 by just providing the --packages io.delta:delta-spark_2.12:3.2.1 argument.

Spark Shell

bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1166 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Spark SQL

bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.1 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1166 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Maven

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1166</url>
  </repository>
</repositories>
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-spark_2.12</artifactId>
  <version>3.2.1</version>
</dependency>

SBT Project

libraryDependencies += "io.delta" %% "delta-spark" % "3.2.1"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1166"

(PySpark) Delta-Spark

Download two artifacts from pre-release: https://github.com/delta-io/delta/releases/tag/v3.2.1rc1
Artifacts to download are:
- delta-spark-3.2.1.tar.gz
- delta_spark-3.2.1-py3...


13 Jun 16:28

vkorukanti

v4.0.0rc1

6c81c59

Delta Lake 4.0.0 Preview Pre-release

Pre-release

We are excited to announce the preview release of Delta Lake 4.0.0 on the preview release of Apache Spark 4.0.0! This release gives a preview of the following exciting new features.

Support for Spark Connect (aka Delta Connect), an extension for Spark Connect that enables the use of Delta over Spark Connect, so that Delta can be used with the decoupled client-server architecture of Spark Connect.
Support for Type Widening to allow users to change the type of columns without having to rewrite data.
Support for the Variant data type to enable semi-structured storage and data processing, for flexibility and performance.
Support for Coordinated Commits table feature which makes the commit protocol very flexible and allows reliable multi-cloud and multi-engine writes.

Read below for more details. In addition, a few existing artifacts are unavailable in this release; they are listed at the end.

Delta Spark

Delta Spark 4.0 preview is built on Apache Spark™ 4.0.0-preview1. Similar to Apache Spark, we have released Maven artifacts for Scala 2.13.

Documentation: https://docs.delta.io/4.0.0-preview/index.html
Maven artifacts: delta-spark_2.13, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
Python artifacts: https://pypi.org/project/delta-spark/4.0.0rc1/

The key features of this release are:

Support for Spark Connect (aka Delta Connect): Spark Connect is a new initiative in Apache Spark that adds a decoupled client-server infrastructure which allows Spark applications to connect remotely to a Spark server and run SQL / Dataframe operations. Delta Connect allows Delta operations to be made in applications running in such client-server mode. For more information on how to use Delta Connect see the Delta Connect documentation.
Support for Coordinated Commits: Coordinated Commits is a new writer table feature which allows users to designate a “Commit Coordinator” for their Delta table. A commit coordinator is an entity with a unique identifier which maintains information about commits. Once a commit coordinator has been set for a table, all writes to the table must be coordinated through it. This single point of ownership of commits for the table makes cross-environment (e.g. cross cloud) writes safe. Examples of Commit Coordinators are catalogs (Hive Metastore, Unity Catalog, etc.), DynamoDB, or any system which can implement the commit coordinator API. This release also adds a DynamoDB Commit Coordinator which can use a DynamoDB table to coordinate commits for a table. Delta tables with commit coordinators are still readable through the object storage paths, making reads backward compatible. See the Delta Coordinated Commits documentation for more details.
Support for Type Widening: Delta Spark can now change the type of a column to a wider type using the ALTER TABLE t CHANGE COLUMN col TYPE type command or with schema evolution during MERGE and INSERT operations. See the type widening documentation for a list of all supported type changes and additional information. The table will be readable by Delta 4.0 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the ALTER TABLE t DROP FEATURE 'typeWidening' command.
Support for Variant data type: The Variant data type is a new Apache Spark data type. The Variant data type enables flexible and efficient processing of semi-structured data, without a user-specified schema. Variant data does not require a fixed schema on write. Instead, Variant data is queried using the schema-on-read approach. The Variant data type allows flexible ingestion by not requiring a write schema, and enables faster processing with the Spark Variant binary encoding format. Please see the documentation and the example for more details.
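
The Coordinated Commits item above describes a single point of ownership for commits. As a rough illustration only, the following Python sketch models a coordinator that accepts a commit only when it targets the next table version. The class InMemoryCommitCoordinator, its methods, and the file names are hypothetical and are not the Delta commit coordinator API.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer tries to commit a version that is not the next one."""


class InMemoryCommitCoordinator:
    """Toy stand-in for a commit coordinator (hypothetical, not the Delta API)."""

    def __init__(self, coordinator_id):
        self.coordinator_id = coordinator_id  # unique identifier of this coordinator
        self._lock = threading.Lock()
        self._commits = []  # the commit at index i is table version i

    def commit(self, version, actions):
        # Only one writer at a time may register a commit, and only for the next version.
        with self._lock:
            expected = len(self._commits)
            if version != expected:
                raise CommitConflict(f"expected version {expected}, got {version}")
            self._commits.append(list(actions))
            return version

    def latest_version(self):
        with self._lock:
            return len(self._commits) - 1


coordinator = InMemoryCommitCoordinator("demo-coordinator")
coordinator.commit(0, ["add file-a.parquet"])
coordinator.commit(1, ["add file-b.parquet"])

try:
    # A second writer that still believes version 1 is free is rejected.
    coordinator.commit(1, ["add file-c.parquet"])
    conflict_detected = False
except CommitConflict:
    conflict_detected = True
```

Because every writer has to go through the same coordinator, two writers in different environments cannot both claim the same version, which is the property the release note refers to as making cross-environment writes safe.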

Other notable changes include:

Support protocol version downgrades when the existing table features exist in the lower protocol version.
Support dropping table features for columnMapping and vacuumProtocolCheck.
Support CREATE TABLE LIKE with user provided properties. Previously any properties that were provided in the SQL command were ignored and only the properties from the source table were used.
Fix liquid clustering to automatically fall back to Z-order clustering when clustering on a single column. Previously, any attempts to optimize the table would fail.
Pushdown query filters when reading CDF so the filters can be used for partition pruning and row group skipping.
Improve the performance of finding the last complete checkpoint with more efficient file listing.
Fix a bug where providing a query filter that compares two Literal expressions would cause an infinite loop when constructing data skipping filters.
Fix In-Commit Timestamps to use clock.currentTimeMillis() instead of System.nanoTime() for large commits since some systems return a very small number when System.nanoTime() is called.
Fix streaming CDF queries to not read log entries beyond endOffset for reduced processing time.

More features to come in the final release of Delta 4.0!

Delta Kernel Java

Maven artifacts: delta-kernel-api, delta-kernel-defaults

The Delta Kernel project is a set of Java and Rust libraries for building Delta connectors that can read and write to Delta tables without the need to understand the Delta protocol details.

This release of Delta Kernel Java contains the following changes:

Write timestamps using the INT64 physical format in Parquet in the DefaultParquetHandler. Previously they were written as INT96 which is an outdated and deprecated format for timestamps.
Lazily evaluate comparator expressions in the DefaultExpressionHandler. Previously expressions would be eagerly evaluated for every row in the underlying vectors.
Support SQL expression LIKE in the DefaultExpressionHandler.
Support legacy Parquet schemas for map type and array type in the DefaultParquetHandler.
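
To give a feel for what evaluating a SQL LIKE expression involves, here is a minimal Python sketch that translates a LIKE pattern into a regular expression. It is a simplified model under stated assumptions: the function names like_to_regex and sql_like are hypothetical, only the % and _ wildcards with an optional escape character are handled, and it does not reflect how the DefaultExpressionHandler is actually implemented.

```python
import re


def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into a Python regex string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape:
            if i + 1 >= len(pattern):
                raise ValueError("LIKE pattern must not end with the escape character")
            # The escaped character is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any sequence of characters, including the empty one
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def sql_like(value, pattern, escape=None):
    """Return True if the whole value matches the LIKE pattern."""
    regex = like_to_regex(pattern, escape)
    # DOTALL so that the wildcards also match newline characters.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

For example, sql_like("delta-kernel", "delta%") matches, while sql_like("1000", "100\\%", escape="\\") does not, because the escaped % must match a literal percent sign.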

In addition to the above Delta Kernel Java changes, Delta Kernel Rust released its first version 0.1, which is available at https://crates.io/crates/delta_kernel.

Limitations

The following features from Delta 3.2 are not supported in this preview release. We are working with the community to address the following gaps by the final release of Delta 4.0:

In Delta Spark, UniForm with Iceberg and Hudi is not yet available because Iceberg and Hudi do not yet support Spark 4.0.
Delta Flink, Delta Standalone, and Delta Hive are not available yet.

Credits

Abhishek Radhakrishnan, Allison Portis, Ami Oka, Andreas Chatzistergiou, Anish, Carmen Kwan, Chirag Singh, Christos Stavrakakis, Dhruv Arya, Felipe Pessoto, Fred Storage Liu, Hyukjin Kwon, James DeLoye, Jiaheng Tang, Johan Lasperas, Jun, Kaiqi Jin, Krishnan Paranji Ravi, Lin Zhou, Lukas Rupprecht, Ole Sasse, Paddy Xu, Prakhar Jain, Qianru Lao, Richard Chen, Sabir Akhadov, Scott Sandre, Sergiu Pocol, Sumeet Varma, Tai Le Manh, Tathagata Das, Thang Long Vu, Tom van Bussel,...


09 May 19:55

scottsand-db

v3.2.0

4e7a342

Delta Lake 3.2.0

We are excited to announce the release of Delta Lake 3.2.0! This release includes several exciting new features.

Highlights

Support for Liquid clustering to reduce write amplification using incremental clustering.
Preview support for Type Widening to allow users to change the type of columns without having to rewrite data.
Preview support for Apache Hudi in Delta UniForm tables.

Delta Spark

Delta Spark 3.2.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.0/index.html
API documentation: https://docs.delta.io/3.2.0/delta-apidoc.html#delta-spark
Maven artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb, delta-iceberg_2.12, delta-iceberg_2.13
Python artifacts: https://pypi.org/project/delta-spark/3.2.0/

The key features of this release are:

Support for Liquid clustering: This allows for incremental clustering based on ZCubes and reduces the write amplification by not touching files already well clustered (i.e., files in stable ZCubes). Users can now use the ALTER TABLE CLUSTER BY syntax to change clustering columns and use the DESCRIBE DETAIL command to check the clustering columns. In addition, Delta Spark now supports DeltaTable clusterBy API in both Python and Scala to allow creating clustered tables using DeltaTable API. See the documentation and examples for more information.
Preview support for Type Widening: Delta Spark can now change the type of a column from byte to short to integer using the ALTER TABLE t CHANGE COLUMN col TYPE type command or with schema evolution during MERGE and INSERT operations. The table remains readable by Delta 3.2 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the ALTER TABLE t DROP FEATURE 'typeWidening-preview' command.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Support for Vacuum Inventory: Delta Spark now extends the VACUUM SQL command to allow users to specify an inventory table in a VACUUM command. When an inventory table is provided, VACUUM will consider the files listed there instead of doing the full listing of the table directory, which can be time consuming for very large tables. See the docs here.
Support for Vacuum Writer Protocol Check: Delta Spark now supports the vacuumProtocolCheck ReaderWriter feature, which ensures consistent application of reader and writer protocol checks during VACUUM operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.
Preview support for In-Commit Timestamps: When enabled, this preview feature persists monotonically increasing timestamps within Delta commits, ensuring they are not affected by file operations. When enabled, time travel queries will yield consistent results, even if the table directory is relocated.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Deletion Vectors Read Performance Improvements: Two improvements were introduced to DVs in Delta 3.2.
- Removing broadcasting of DV information to executors: This work improves stability by reducing the driver's memory consumption, preventing potential driver OOM errors for very large Delta tables (for example, 1 TB or more). It also improves performance by avoiding the fixed broadcasting overhead when reading small Delta tables.
- Supporting predicate pushdown and splitting in scans with DVs: This improves the performance of DV reads for queries with filters, thanks to predicate pushdown and splitting. This feature yields a 2x performance improvement on average.
Support for Row Tracking: Delta Spark can now write to tables that maintain information that allows identifying rows across multiple versions of a Delta table. Delta Spark can now also access this tracking information using the two metadata fields _metadata.row_id and _metadata.row_commit_version.
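
The Type Widening item above names the byte to short to integer chain. The following Python sketch shows what a check for such a widening could look like. The names WIDENING_ORDER, is_supported_widening, and widen_column are hypothetical, the schema is modeled as a plain dict, and nothing here is Delta Spark's implementation.

```python
# Widening order taken from the release note: byte -> short -> integer.
WIDENING_ORDER = ["byte", "short", "integer"]


def is_supported_widening(from_type, to_type):
    """Return True if changing from_type to to_type moves to a strictly wider type."""
    if from_type not in WIDENING_ORDER or to_type not in WIDENING_ORDER:
        return False
    return WIDENING_ORDER.index(from_type) < WIDENING_ORDER.index(to_type)


def widen_column(schema, column, new_type):
    """Return a copy of schema (column name -> type name) with the column widened."""
    current = schema[column]
    if not is_supported_widening(current, new_type):
        raise ValueError(f"cannot change {column} from {current} to {new_type}")
    updated = dict(schema)
    updated[column] = new_type
    return updated


schema = {"id": "byte", "name": "string"}
widened = widen_column(schema, "id", "integer")
```

Only the schema entry changes in this model, which mirrors the point of the feature: the column type is widened in metadata without rewriting the existing data.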

Other notable changes include:

Delta Sharing: reduce the minimum RPC interval in delta sharing streaming from 30 seconds to 10 seconds
Improve the performance of write operations by skipping collecting commit stats
New SQL configurations to specify Delta Log cache size (spark.databricks.delta.delta.log.cacheSize) and retention duration (spark.databricks.delta.delta.log.cacheRetentionMinutes)
Fix bug in plan validation due to inconsistent field metadata in MERGE
Improved metrics during VACUUM for better visibility
Hive Metastore schema sync: The truncation threshold for schemas with long fields is now user configurable

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.0/delta-uniform.html
Maven artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

Hudi is now supported by Delta Universal format in addition to Iceberg. Writing to a Delta UniForm table can generate Hudi metadata, alongside Delta. This feature is contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata using the following command:

CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');

See the documentation here for more details.

Other notable changes include:

Throw a better error if Iceberg conversion fails during initial sync
Fix a bug in Delta Universal Format to support correct table overwrites

Delta Kernel

API documentation: https://docs.delta.io/3.2.0/api/java/kernel/index.html
Maven artifacts: delta-kernel-api, delta-kernel-defaults

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details. In this release, we improved the read support to make it production-ready by adding numerous performance improvements, additional functionality, and improved protocol support.

Support for time travel. Now you can read a table snapshot at a version id or snapshot at a timestamp.
Improved Delta protocol support.
- Support for reading tab...


06 May 23:42

scottsand-db

v3.2.0rc2

4e7a342

Delta Lake 3.2.0 (RC2) Pre-release

Pre-release

We are excited to announce the release of Delta Lake 3.2.0 (RC2)! Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack here or create issues in our Delta repository.

Highlights

Support for Liquid clustering to reduce write amplification using incremental clustering.
Preview support for Type Widening to allow users to change the type of columns without having to rewrite data.
Preview support for Apache Hudi in Delta UniForm tables.

Delta Spark

Delta Spark 3.2.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.0/index.html
API documentation: https://docs.delta.io/3.2.0/delta-apidoc.html#delta-spark
RC2 artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
RC2 Python artifacts: See delta-3.2-rc2-python-artifacts.zip attached

The key features of this release are:

Support for Liquid clustering: This allows for incremental clustering based on ZCubes and reduces the write amplification by not touching files already well clustered (i.e., files in stable ZCubes). Users can now use the ALTER TABLE CLUSTER BY syntax to change clustering columns and use the DESCRIBE DETAIL command to check the clustering columns. In addition, Delta Spark now supports DeltaTable clusterBy API in both Python and Scala to allow creating clustered tables using DeltaTable API. See the documentation and examples for more information.
Preview support for Type Widening: Delta Spark can now change the type of a column from byte to short to integer using the ALTER TABLE t CHANGE COLUMN col TYPE type command or with schema evolution during MERGE and INSERT operations. The table remains readable by Delta 3.2 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the ALTER TABLE t DROP FEATURE 'typeWidening-preview' command.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Support for Vacuum Inventory: Delta Spark now extends the VACUUM SQL command to allow users to specify an inventory table in a VACUUM command. When an inventory table is provided, VACUUM will consider the files listed there instead of doing the full listing of the table directory, which can be time consuming for very large tables. See the docs here.
Support for Vacuum Writer Protocol Check: Delta Spark now supports the vacuumProtocolCheck ReaderWriter feature, which ensures consistent application of reader and writer protocol checks during VACUUM operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.
Preview support for In-Commit Timestamps: When enabled, this preview feature persists monotonically increasing timestamps within Delta commits, ensuring they are not affected by file operations. When enabled, time travel queries will yield consistent results, even if the table directory is relocated.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Deletion Vectors Read Performance Improvements: Two improvements were introduced to DVs in Delta 3.2.
- Removing broadcasting of DV information to executors: This work improves stability by reducing the driver's memory consumption, preventing potential driver OOM errors for very large Delta tables (for example, 1 TB or more). It also improves performance by avoiding the fixed broadcasting overhead when reading small Delta tables.
- Supporting predicate pushdown and splitting in scans with DVs: This improves the performance of DV reads for queries with filters, thanks to predicate pushdown and splitting. This feature yields a 2x performance improvement on average.
Support for Row Tracking: Delta Spark can now write to tables that maintain information that allows identifying rows across multiple versions of a Delta table. Delta Spark can now also access this tracking information using the two metadata fields _metadata.row_id and _metadata.row_commit_version.
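
The In-Commit Timestamps item above says that commits carry monotonically increasing timestamps. One simple way to picture that guarantee is sketched below in Python: each commit takes the wall-clock reading unless that would not exceed the previous commit's timestamp, in which case it takes the previous timestamp plus one millisecond. The function names and the sample readings are assumptions for illustration, not Delta Spark code.

```python
def next_in_commit_timestamp(wall_clock_ms, previous_commit_ms=None):
    """Pick a commit timestamp that is strictly greater than the previous one."""
    if previous_commit_ms is None:
        return wall_clock_ms
    return max(wall_clock_ms, previous_commit_ms + 1)


def assign_timestamps(wall_clock_readings):
    """Assign one timestamp per commit from a sequence of wall-clock readings."""
    timestamps = []
    previous = None
    for reading in wall_clock_readings:
        previous = next_in_commit_timestamp(reading, previous)
        timestamps.append(previous)
    return timestamps


# The third reading goes backwards (for example after a clock adjustment).
commit_timestamps = assign_timestamps([1000, 1005, 990, 1006])
```

Because the timestamps are stored with the commits rather than derived from file modification times, they stay the same when the table directory is copied or relocated, which is why time travel results remain consistent.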

Other notable changes include:

Delta Sharing: reduce the minimum RPC interval in delta sharing streaming from 30 seconds to 10 seconds
Improve the performance of write operations by skipping collecting commit stats
New SQL configurations to specify Delta Log cache size (spark.databricks.delta.delta.log.cacheSize) and retention duration (spark.databricks.delta.delta.log.cacheRetentionMinutes)
Fix bug in plan validation due to inconsistent field metadata in MERGE
Improved metrics during VACUUM for better visibility
Hive Metastore schema sync: The truncation threshold for schemas with long fields is now user configurable

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.0/delta-uniform.html
RC2 artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

Hudi is now supported by Delta Universal format in addition to Iceberg. Writing to a Delta UniForm table can generate Hudi metadata, alongside Delta. This feature is contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata using the following command:

CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');

See the documentation here for more details.

Other notable changes include:

Throw a better error if Iceberg conversion fails during initial sync
Fix a bug in Delta Universal Format to support correct table overwrites

Delta Kernel

API documentation: https://docs.delta.io/3.2.0/api/java/kernel/index.html
RC2 artifacts: delta-kernel-api, delta-kernel-defaults

Support for time travel. Now you can read a ...


29 Apr 23:06

scottsand-db

v3.2.0rc1

09ff609

Delta Lake 3.2.0 (RC1) Pre-release

Pre-release

We are excited to announce the release of Delta Lake 3.2.0 (RC1)! Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack here or create issues in our Delta repository.

Highlights

Support for Liquid clustering to reduce write amplification using incremental clustering.
Preview support for Type Widening to allow users to change the type of columns without having to rewrite data.
Preview support for Apache Hudi in Delta UniForm tables.

Delta Spark

Delta Spark 3.2.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Documentation: https://docs.delta.io/3.2.0/index.html
API documentation: https://docs.delta.io/3.2.0/delta-apidoc.html#delta-spark
RC1 artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
RC1 Python artifacts: See delta-3.2-rc1-python-artifacts.zip attached

The key features of this release are:

Support for Liquid clustering: This allows for incremental clustering based on ZCubes and reduces the write amplification by not touching files already well clustered (i.e., files in stable ZCubes). Users can now use the ALTER TABLE CLUSTER BY syntax to change clustering columns and use the DESCRIBE DETAIL command to check the clustering columns. In addition, Delta Spark now supports DeltaTable clusterBy API in both Python and Scala to allow creating clustered tables using DeltaTable API. See the documentation and examples for more information.
Preview support for Type Widening: Delta Spark can now change the type of a column from byte to short to integer using the ALTER TABLE t CHANGE COLUMN col TYPE type command or with schema evolution during MERGE and INSERT operations. The table remains readable by Delta 3.2 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the ALTER TABLE t DROP FEATURE 'typeWidening-preview' command.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Support for Vacuum Inventory: Delta Spark now extends the VACUUM SQL command to allow users to specify an inventory table in a VACUUM command. When an inventory table is provided, VACUUM will consider the files listed there instead of doing the full listing of the table directory, which can be time consuming for very large tables. See the docs here.
Support for Vacuum Writer Protocol Check: Delta Spark now supports the vacuumProtocolCheck ReaderWriter feature, which ensures consistent application of reader and writer protocol checks during VACUUM operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.
Preview support for In-Commit Timestamps: When enabled, this preview feature persists monotonically increasing timestamps within Delta commits, ensuring they are not affected by file operations. When enabled, time travel queries will yield consistent results, even if the table directory is relocated.
- Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
Deletion Vectors Read Performance Improvements: Two improvements were introduced to DVs in Delta 3.2.
- Removing broadcasting of DV information to executors: This work improves stability by reducing the driver's memory consumption, preventing potential driver OOM errors for very large Delta tables (for example, 1 TB or more). It also improves performance by avoiding the fixed broadcasting overhead when reading small Delta tables.
- Supporting predicate pushdown and splitting in scans with DVs: This improves the performance of DV reads for queries with filters, thanks to predicate pushdown and splitting. This feature yields a 2x performance improvement on average.
Support for Row Tracking: Delta Spark can now write to tables that maintain information that allows identifying rows across multiple versions of a Delta table. Delta Spark can now also access this tracking information using the two metadata fields _metadata.row_id and _metadata.row_commit_version.
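
The Vacuum Inventory item above replaces a full directory listing with a user-supplied list of files. The Python sketch below models only the selection step: from an inventory of (path, modification time) entries, keep the files that the table no longer references and that are older than the retention cutoff. The type InventoryEntry, the function files_to_vacuum, and the sample values are hypothetical; the real command and its inventory schema are described in the VACUUM documentation.

```python
from typing import Iterable, List, NamedTuple, Set


class InventoryEntry(NamedTuple):
    path: str
    modification_time_ms: int


def files_to_vacuum(inventory: Iterable[InventoryEntry],
                    referenced_paths: Set[str],
                    retention_cutoff_ms: int) -> List[str]:
    """Select unreferenced files older than the cutoff, using only the inventory."""
    return sorted(
        entry.path
        for entry in inventory
        if entry.path not in referenced_paths
        and entry.modification_time_ms < retention_cutoff_ms
    )


inventory = [
    InventoryEntry("part-0001.parquet", 1_000),
    InventoryEntry("part-0002.parquet", 2_000),
    InventoryEntry("part-0003.parquet", 9_000),
]
candidates = files_to_vacuum(
    inventory,
    referenced_paths={"part-0002.parquet"},
    retention_cutoff_ms=5_000,
)
```

In this model, part-0002.parquet is kept because the table still references it and part-0003.parquet is kept because it is newer than the cutoff, so only part-0001.parquet is selected.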

Other notable changes include:

Delta Sharing: reduce the minimum RPC interval in delta sharing streaming from 30 seconds to 10 seconds
Improve the performance of write operations by skipping collecting commit stats
New SQL configurations to specify Delta Log cache size (spark.databricks.delta.delta.log.cacheSize) and retention duration (spark.databricks.delta.delta.log.cacheRetentionMinutes)
Fix bug in plan validation due to inconsistent field metadata in MERGE
Improved metrics during VACUUM for better visibility
Hive Metastore schema sync: The truncation threshold for schemas with long fields is now user configurable

Delta Universal Format (UniForm)

Documentation: https://docs.delta.io/3.2.0/delta-uniform.html
RC1 artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13

Hudi is now supported by Delta Universal format in addition to Iceberg. Writing to a Delta UniForm table can generate Hudi metadata, alongside Delta. This feature is contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata using the following command:

CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');

See the documentation here for more details.

Other notable changes include:

Throw a better error if Iceberg conversion fails during initial sync
Fix a bug in Delta Universal Format to support correct table overwrites

Delta Kernel

API documentation: https://docs.delta.io/3.2.0/api/java/kernel/index.html
RC1 artifacts: delta-kernel-api, delta-kernel-defaults

Support for time travel. Now y...
