Delta Lake 3.3.0 (RC2)
We are excited to announce the release of Delta Lake 3.3.0 RC2! Instructions for how to use this release candidate are at the end of these notes. To give feedback on this release candidate, please post in the Delta Users Slack or create issues in our Delta repository.
Highlights
Delta Spark
- Support for Identity Column to assign unique values for each record inserted into a table.
- Support VACUUM LITE to deliver faster VACUUM for periodically run VACUUM commands.
- Support for Row Tracking Backfill to alter an existing table to enable Row Tracking. Row Tracking allows engines such as Spark to track row-level lineage in Delta Lake tables.
- Support for enhanced table state validation with version checksums and improved Snapshot initialization performance based on this checksum.
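As a rough illustration of two of the features above, the sketch below builds the SQL statements for VACUUM LITE and for enabling Row Tracking on an existing table. The statement forms reflect our reading of the Delta Lake documentation and should be checked against the detailed release notes. The helper names and the table name `events` are hypothetical.

```python
# Hypothetical helpers that only build SQL strings; nothing here talks to Spark.
# Assumption: the VACUUM ... LITE form and the delta.enableRowTracking table
# property are as described in the Delta Lake documentation.


def vacuum_lite_sql(table):
    """Return a VACUUM statement that uses the LITE mode."""
    return f"VACUUM {table} LITE"


def enable_row_tracking_sql(table):
    """Return an ALTER TABLE statement that turns on Row Tracking for an existing table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('delta.enableRowTracking' = 'true')"
    )


for statement in (vacuum_lite_sql("events"), enable_row_tracking_sql("events")):
    print(statement)
```

With a Delta-enabled SparkSession named `spark`, each statement could then be run with `spark.sql(statement)`.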
Delta UniForm
- Support for enabling UniForm Iceberg on existing tables without rewriting the data files using ALTER TABLE.
Delta Kernel
- Support for reading Delta tables that have Type Widening enabled.
More detailed release notes on these features, as well as the other changes included in this release, are coming soon!
Artifacts
We have published the artifacts to a staging repository.
- Delta Spark artifacts: delta-spark_2.12, delta-spark_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
- Delta UniForm artifacts: delta-iceberg_2.12, delta-iceberg_2.13, delta-hudi_2.12, delta-hudi_2.13
- Delta Kernel artifacts: delta-kernel-api, delta-kernel-defaults
How to use the Delta Spark Release Candidate
Download Spark 3.5 from https://spark.apache.org/downloads.html.
For this release candidate, we have published the artifacts to a staging repository. Here’s how you can use them:
Spark Submit
- Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1181 to the command line arguments.
- Example:
spark-submit --packages io.delta:delta-spark_2.12:3.3.0 --repositories https://oss.sonatype.org/content/repositories/iodelta-1181 examples/examples.py
Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.3.0 by just providing the --packages io.delta:delta-spark_2.12:3.3.0 argument.
Spark Shell
bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1181 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
Spark SQL
bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1181 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
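The same settings can also be applied from inside a PySpark application instead of on the command line. The sketch below is a minimal example, assuming Spark's standard `spark.jars.packages` and `spark.jars.repositories` properties as the counterparts of `--packages` and `--repositories`. The helper names `delta_rc_conf` and `build_session` are hypothetical.

```python
# Values copied from these notes; the helper names are hypothetical.
STAGING_REPO = "https://oss.sonatype.org/content/repositories/iodelta-1181"
DELTA_PACKAGE = "io.delta:delta-spark_2.12:3.3.0"


def delta_rc_conf():
    """Return Spark settings that mirror the command-line flags shown above."""
    return {
        "spark.jars.packages": DELTA_PACKAGE,      # counterpart of --packages
        "spark.jars.repositories": STAGING_REPO,   # counterpart of --repositories
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name="delta-rc-check"):
    """Create a SparkSession with the settings above (requires pyspark)."""
    # Imported lazily so that delta_rc_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_rc_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` in a fresh Python process should resolve the release candidate from the staging repository on first use. These settings only take effect if they are set before the session's JVM starts.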
Maven
<repositories>
<repository>
<id>staging-repo</id>
<url>https://oss.sonatype.org/content/repositories/iodelta-1181</url>
</repository>
</repositories>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.3.0</version>
</dependency>
SBT Project
libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1181"
(PySpark) Delta-Spark
- Download two artifacts from pre-release: https://github.com/delta-io/delta/releases/tag/v3.3.0rc2
- Artifacts to download are:
- delta-spark-3.3.0.tar.gz
- delta_spark-3.3.0-py3-none-any.whl
- Keep them in one directory. Let’s call that ~/Downloads
- pip install ~/Downloads/delta_spark-3.3.0-py3-none-any.whl
- pip show delta-spark should show output similar to the below:
Name: delta-spark
Version: 3.3.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: /Users/<user.name>/opt/anaconda3/envs/delta-release-3.3/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
Required-by:
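The installed version can also be checked from Python rather than by reading the pip output. The sketch below uses the standard library's `importlib.metadata`. The function names are hypothetical, and the expected version string is taken from the sample output above.

```python
from importlib import metadata  # standard library, Python 3.8+

EXPECTED_VERSION = "3.3.0"  # version shown in the sample pip output above


def installed_delta_spark_version():
    """Return the installed delta-spark version, or None if it is not installed."""
    try:
        return metadata.version("delta-spark")
    except metadata.PackageNotFoundError:
        return None


def check_install(found, expected=EXPECTED_VERSION):
    """Return a one-line status message for the version that was found."""
    if found is None:
        return "delta-spark is not installed in this environment"
    if found != expected:
        return f"delta-spark {found} is installed, expected {expected}"
    return f"delta-spark {found} is installed"


print(check_install(installed_delta_spark_version()))
```

Run this with the same interpreter that pip installed into, for example the conda environment shown in the Location line.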