Deltalake #1256

Merged: 162 commits, Sep 23, 2024
Commits
179dfbc
Create tests for delta-file function
May 28, 2023
d049873
Create tests for delta-file function with transform
May 28, 2023
d5f8f00
Implement delta-file function
May 28, 2023
95d135f
Add delta as output option
May 28, 2023
85c7342
Add delta as output option
May 28, 2023
75d6c3f
Add delta to configs
May 28, 2023
e907de3
Create test data for delta
May 28, 2023
29177c5
Create test to check non-delta-file values and non copied values are …
May 28, 2023
a6e9503
Create error for attempt to modify immutable values
May 28, 2023
38c01f2
Make default mutability level -1
May 28, 2023
a17d345
Adjust error semantics to check for mutability level of -1
May 28, 2023
c083e9a
Adjust test to new immutable modify error semantics
May 28, 2023
24cdb58
Add delta file to alter
May 28, 2023
32e5e01
Create tests for semantics of updating delta files directly
May 28, 2023
27281a6
Add mutabilityLevel getters and setters to itemType hierarchy
May 28, 2023
f95d301
Implement setting of mutabilityLevel for dataframes with itemtypes
May 28, 2023
9f5b847
Apply spotless
May 28, 2023
85fd7d3
Remove err debug
May 31, 2023
0434570
Remove json keyword from updating exprs
Jun 1, 2023
ca0bdd5
Fix copy classnotfound exception when using repl
Jun 1, 2023
0fb6e57
Add this. to solve unqualified accesses
Jun 1, 2023
72ae3a2
Add updating functions lexing
Jun 1, 2023
3eb57d7
Update functiondecl visit to incorporate isUpdating and isExternal bo…
Jun 1, 2023
cc5112c
Update FunctionSignature to include isUpdating flag for StaticContext…
Jun 1, 2023
681d30c
Implement ExpressionClassificationVisitor visits for InlineFunctions …
Jun 1, 2023
dcdfe63
Create simple tests for updating functions
Jun 2, 2023
772727c
Extend function iterators to account for updating functions and imple…
Jun 2, 2023
7a0ebd9
Reorganise visitor list to ensure function calls are set to updating …
Jun 2, 2023
6d10219
Make isUpdating non transient to ensure it retains after deepCopy
Jun 2, 2023
9f80e1d
Include isUpdating to function signatures in InferTypeVisitor
Jun 2, 2023
85e814b
Account for built-in functions when checking signatures
Jun 2, 2023
d51d324
Add error tests for XUST0001, XUST0002, and XUST0028 for updating fun…
Jun 2, 2023
e42ed3a
Add error check for unknown functions in ExpressionClassificationVisitor
Jun 2, 2023
6432b20
Refactor mutabilityLevel to be put inside dataframe and parsed during…
Jun 3, 2023
862ddb8
Remove mutabilityLevel from itemType hierarchy
Jun 3, 2023
77b0ee6
Fix test to correct error code
Jun 3, 2023
78636a8
Add mutabilityLevel to serialisation of ObjectItem and ArrayItem
Jun 3, 2023
9f4b722
Correct errorCode required
Jun 3, 2023
1fe3c25
Add row id and path in dataframe to item hierarchy
Jun 3, 2023
69bd06d
Change rowID to long
Jun 3, 2023
17506e3
Introduce RBDY0007 ErrorCode and refactor expected test outputs
Jun 7, 2023
9e01001
Add getSparkSQLValue and getTableLocation functions to Item interface
Jun 21, 2023
ba0120f
Implement applyItem and applyDelta functions for UpdatePrimitive inte…
Jun 21, 2023
b2fd52d
Make default table location "null" to allow kryo copy
Jun 21, 2023
627c8dd
Alter apply method to use applyItem when table location is "null" too
Jun 21, 2023
a6065b7
Fix bug in query creation to allow for simple updates in delta
Jun 26, 2023
cafd00d
Implement method for getting SQL type of an item in item interface
Jun 27, 2023
d976d99
Fix bugs in applyDelta methods: remove "." and add spaces
Jun 28, 2023
91cc89a
Implement extension to pathIn field for delta table dataframes
Jun 28, 2023
daba332
Add temporary fix to stop copied dataframes being considered as delta
Jun 28, 2023
882af35
Implement getSparkSQLValue func for AnnotatedItem
Jun 28, 2023
74947f9
Fix comma bug in getSparkSQLValue func for ArrayItem
Jun 28, 2023
37acbf8
Fix query string bugs in applyDelta funcs
Jun 28, 2023
9bf9619
Implement dataframe delta table info for object lookups resulting in …
Jun 28, 2023
e70b977
Add getSparkSQLValue function with ItemType param and implement the m…
Jul 1, 2023
13fba96
Add getSparkSQLType func to ItemType interface to account for nullabl…
Jul 1, 2023
ca69a07
Remove originalArray field from dataframes and rework object key loo…
Jul 1, 2023
b403a1b
Create default arrayIndexingApplyDelta method for update primitives t…
Jul 1, 2023
8d3e561
Implement getSparkSQLType for DerivedAtomicItemType
Jul 1, 2023
583cf6a
Implement applyDelta branching for case when pathIn has an array lookup
Jul 1, 2023
1ebfe3e
Fix bug duplicating DeleteFromObject primitives
Jul 1, 2023
7d52f94
Apply spotless
Jul 1, 2023
4582e4e
Implement altering schema in arrays for delta
Jul 7, 2023
04e0520
Fix object lookup bug with placement of "."
Jul 7, 2023
ca5e53c
Create test suite for delta updates
Jul 19, 2023
7ef44e6
Add tests for non nested delta updates
Jul 19, 2023
03599de
Fix bug causing tests to not run
Jul 20, 2023
cc351ab
Add new tests to test nested delta updates
Jul 20, 2023
ab07f5c
Remove unnecessary test
Jul 20, 2023
abc078d
Correct simple tests -- wrong naming
Jul 20, 2023
c4be336
Remove comment
Jul 20, 2023
1f6b419
Add typing to the generation of arrays for SparkSQL from JSONiq to ac…
Jul 20, 2023
1adde98
Refactor checkTableCreation
Jul 20, 2023
74de11b
Add parquet file to test deeply nested updates in delta stores
Jul 20, 2023
b1a706e
Remove unnecessary delta files
Jul 20, 2023
de4d571
Apply spotless
Jul 20, 2023
91a74b6
Replace delta-file for runtime tests
Jul 20, 2023
0077caa
Add missing file
Jul 20, 2023
37a5055
Create tests for R xQuery use case
Jul 24, 2023
4aa46d8
Add CreateTable annotation
Jul 24, 2023
e64f460
Refactor DeltaUpdateRuntimeTests for CreateTable annotation
Jul 24, 2023
585aaf2
Use CreateTable annotation
Jul 24, 2023
ba8b0fe
Create tests for multirow delta updates
Jul 24, 2023
5df8a5a
Fix control flow for updating dataframes in return clause
Jul 24, 2023
c2d02b9
Account for delta table identifications in dfs
Jul 24, 2023
d1f4047
Allow null ItemType in ItemParser
Jul 24, 2023
22e24f5
Refactor Update Primitive Item ordering in PUL to include delta items
Jul 24, 2023
a6ceec7
Ensure Sequence from Updating iterator is empty
Jul 24, 2023
554dc8a
Create dataset for R use case
Jul 24, 2023
d9a2920
Add tests for multirow delta tables and updating funcs
Jul 25, 2023
b52c529
pretty format
Aug 3, 2023
f19193d
Add "this." to solve warning
Aug 3, 2023
05ace8e
Add test to check array insert is at valid index and add said semantics
Aug 9, 2023
46211cf
Make mergeUpdates a non-static method and fix bug regarding not remov…
Aug 9, 2023
a17d934
Fix error message
Aug 9, 2023
3646afa
Create test for replaces before deletes in arrays and fix bug causing…
Aug 10, 2023
9c5c044
Create benchmark jsoniq queries
Aug 17, 2023
7cbdbcc
Create benchmark class
Aug 17, 2023
a320298
Stop Sequences with updating iterator from producing Sequence
Aug 17, 2023
e535acd
Add key exists check to InsertIntoObjectPrimitive
Aug 17, 2023
9f9c414
Ensure set delta id methods propagate to child items of structured items
Aug 17, 2023
4fe6afc
Remove redundant file
Aug 17, 2023
06adf01
Ensure append
Aug 17, 2023
6550d5f
Add metadata
Aug 17, 2023
0baf27a
Remove unused method
Aug 17, 2023
9cd90df
Add tests to check delta ids propagate to child items of structured i…
Aug 17, 2023
a941dbf
Add tests for mutability scoping in return clause of transform
Aug 17, 2023
7e82bc6
Add tests to ensure snapshot isolation
Aug 17, 2023
3387e84
Adapt applyUpdates to use TreeMap to ensure order of array selectors
Aug 17, 2023
94f0864
apply spotless
Aug 17, 2023
ccd9533
fix bug with null pointers
Aug 17, 2023
26c30d6
Set inferred static type of transform to return
Sep 3, 2023
cc04085
Add benchmark data
Sep 3, 2023
d096d71
Refactor transform with delta tests into delta runtime
Sep 3, 2023
a308ea1
Remove comments
Sep 3, 2023
d4101c2
Add test datasets
Sep 3, 2023
a6997c6
Apply spotless
Sep 3, 2023
51ed8d1
Add delta tests to CI
Sep 3, 2023
84629de
Extend pom with delta dependencies
Sep 3, 2023
1edd929
Merge Delta into master.
Jul 22, 2024
17b119e
Fix tests.
Jul 22, 2024
0094d4e
Fix grammar.
Jul 22, 2024
0889cb1
Fix grammar.
Jul 23, 2024
329b75e
Solve dependency conflict on math3.
Jul 23, 2024
866906d
Upload test dataset.
Jul 24, 2024
1ab2e36
Fix syntax.
Jul 24, 2024
7d6cc56
Reorder output in test.
Jul 24, 2024
2cac215
Fix keys() function on DFs.
Jul 24, 2024
7dabc79
Spotless.
Jul 24, 2024
dfa8c7b
Fix test.
Jul 31, 2024
69d968e
Fix mutability.
Jul 31, 2024
acd5235
Remove debug output.
Jul 31, 2024
684d588
Fix test.
Jul 31, 2024
f225718
Fix tests.
Jul 31, 2024
57c0e2f
Make new arrays mutable.
Jul 31, 2024
a87ed88
Fix test.
Jul 31, 2024
ff247cf
Fix test.
Jul 31, 2024
4b51c0d
Fix syntax.
Jul 31, 2024
3eb58a9
Fix test.
Aug 2, 2024
8d324c9
Allow updating without % and define default annotations namespace.
Aug 2, 2024
58c6c0d
Remove an prefix and make default.
Aug 2, 2024
afb8696
Fix 3 tests.
Aug 2, 2024
9750f37
Spotless.
Aug 2, 2024
90e0702
Update metadata.
Aug 2, 2024
55c60f5
Fix test.
Aug 2, 2024
b07be2d
Fix test.
Aug 2, 2024
3ad52a2
Fix test.
Aug 2, 2024
ca9bc04
Add delta tests.
Aug 2, 2024
c3cb597
Fix delta tests.
Aug 2, 2024
22b98b3
Fix tests.
Aug 2, 2024
684dd96
Fix column name to rowID.
Aug 2, 2024
dcd15fe
Revert column names.
Aug 2, 2024
e70d2f9
Run delta tests.
Aug 23, 2024
41053cc
Fix test.
Aug 23, 2024
24ffb46
extend SequenceOfItems with functions to apply PUL and determine if m…
David-C-L Sep 18, 2024
0da26c1
add command line arg for updates and adjust execution flow
David-C-L Sep 18, 2024
7912579
adjust tests containing update tests with cli parameter and SeqOfItem…
David-C-L Sep 18, 2024
ea53993
add isSequential and isUpdating methods to program and StatementsWith…
David-C-L Sep 18, 2024
b02e732
allow treat exprs to be updating
David-C-L Sep 18, 2024
7ee88c3
Merge pull request #1260 from David-C-L/deltalake-scripting-updates
ghislainfourny Sep 20, 2024
4fc8cc6
run mvn spotless
David-C-L Sep 20, 2024
3520ad7
Merge pull request #1261 from David-C-L/deltalake-scripting-updates
ghislainfourny Sep 23, 2024
23 changes: 23 additions & 0 deletions .github/workflows/maven.yml
@@ -115,3 +115,26 @@ jobs:
     - name: MLTestsNativeDeactivated
       run: mvn -Dtest=MLTestsNativeDeactivated test

+  tests4:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Java 11
+      uses: actions/setup-java@v3
+      with:
+        java-version: 11
+        distribution: adopt
+    - name: Cache Maven packages
+      uses: actions/cache@v3
+      with:
+        path: ~/.m2
+        key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
+        restore-keys: ${{ runner.os }}-m2
+    - name: Install with Maven
+      run: mvn install -DskipTests -Dgpg.skip --quiet
+    - name: Compile with Maven
+      run: mvn clean compile assembly:single
+    - name: DeltaUpdateRuntimeTests
+      run: mvn -Dtest=DeltaUpdateRuntimeTests test
9 changes: 7 additions & 2 deletions .gitlab-ci.yml
@@ -79,8 +79,13 @@ NativeFLWORRuntimeTestsParallelismDeactivated:
   script:
     - mvn -Dtest=NativeFLWORRuntimeTestsParallelismDeactivated test

-StaticTypingTest:
-  stage: tests3
+updatedeltaruntime-test:
+  stage: test
+  script:
+    - mvn -Dtest=DeltaUpdateRuntimeTests test
+
+statictyping-test:
+  stage: test
   script:
     - mvn -Dtest=StaticTypeTests test

72 changes: 46 additions & 26 deletions pom.xml
@@ -197,6 +197,16 @@
</build>

<dependencies>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.37</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.37</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
@@ -257,36 +267,46 @@
<artifactId>commons-lang3</artifactId>
<version>3.12.0</version>
</dependency>
<dependency>
<dependency>
<groupId>commons-net</groupId>
<artifactId>commons-net</artifactId>
<version>3.1</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
<!--<dependency>
<groupId>edu.vanderbilt.accre</groupId>
<artifactId>laurelin</artifactId>
<version>1.0.1</version>
</dependency>-->
<dependency>
<groupId>org.jgrapht</groupId>
<artifactId>jgrapht-core</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.10.6</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
<version>2.13.4</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
<!--<dependency>
<groupId>edu.vanderbilt.accre</groupId>
<artifactId>laurelin</artifactId>
<version>1.0.1</version>
</dependency>-->
<dependency>
<groupId>org.jgrapht</groupId>
<artifactId>jgrapht-core</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.10.6</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
<version>2.13.4</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>2.4.0</version>
</dependency>
</dependencies>

<distributionManagement>
102 changes: 99 additions & 3 deletions src/main/java/org/rumbledb/api/Item.java
@@ -692,17 +692,113 @@ default boolean isNaN() {
      * @return an int representing nestedness of the item inside transform expressions.
      */
     default int getMutabilityLevel() {
-        return 0;
+        return -1;
     }

     /**
-     * Sets the mutability level of the item.
+     * Sets the mutability level of the item to a supplied value.
      *
-     * @param mutabilityLevel the new mutability level.
+     * @param mutabilityLevel new mutability level.
      */
     default void setMutabilityLevel(int mutabilityLevel) {
     }
+
+    /**
+     * Returns the top level ID of the item.
+     *
+     * @return long representing the rowID of the item within a DeltaFile.
+     */
+    default long getTopLevelID() {
+        return -1;
+    }
+
+    /**
+     * Sets the top level ID of the item to a supplied value.
+     *
+     * @param topLevelID new top level ID.
+     */
+    default void setTopLevelID(long topLevelID) {
+    }
+
+    /**
+     * Returns the path from the top level object of a DeltaFile for the item.
+     *
+     * @return String representing the path of the item from the top level within a DeltaFile.
+     */
+    default String getPathIn() {
+        return "null";
+    }
+
+    /**
+     * Sets the path from the top level object of a DeltaFile for the item to a supplied value.
+     *
+     * @param pathIn new path from top level.
+     */
+    default void setPathIn(String pathIn) {
+    }
+
+    /**
+     * Returns the location of the DeltaFile for the item.
+     *
+     * @return String representing the location of the DeltaFile for the item.
+     */
+    default String getTableLocation() {
+        return null;
+    }
+
+
+    /**
+     * Sets the location of the DeltaFile for the item to a supplied value.
+     *
+     * @param location new location of the DeltaFile for the item.
+     */
+    default void setTableLocation(String location) {
+    }
+
+    /**
+     * Returns the SparkSQL value of the item for use in a query.
+     *
+     * @return String representing the SparkSQL value of the item.
+     */
+    default String getSparkSQLValue() {
+        throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
+    }
+
+    /**
+     * Returns the SparkSQL value of the item, given an item type, for use in a query.
+     *
+     * @return String representing the SparkSQL value of the item.
+     */
+    default String getSparkSQLValue(ItemType itemType) {
+        throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
+    }
+
+    /**
+     * Returns the SparkSQL type of the item for use in a query.
+     *
+     * @return String representing the SparkSQL type of the item.
+     */
+    default String getSparkSQLType() {
+        throw new UnsupportedOperationException("Operation not defined for type " + this.getDynamicType());
+    }
+
+    /**
+     * Tests for physical equality: same object, or same top level ID and path within a DeltaFile.
+     *
+     * @param other another item.
+     * @return true if it is physically equal to other, false otherwise.
+     */
+    default boolean physicalEquals(Object other) {
+        if (!(other instanceof Item)) {
+            return false;
+        }
+        Item otherItem = (Item) other;
+        if (this.getTopLevelID() == -1 || otherItem.getTopLevelID() == -1) {
+            return System.identityHashCode(this) == System.identityHashCode(otherItem);
+        }
+        return this.getTopLevelID() == otherItem.getTopLevelID() && this.getPathIn().equals(otherItem.getPathIn());
+    }

     /**
      * Tests for logical equality. The semantics are that of the eq operator.
      *
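The `physicalEquals` default added to `Item.java` can be read as two rules: items without a DeltaFile row ID fall back to object identity, and items with a row ID are compared by row ID plus path. A minimal sketch of that behaviour follows. `SketchItem`, `InMemoryItem`, `DeltaBackedItem` and the path string `"address.city"` are hypothetical names made up for illustration, not part of RumbleDB. The sketch also compares references with `==` where the diff compares `System.identityHashCode` values; that substitution is an assumption of this sketch, made because identity hash codes are not guaranteed to be unique per object.

```java
class PhysicalEqualsSketch {

    // Hypothetical, trimmed-down stand-in for the Item interface in the diff.
    interface SketchItem {
        // -1 means "not backed by a row of a DeltaFile", as in the diff's default.
        default long getTopLevelID() {
            return -1;
        }

        // The diff's default is the string "null", not a null reference.
        default String getPathIn() {
            return "null";
        }

        default boolean physicalEquals(Object other) {
            if (!(other instanceof SketchItem)) {
                return false;
            }
            SketchItem otherItem = (SketchItem) other;
            if (this.getTopLevelID() == -1 || otherItem.getTopLevelID() == -1) {
                // Assumption of this sketch: compare references directly.
                // The diff compares System.identityHashCode values instead.
                return this == otherItem;
            }
            return this.getTopLevelID() == otherItem.getTopLevelID()
                && this.getPathIn().equals(otherItem.getPathIn());
        }
    }

    // Item that only lives in memory: keeps every default.
    static final class InMemoryItem implements SketchItem {
    }

    // Item read from a DeltaFile row: carries a row ID and a path inside the row.
    static final class DeltaBackedItem implements SketchItem {
        private final long topLevelID;
        private final String pathIn;

        DeltaBackedItem(long topLevelID, String pathIn) {
            this.topLevelID = topLevelID;
            this.pathIn = pathIn;
        }

        @Override
        public long getTopLevelID() {
            return this.topLevelID;
        }

        @Override
        public String getPathIn() {
            return this.pathIn;
        }
    }

    public static void main(String[] args) {
        SketchItem a = new DeltaBackedItem(7L, "address.city");
        SketchItem b = new DeltaBackedItem(7L, "address.city");
        SketchItem c = new DeltaBackedItem(8L, "address.city");
        SketchItem m = new InMemoryItem();

        System.out.println(a.physicalEquals(b)); // true: same row ID and path
        System.out.println(a.physicalEquals(c)); // false: different row ID
        System.out.println(m.physicalEquals(m)); // true: same reference
        System.out.println(m.physicalEquals(new InMemoryItem())); // false: distinct objects
        System.out.println(a.physicalEquals(m)); // false: m has no row ID and is not a
    }
}
```

Note that two distinct Java objects can be physically equal under the second rule, which is what lets separately materialised views of the same DeltaFile value be recognised as the same target.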
12 changes: 0 additions & 12 deletions src/main/java/org/rumbledb/api/Rumble.java
@@ -5,7 +5,6 @@
 import org.rumbledb.context.DynamicContext;
 import org.rumbledb.expressions.module.MainModule;
 import org.rumbledb.runtime.RuntimeIterator;
-import org.rumbledb.runtime.update.PendingUpdateList;
 import sparksoniq.spark.SparkSessionManager;

 import java.io.IOException;
@@ -52,11 +51,6 @@ public SequenceOfItems runQuery(String query) {
             this.configuration
         );

-        if (iterator.isUpdating()) {
-            PendingUpdateList pul = iterator.getPendingUpdateList(dynamicContext);
-            pul.applyUpdates(iterator.getMetadata());
-        }
-
         return new SequenceOfItems(iterator, dynamicContext, this.configuration);
     }

@@ -78,12 +72,6 @@ public SequenceOfItems runQuery(URI location) throws IOException {
             this.configuration
         );

-        if (iterator.isUpdating()) {
-            PendingUpdateList pul = iterator.getPendingUpdateList(dynamicContext);
-            pul.applyUpdates(iterator.getMetadata());
-        }
-
-        System.err.println("final iterator is: " + iterator.isUpdating());
         return new SequenceOfItems(iterator, dynamicContext, this.configuration);
     }

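The lines removed from `Rumble.java` applied the pending update list directly inside `runQuery`. Commit messages in this PR ("extend SequenceOfItems with functions to apply PUL…", "add command line arg for updates and adjust execution flow") suggest that this step now happens elsewhere, but the code that replaces it is not part of the diff shown here. The sketch below only illustrates the pattern of the removed block: updates are collected first and applied in one step, and only when the iterator reports that the query is updating. `PendingUpdates`, `QueryIterator`, `iteratorOf` and `applyIfUpdating` are hypothetical stand-ins, not RumbleDB classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

class ApplyUpdatesSketch {

    // Hypothetical stand-in for a pending update list: deferred changes, applied together.
    static final class PendingUpdates {
        private final List<Runnable> primitives = new ArrayList<>();

        void add(Runnable primitive) {
            this.primitives.add(primitive);
        }

        void applyUpdates() {
            for (Runnable primitive : this.primitives) {
                primitive.run();
            }
            this.primitives.clear();
        }
    }

    // Hypothetical stand-in for a runtime iterator.
    interface QueryIterator {
        boolean isUpdating();

        PendingUpdates getPendingUpdateList();
    }

    // Builds an iterator with a fixed answer, for demonstration only.
    static QueryIterator iteratorOf(boolean updating, PendingUpdates pul) {
        return new QueryIterator() {
            @Override
            public boolean isUpdating() {
                return updating;
            }

            @Override
            public PendingUpdates getPendingUpdateList() {
                return pul;
            }
        };
    }

    // Mirrors the removed control flow: apply only when the iterator is updating.
    static boolean applyIfUpdating(QueryIterator iterator) {
        if (!iterator.isUpdating()) {
            return false;
        }
        iterator.getPendingUpdateList().applyUpdates();
        return true;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        PendingUpdates pul = new PendingUpdates();
        pul.add(() -> store.add("inserted"));

        System.out.println(store.isEmpty()); // true: nothing is applied while collecting
        applyIfUpdating(iteratorOf(false, pul));
        System.out.println(store.isEmpty()); // true: a non-updating query applies nothing
        applyIfUpdating(iteratorOf(true, pul));
        System.out.println(store); // [inserted]
    }
}
```

Keeping collection and application separate is what makes it possible to move the application step to a different caller without touching how update primitives are produced.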