GEOMESA-3259 FSDS - Add support for GeoParquet #3064
Open: adeet1 wants to merge 5 commits into locationtech:main from adeet1:GEOMESA-3259
adeet1 (Contributor) commented Mar 20, 2024 (edited)
- Create a bounding box for each geometry, and add it to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure)
- Read and write all geometry attributes as binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
- Add support for parsing WKB bytes in the Parquet geometry transformer functions
- Use a spatial index instead of a GeoTools filter for bounding box queries
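To make the binary geometry encoding and the bounding box concrete, here is a minimal standalone sketch. It is not the code from this PR: it handles only 2D points, and the helper names (`point_to_wkb`, `wkb_to_point`, `bounding_box`) are hypothetical. It relies only on the standard WKB layout for a point (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles).

```python
import struct

WKB_POINT = 1  # WKB geometry type code for Point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Byte-order flag 1 means little-endian; then type code, x, y.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D WKB point, honoring the byte-order flag."""
    order = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("this sketch only handles WKB points")
    return (x, y)


def bounding_box(wkb_values):
    """Return [xmin, ymin, xmax, ymax] over a non-empty list of WKB points."""
    points = [wkb_to_point(v) for v in wkb_values]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]
```

Storing one binary value per geometry means the same read and write path can serve every geometry type, which is the motivation given in the description above; the x/y pair layout only fits points.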
To-do items:
elahrvivaz reviewed Mar 21, 2024: ...parquet/src/main/scala/org/locationtech/geomesa/convert/parquet/ParquetFunctionFactory.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...t-parquet/src/test/scala/org/locationtech/geomesa/convert/parquet/ParquetConverterTest.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...src/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureReadSupport.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...src/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureReadSupport.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala (outdated)
elahrvivaz reviewed Mar 21, 2024: ...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala (outdated)
elahrvivaz reviewed Mar 21, 2024
elahrvivaz reviewed Mar 28, 2024: ...arquet/src/main/scala/org/locationtech/geomesa/convert/parquet/ParquetConverterFactory.scala (outdated)
elahrvivaz reviewed Mar 28, 2024
elahrvivaz reviewed Mar 28, 2024: ...age-parquet/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/FilterConverter.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...et/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/ParquetFileSystemStorage.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...et/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/ParquetFileSystemStorage.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: .../src/main/scala/org/locationtech/geomesa/fs/storage/parquet/SimpleFeatureParquetWriter.scala
elahrvivaz reviewed Mar 28, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala (outdated)
elahrvivaz reviewed Mar 28, 2024: ...esa-fs-storage/geomesa-fs-storage-parquet/src/test/resources/geoparquet-metadata-schema.json (outdated)
elahrvivaz reviewed Mar 28, 2024: ...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala (outdated)
adeet1 commented Mar 28, 2024
- When we compact GeoParquet files in a filesystem partition, we need to ensure that the bounding boxes in the metadata of the files get merged correctly (i.e. assert that the union of bounding boxes of the files before compaction is equal to the union of bounding boxes of the newly compacted files).
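The invariant in this comment can be sketched independently of GeoMesa. The snippet below is an illustration only, with hypothetical helper names; it assumes each file's bounds are stored as `[xmin, ymin, xmax, ymax]` and that an empty list stands for a file with no bounds.

```python
from functools import reduce


def union(a, b):
    """Smallest box covering both inputs; an empty list means 'no bounds'."""
    if not a:
        return list(b)
    if not b:
        return list(a)
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]


def union_all(boxes):
    """Fold any number of boxes into one, starting from 'no bounds'."""
    return reduce(union, boxes, [])


# Bounds of three files in one partition before compaction (one file is empty),
# and of the single file that replaces them afterwards.
before = [[0.0, 0.0, 10.0, 10.0], [5.0, -5.0, 20.0, 8.0], []]
after = [[0.0, -5.0, 20.0, 10.0]]

# The check described above: compaction must not grow or shrink the covered area.
assert union_all(before) == union_all(after)
```

Comparing unions rather than individual boxes keeps the check valid no matter how many files compaction produces.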
commit 0ea8bff
Author: adeet1
Date: Fri Mar 29 20:29:40 2024 +0000
    Optimize imports

commit 9ebd85a
Author: adeet1
Date: Fri Mar 29 20:12:03 2024 +0000
    Initialize bounds as an empty array instead of null
    * This fixes a failing unit test "suppress or allow empty output files" in ExportCommandTest.scala

commit 4cff76a
Author: adeet1
Date: Fri Mar 29 15:18:09 2024 +0000
    Split Parquet and Orc file compaction tests in order to differentiate the comparisons

commit 16d88fd
Author: adeet1
Date: Wed Mar 27 20:48:07 2024 +0000
    Assert in each partition that GeoParquet metadata bounding boxes across files are correctly merged upon compaction
    * Write features with different geometries and coordinates, so we can test the merging of unique bounding boxes.

commit 4197e4d
Author: adeet1
Date: Thu Mar 28 21:27:17 2024 +0000
    Change thunk to lazy vals

commit 4eaf9fc
Author: adeet1
Date: Thu Mar 28 20:22:10 2024 +0000
    Implement methods instead of lazy vals

commit c82c0d2
Author: adeet1
Date: Thu Mar 28 20:13:56 2024 +0000
    Move test scope

commit 09588e8
Author: adeet1
Date: Thu Mar 28 20:01:00 2024 +0000
    Don't create a GeoParquet metadata string if the SFT has no geometries

commit 137dcb5
Author: adeet1
Date: Thu Mar 28 19:36:31 2024 +0000
    Re-implement GeoParquet metadata logic to work for SFTs with multiple geometries

commit 360c2c7
Author: adeet1
Date: Thu Mar 28 16:58:26 2024 +0000
    Change back to GroupReadSupport
    * This simply checks if the Parquet file is valid - it won't deserialize/manifest everything and thus saves us some processing

commit 3bce59e
Author: adeet1
Date: Thu Mar 28 14:39:34 2024 +0000
    Use the released GeoParquet metadata schema, not the dev one

commit 878abb5
Author: adeet1
Date: Thu Mar 28 14:30:35 2024 +0000
    Optimize imports

commit d49fc3a
Author: adeet1
Date: Wed Mar 27 14:47:54 2024 +0000
    Assert that the bounding box in the GeoParquet metadata is correct

commit 2ae9574
Author: adeet1
Date: Tue Mar 26 23:14:46 2024 +0000
    Instantiate the observer directly in SimpleFeatureWriteSupport instead of passing it down from SimpleFeatureParquetWriter

commit 9770a3a
Author: adeet1
Date: Fri Mar 22 14:09:05 2024 +0000
    Tweak targetSize

commit 604e614
Author: adeet1
Date: Wed Mar 20 19:55:59 2024 +0000
    Assert that the file metadata adheres to the GeoParquet metadata json schema

commit 2257d6c
Author: adeet1
Date: Thu Mar 21 22:03:29 2024 +0000
    Deprecate the ParquetFunctionFactory class, but provide backwards compatibility

commit 03e699f
Author: adeet1
Date: Thu Mar 21 20:04:43 2024 +0000
    Create a new metadata map instance when adding bounding box

commit 8630eed
Author: adeet1
Date: Thu Mar 21 18:07:30 2024 +0000
    Change BoundsObserver argument back to FileSystemObserver

commit 921274b
Author: adeet1
Date: Thu Mar 21 17:53:38 2024 +0000
    If the sft has no geometry field, then omit the GeoParquet metadata entirely

commit c1dda99
Author: adeet1
Date: Thu Mar 21 17:51:26 2024 +0000
    Omit orientation, edges and epoch

commit dabdc43
Author: adeet1
Date: Thu Mar 21 17:39:47 2024 +0000
    Make variables private to avoid exposing mutable state outside the scope of the class

commit 5eecf48
Author: adeet1
Date: Thu Mar 21 17:32:01 2024 +0000
    Delete redundant checks in geometry read and write support

commit 0ed5c65
Author: adeet1
Date: Thu Mar 21 14:55:29 2024 +0000
    Delete duplicate dependency

commit 3dc798d
Author: adeet1
Date: Wed Mar 20 19:09:44 2024 +0000
    Support backwards compatibility for FilterConverter

commit 7dea125
Author: adeet1
Date: Wed Mar 20 15:32:31 2024 +0000
    Delete .parquet.crc file after running tests

commit 652bf3a
Author: Adeet Patel
Date: Mon Feb 12 12:16:35 2024 -0500
    GEOMESA-3259 FSDS - Add support for GeoParquet
    * Create a BoundsObserver trait, and tweak various classes and methods to use that trait
    * Add an observer to the SimpleFeatureParquetWriter and write records to it, in order to create a bounding box of all the geometries. Add this bounding box to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure).
    * Read/write all geometry attributes in binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
    * Add support for parsing WKB bytes in the Parquet geometry transformer functions
    * Exclude bounding box from the GeoTools filter and use a spatial index instead
    Co-authored-by: Emilio Lahr-Vivaz
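Several of the commit messages above concern the shape of the GeoParquet metadata: one entry per geometry column, no metadata at all when the schema has no geometries, and the optional `orientation`, `edges` and `epoch` fields left out. A rough sketch of that shape follows. It is not the PR's implementation; the function name `geo_metadata`, the input format, and the `"1.0.0"` version string are assumptions made for illustration. The field names themselves (`version`, `primary_column`, `columns`, `encoding`, `geometry_types`, `bbox`) come from the GeoParquet specification, which stores this JSON under the `geo` key of the Parquet file metadata.

```python
import json


def geo_metadata(geometry_columns):
    """Build the JSON string for the 'geo' file metadata key.

    geometry_columns maps a column name to (geometry_types, bbox), where bbox
    is [xmin, ymin, xmax, ymax]. Returns None when there are no geometry
    columns, so the caller can skip writing the key entirely.
    """
    if not geometry_columns:
        return None
    columns = {
        name: {"encoding": "WKB", "geometry_types": types, "bbox": bbox}
        for name, (types, bbox) in geometry_columns.items()
    }
    return json.dumps({
        "version": "1.0.0",  # assumed version, not taken from the PR
        # Assumption: the first geometry column listed is the primary one.
        "primary_column": next(iter(geometry_columns)),
        "columns": columns,
    })


print(geo_metadata({
    "position": (["Point"], [-10.0, -5.0, 10.0, 5.0]),
    "footprint": (["Polygon"], [-12.0, -6.0, 12.0, 6.0]),
}))
```

Returning `None` rather than an empty object mirrors the intent of the "omit the GeoParquet metadata entirely" commit: a file without geometries carries no `geo` key at all.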
elahrvivaz reviewed Jun 17, 2024: ...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala (outdated)