Snowflake Cortex destination : Bug fixes #38206

Merged
merged 5 commits into master from bindi/snowflake-cortex-minor-fixups on May 15, 2024

Conversation

@bindipankhudi (Contributor) commented May 15, 2024

This PR addresses the following:

  1. UI-related fixes
  • Reorder params to match the Snowflake destination
  • Remove the secret field from params (it was hiding what was being typed)
  • Update the doc link (it previously pointed to the Pinecone docs)
  2. Update to the write logic
  • For destinationMode=Overwrite, we first delete all records for the specified stream and then call cortexProcessor with WriteStrategy.APPEND; otherwise the batch size enforced by vector_db_based causes records to be overwritten every time the batch size is hit. We usually do this for all vector DBs; I had missed it earlier. (See the sketch below.)
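
For context, here is a minimal sketch of that delete-then-append flow, assuming PyAirbyte's WriteStrategy enum and the Airbyte CDK catalog models; the class shape and the `_delete_records_for_stream` helper are illustrative, not the connector's actual code:

```python
from airbyte.strategies import WriteStrategy            # assumed PyAirbyte import path
from airbyte_cdk.models import ConfiguredAirbyteCatalog, DestinationSyncMode


class OverwriteFlowSketch:
    """Illustrative only: delete first, then append, for overwrite streams."""

    def __init__(self, catalog: ConfiguredAirbyteCatalog) -> None:
        self.catalog = catalog

    def pre_sync(self, catalog: ConfiguredAirbyteCatalog) -> None:
        # Runs once before any batches arrive: clear out streams configured
        # for overwrite so every subsequent batch can simply append.
        for stream in catalog.streams:
            if stream.destination_sync_mode == DestinationSyncMode.overwrite:
                self._delete_records_for_stream(stream.stream.name)

    def get_write_strategy(self, stream_name: str) -> WriteStrategy:
        # Always append during the sync; overwrite semantics were already
        # handled in pre_sync, so later batches never clobber earlier ones.
        return WriteStrategy.APPEND

    def _delete_records_for_stream(self, stream_name: str) -> None:
        # Hypothetical helper: issue a DELETE against the stream's table.
        raise NotImplementedError
```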

@bindipankhudi bindipankhudi requested a review from a team as a code owner May 15, 2024 03:18

@bindipankhudi changed the title from "Snowflake Cortex destination : Minor fixes post release" to "Snowflake Cortex destination : Bug fixes" on May 15, 2024
@@ -16,7 +16,7 @@
from destination_snowflake_cortex.config import ConfigModel
from destination_snowflake_cortex.indexer import SnowflakeCortexIndexer

-BATCH_SIZE = 32
+BATCH_SIZE = 150
@bindipankhudi (Contributor, Author) commented:

32 seemed too low in general; each batch results in one PyAirbyte write call.
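
(Roughly, the constant matters because records are buffered and handed off one batch at a time; the buffer/flush names below are illustrative, not the connector's actual code.)

```python
BATCH_SIZE = 150  # per the comment above: each flushed batch is one PyAirbyte write

buffer: list[dict] = []

def process_record(record: dict) -> None:
    # Accumulate records and hand them off once the batch is full.
    buffer.append(record)
    if len(buffer) >= BATCH_SIZE:
        flush_batch(buffer)   # hypothetical: performs the single PyAirbyte write
        buffer.clear()
```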

@@ -85,7 +86,7 @@ def _get_updated_catalog(self) -> ConfiguredAirbyteCatalog:
metadata -> metadata of the record
embedding -> embedding of the document content
"""
-updated_catalog = self.catalog
+updated_catalog = copy.deepcopy(self.catalog)
@bindipankhudi (Contributor, Author) commented:

Needed so we don't mutate the original catalog, since this method is called twice.
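
(A generic illustration of the aliasing issue the deep copy avoids; the dict below is a stand-in, not the real catalog model.)

```python
import copy

catalog = {"streams": [{"name": "products", "columns": ["id"]}]}

aliased = catalog                               # no copy: same object
aliased["streams"][0]["columns"].append("embedding")
print(catalog["streams"][0]["columns"])         # ['id', 'embedding'] -- original mutated

catalog = {"streams": [{"name": "products", "columns": ["id"]}]}
copied = copy.deepcopy(catalog)                 # fully independent object graph
copied["streams"][0]["columns"].append("embedding")
print(catalog["streams"][0]["columns"])         # ['id'] -- original untouched
```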

pass

def pre_sync(self, catalog: ConfiguredAirbyteCatalog) -> None:
@bindipankhudi (Contributor, Author) commented:

New method, meant to be implemented by vector DB destinations; it deletes existing records up front for streams in overwrite mode.

@@ -144,7 +145,8 @@ def get_write_strategy(self, stream_name: str) -> WriteStrategy:
for stream in self.catalog.streams:
if stream.stream.name == stream_name:
if stream.destination_sync_mode == DestinationSyncMode.overwrite:
-return WriteStrategy.REPLACE
+# we will use append here since we will remove the existing records and add new ones.
+return WriteStrategy.APPEND
@bindipankhudi (Contributor, Author) commented:

For overwrite mode, we delete records first and then just use append in PyAirbyte, since data is sent in batches.

@aaronsteers (Collaborator) left a comment

Approving with one caveat. When in 'replace' mode, we would ideally load to a stage table and then swap the table name with the existing table after the load is complete. The SQLProcessor class should do this automatically when in 'replace' mode, but it may require a refactor to actually implement this with confidence.

So, non-blocking, but something to think about for next iterations.

Otherwise, this all looks great! :shipit:
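
For reference, a rough sketch of the stage-and-swap pattern being suggested, assuming a snowflake-connector-python style cursor; the table names and the `_append_batch` helper are illustrative:

```python
def overwrite_via_swap(cursor, final_table: str, batches) -> None:
    """Illustrative stage-and-swap: readers never see a half-loaded table."""
    stage_table = f"{final_table}_stage"
    # Build an empty staging table with the same schema as the final table.
    cursor.execute(f"CREATE OR REPLACE TABLE {stage_table} LIKE {final_table}")
    for batch in batches:
        _append_batch(cursor, stage_table, batch)  # hypothetical batch loader
    # Atomic cut-over (Snowflake's SWAP WITH), then drop the now-old data.
    cursor.execute(f"ALTER TABLE {final_table} SWAP WITH {stage_table}")
    cursor.execute(f"DROP TABLE {stage_table}")
```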

@bindipankhudi (Contributor, Author) replied:

> Approving with one caveat. When in 'replace' mode, we would ideally load to a stage table and then swap the table name with the existing table after the load is complete. The SQLProcessor class should do this automatically when in 'replace' mode, but it may require a refactor to actually implement this with confidence.
>
> So, non-blocking, but something to think about for next iterations.

Yea, that makes sense. Created this issue: https://github.com/airbytehq/airbyte-internal-issues/issues/7928

@bindipankhudi bindipankhudi merged commit e19e634 into master May 15, 2024
39 checks passed
@bindipankhudi bindipankhudi deleted the bindi/snowflake-cortex-minor-fixups branch May 15, 2024 16:48
Labels
area/connectors, area/documentation, connectors/destination/snowflake-cortex
3 participants