Connector versions tested: 0.39.1, 0.41.0
Spark version: 3.5.1
When working with datetime fields, the data needs to be serialized as strings or converted to a timestamp type, since Spark has no native datetime data type.
If the datetime field in your Spark DataFrame is serialized as a string, the load fails when you write to BQ using the indirect method. This is not the case with the direct load method, which uses the Storage Write API.
Sample Data:
import random
from datetime import datetime, timedelta

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

schema = StructType([
    StructField("a", StringType(), True),
    StructField("b", TimestampType(), True),
])

data = []
for _ in range(10):
    random_string = ''.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(10))
    random_datetime = datetime.now() - timedelta(days=random.randint(0, 365))
    data.append((random_string, random_datetime))

df = spark.createDataFrame(data, schema)
My BQ Table:
create or replace table demo_data.datetime_test (
  a string,
  b datetime
);
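For reference, the write that triggers the error looks roughly like this (a sketch; the bucket name is a placeholder, and `writeMethod` is the connector option that selects direct vs. indirect loads):

```python
# Indirect load: stages the data to GCS as Parquet, then issues a BigQuery load job.
(df.write.format("bigquery")
    .option("writeMethod", "indirect")                  # "direct" uses the Storage Write API and succeeds
    .option("temporaryGcsBucket", "my-staging-bucket")  # placeholder bucket
    .mode("append")
    .save("demo_data.datetime_test"))
```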
Error:
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Provided Schema does not match Table [projectid]:demo_data.datetime_test. Field b has changed type from DATETIME to STRING
I dumped the data I was loading to a Parquet file and attempted to load it directly using the bq load tool. I received a similar error there as well, which made me think the issue is related to the bq load utility.
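For the record, that reproduction was roughly (paths are placeholders):

```shell
# Load the Parquet dump into the existing table; fails with the same schema mismatch.
bq load --source_format=PARQUET demo_data.datetime_test ./datetime_test.parquet
```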
Provided Schema does not match Table [projectid]:demo_data.datetime_test. Field b has changed type from DATETIME to STRING
This looks like a bug, given that the same data loads with one write method but not the other.
Is the solution here just to convert the column to TIMESTAMP_NTZ in the DataFrame? That seems to work with both the direct and indirect load methods.