Currently, when reading a Parquet file, the file schema is modified: all field names are converted to lowercase.
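The current behavior can be sketched as follows (a minimal standalone illustration, not the engine's actual code):

```python
def lowercase_schema(field_names):
    """Mimic the current behavior: every field name read from the
    Parquet file schema is converted to lowercase."""
    return [name.lower() for name in field_names]

# A file written with mixed-case field names ...
file_fields = ["Id", "userName", "Payload"]

# ... is exposed to the reader with all names lowercased,
# so the original casing is lost.
print(lowercase_schema(file_fields))  # ['id', 'username', 'payload']
```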
Solution 1
parquet/ndjson: add a format option case_sensitive
cons:
1. cannot copy a file with columns ('a', 'b') into a table with columns ('a', 'B')
2. select and infer_schema do not show the original field names
3. need to create a dedicated file_format for this purpose
Solution 2:
select, infer_schema: add a table function option case_sensitive, true by default
add a copy option case_sensitive, false by default (for compatibility)
table field check: allow ('a', 'B'), disallow ('a', 'A')
parquet:
arrow_to_table_schema: convert names to lowercase and check for duplicate names
impl 1 (transform to select): lowercase the required fields
impl 2 (match): lowercase the table fields
ndjson:
lowercase both sides when matching names
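The table-field check and the matching rule above could be sketched like this (hypothetical helper names, not the engine's API):

```python
def check_table_fields(table_fields):
    """Case-insensitive matching is only well defined when no two table
    fields collide after lowercasing: ('a', 'B') is allowed because the
    lowered names stay distinct; ('a', 'A') is not, since both become 'a'."""
    lowered = [f.lower() for f in table_fields]
    if len(set(lowered)) != len(lowered):
        raise ValueError(f"duplicate field names after lowercasing: {table_fields}")

def match_field(required, file_fields, case_sensitive):
    """Find the index of `required` among the file's fields,
    lowercasing both sides when case_sensitive is false."""
    if not case_sensitive:
        required = required.lower()
        file_fields = [f.lower() for f in file_fields]
    return file_fields.index(required)

check_table_fields(["a", "B"])  # ok
print(match_field("a", ["A", "b"], case_sensitive=False))  # 0
```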
cons:
the default behavior of select and copy is not consistent
pros:
select and infer_schema show the original field names by default (maybe we can sacrifice this and set case_sensitive=false by default everywhere, for consistency)
COLUMN_MATCH_MODE:
CASE_SENSITIVE: Match columns by name, case-sensitive.
CASE_INSENSITIVE: Match columns by name, case-insensitive.
POSITION: Match columns by position instead of name.
FORMAT_DEFAULT: Use the default matching behavior based on file format.
FILE_FORMAT:
CSV: Default POSITION.
Parquet/ORC/NDJson: Default CASE_INSENSITIVE.
Note: not all modes are supported for every format; we will implement them one by one.
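The COLUMN_MATCH_MODE values and per-format defaults above can be sketched as a small resolution table (names are illustrative, not the final option syntax):

```python
from enum import Enum

class ColumnMatchMode(Enum):
    CASE_SENSITIVE = "case_sensitive"      # match columns by name, case-sensitive
    CASE_INSENSITIVE = "case_insensitive"  # match columns by name, case-insensitive
    POSITION = "position"                  # match columns by position instead of name
    FORMAT_DEFAULT = "format_default"      # fall back to the file format's default

# Per-format defaults used when FORMAT_DEFAULT is requested.
FORMAT_DEFAULTS = {
    "csv": ColumnMatchMode.POSITION,
    "parquet": ColumnMatchMode.CASE_INSENSITIVE,
    "orc": ColumnMatchMode.CASE_INSENSITIVE,
    "ndjson": ColumnMatchMode.CASE_INSENSITIVE,
}

def resolve_match_mode(mode, file_format):
    """Resolve FORMAT_DEFAULT to the concrete mode for the given format."""
    if mode is ColumnMatchMode.FORMAT_DEFAULT:
        return FORMAT_DEFAULTS[file_format]
    return mode

print(resolve_match_mode(ColumnMatchMode.FORMAT_DEFAULT, "csv").name)      # POSITION
print(resolve_match_mode(ColumnMatchMode.FORMAT_DEFAULT, "parquet").name)  # CASE_INSENSITIVE
```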