Stricter CSV Reader #385
Comments
2 & 3 are done, feel free to try in the latest dev build. 1 is a bit trickier...
@TimPossiblee actually, can you try source-option below?
source: LOCAL
target: SNOWFLAKE
env:
  SLING_STREAM_URL_COLUMN: true
  SAMPLE_SIZE: 0
defaults:
  mode: full-refresh
  single: true
  source_options:
    format: csv
    header: true
    delimiter: ;
    fields_per_rec: 0
  target_options:
    column_casing: target
streams:
  "file://./test.csv":
    object: 'PUBLIC.CSV_TEST'
I updated sling to the latest version and adjusted the YAML config.
Point 1:
Point 2: I could not see a difference to before. The first row is trimmed while the other two rows keep their whitespace (see col2).
sling-cli/core/dbio/iop/datastream.go, line 699 in 0fdb025
Using
v1.3.2 almost works
Example for
Feature Description
I would like to request the following improvements regarding CSV loading.
Fail on different number of columns per file
When, for example, the delimiter is part of the data, rows will have different numbers of columns.
I would like an option to raise an error when not every row has the same number of columns, instead of omitting columns or creating dummy columns.
A different number of columns across files should still be allowed.
The expected number of columns should be taken from the csv header or first row of data.
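To make the requested behaviour concrete, here is a minimal Python sketch of a strict reader. It is not sling's implementation; the names `read_strict` and `ColumnCountError` are hypothetical. The first row fixes the expected width, and any later row with a different width raises an error. Go's `encoding/csv` offers comparable behaviour through `Reader.FieldsPerRecord = 0`; whether sling's `fields_per_rec` option maps onto that field is an assumption.

```python
import csv
import io


class ColumnCountError(ValueError):
    """Raised when a row's field count differs from the first row's."""


def read_strict(text, delimiter=";"):
    """Return all rows, failing on the first row whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = []
    expected = None
    for row in reader:
        if expected is None:
            # The header (or first data row) sets the expected width.
            expected = len(row)
        elif len(row) != expected:
            raise ColumnCountError(
                f"line {reader.line_num}: expected {expected} fields, got {len(row)}"
            )
        rows.append(row)
    return rows


good = "col1;col2\na;b\nc;d\n"
bad = "col1;col2\na;b\nc;d;e\n"  # the delimiter leaked into the data on the last row

print(read_strict(good))
try:
    read_strict(bad)
except ColumnCountError as exc:
    print("rejected:", exc)
```

Because the expected width is reset for each call, files with different column counts still load independently, which matches the request above.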
Preserve raw values when column is defined as string
I would expect sling to preserve the original value of a column when it is defined as string, e.g. columns={"*": "string"}.
But even with SAMPLE_SIZE=0, the analyzer still seems to run on the first row of data.
Disabling data sampling when SAMPLE_SIZE is set to 0 would also be an option.
In the example, the value in the first column of the first row is missing its whitespace.
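As a reference point for what "preserve raw values" means here, the following Python sketch reads every field as a string with no sampling or type inference. It only illustrates the expected outcome, not sling's code path; `read_raw` and the sample data are made up for this example. Python's `csv.reader` leaves surrounding whitespace intact by default (`skipinitialspace` is `False`), so every row, including the first, keeps its padding.

```python
import csv
import io


def read_raw(text, delimiter=";"):
    """Read all fields as strings without trimming or type inference."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader]


# Hypothetical sample: every data value is padded with spaces.
sample = "col1;col2\n a ; 1 \n b ; 2 \n"

rows = read_raw(sample)
for row in rows:
    print(row)
```

With string-typed columns, the first data row should come out padded exactly like the later rows.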
SourceOptions.TrimSpace seems to be a no-op
If I want to trim whitespace in rows beyond the SAMPLE_SIZE value, trim_space=true seems to have no effect. Specifying it as a transformation works; is this the intended way?
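For comparison, this is the behaviour I would expect from a trim option, sketched in Python. `iter_rows` and its `trim_space` parameter are hypothetical stand-ins, not sling's API. The point is that trimming is applied to every row as it streams through, independent of any sample size.

```python
import csv
import io


def iter_rows(text, delimiter=";", trim_space=False):
    """Yield rows one at a time, stripping each field when trim_space is set."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        if trim_space:
            # Applied to every row, not only to the rows that were sampled.
            row = [cell.strip() for cell in row]
        yield row


# Hypothetical sample with padded values on every data row.
sample = "col1;col2\n a ; 1 \n b ; 2 \n c ; 3 \n"

trimmed = list(iter_rows(sample, trim_space=True))
untrimmed = list(iter_rows(sample, trim_space=False))
print(trimmed)
print(untrimmed)
```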
linux