-
Hi, I wonder if it is possible to use a batch of data and deduce the schema based on the whole batch instead of a single file? Thanks!
-
If you are using newline-delimited JSON, just concatenate all the files and pipe them into generate-schema.
If you are using CSV files, you have to strip off the first line of each file, because that contains the header.
If the files are too big to be concatenated together, use the --existing_schema_path flag: write a shell script that generates the updated schema file by file, feeding each new schema back into the generate-schema command through this flag.
If that is not suitable, write a Python program, use bigquery-schema-generator as a library, and process each file within the Python program. Lots of options.
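To make the batch workflow concrete, here is a stdlib-only toy sketch of what "concatenate all the files and infer one schema" means. It is an illustration only: `deduce_batch_schema` and the file contents are made up for this example, and the real generate-schema tool has far richer type-inference and conflict-resolution rules.

```python
import io
import json

def deduce_batch_schema(ndjson_files):
    """Toy schema inference over a batch of newline-delimited JSON files.

    Loosely mimics the effect of piping concatenated NDJSON into
    generate-schema: every record in every file contributes to one
    merged schema. First type seen for a field wins here; the real
    tool applies proper conflict-resolution rules instead.
    """
    py_to_bq = {int: "INTEGER", float: "FLOAT", str: "STRING", bool: "BOOLEAN"}
    field_types = {}
    for fh in ndjson_files:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            for name, value in record.items():
                bq_type = py_to_bq.get(type(value), "STRING")
                field_types.setdefault(name, bq_type)
    # Emit a flat BigQuery-style schema, one entry per field.
    return [{"name": n, "type": t, "mode": "NULLABLE"}
            for n, t in sorted(field_types.items())]

# Example: two "files" in a batch, each contributing different fields.
f1 = io.StringIO('{"id": 1, "name": "alice"}\n')
f2 = io.StringIO('{"id": 2, "score": 9.5}\n')
print(deduce_batch_schema([f1, f2]))
```

In practice you would not reimplement this; either pipe the concatenated files into the generate-schema CLI, or import bigquery-schema-generator as a library as suggested above.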
-
Thank you for your quick response. I just figured it out and it works. However, I ran into a situation where there might be a type inconsistency within the data. Say I have a field where one JSON file has it as an int and another has it as a str. How would you handle this kind of situation?