Create bq schema using a batch of data #78

Answered by bxparks
tc5613213 asked this question in Q&A

If you are using newline-delimited JSON, just concatenate all the files and pipe the result into generate-schema:

$ cat file1.json file2.json file3.json ... | generate-schema

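If you are using CSV files, you have to strip the first line (the header) from every file after the first before concatenating, because each file carries its own header and the combined input should contain only one. Pass --input_format csv to generate-schema in that case.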
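A minimal sketch of the CSV concatenation, assuming every file has the same single-line header and that generate-schema's CSV mode expects one header line at the top of its input. The function name `concat_csv_lines` is made up for this example.

```python
from typing import Iterable, Iterator


def concat_csv_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield the lines of all CSV files, keeping only the first file's header."""
    for file_index, path in enumerate(paths):
        # newline="" keeps the original line endings untouched.
        with open(path, newline="") as f:
            for line_number, line in enumerate(f):
                if file_index > 0 and line_number == 0:
                    continue  # repeated header of a later file
                # Guard against a file whose last line lacks a newline,
                # which would otherwise fuse with the next file's first row.
                yield line if line.endswith("\n") else line + "\n"
```

In a small script, `sys.stdout.writelines(concat_csv_lines(sys.argv[1:]))` would write the merged data to stdout, which can then be piped into `generate-schema --input_format csv`.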

If the files are too big to be concatenated, use the --existing_schema_path flag: write a shell script that processes one file at a time, saves the resulting schema, and passes that saved schema to the next generate-schema invocation through this flag, so the schema is updated incrementally.
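The incremental loop could look roughly like this, written here in Python with `subprocess` rather than as a shell script. It assumes generate-schema reads data on stdin and writes the schema to stdout, and that --existing_schema_path can be omitted on the first run. The helper names `build_command` and `update_schema` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def build_command(existing_schema: Optional[Path]) -> List[str]:
    """Build the generate-schema command line for one data file."""
    cmd = ["generate-schema"]
    if existing_schema is not None:
        cmd += ["--existing_schema_path", str(existing_schema)]
    return cmd


def update_schema(data_files: Sequence[Path], schema_file: Path,
                  run=subprocess.run) -> None:
    """Fold each data file into schema_file, one invocation per file."""
    for data_file in data_files:
        # If schema_file already exists (from a previous iteration or an
        # earlier run), feed it back in so the schema is extended.
        existing = schema_file if schema_file.exists() else None
        with data_file.open("rb") as stdin:
            result = run(build_command(existing), stdin=stdin,
                         stdout=subprocess.PIPE, check=True)
        # Write only after the command has finished, so the file that is
        # being read as the existing schema is never overwritten mid-run.
        schema_file.write_bytes(result.stdout)
```

Calling `update_schema([Path("file1.json"), Path("file2.json")], Path("schema.json"))` would leave the merged schema in `schema.json`. The `run` parameter exists only so the loop can be exercised without the CLI installed.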

If that is not suitable, write a Python program that uses bigquery-schema-generator as a library and processes each file within the program.
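A rough sketch of the library route follows. It assumes the `SchemaGenerator` class in `bigquery_schema_generator.generate_schema`, with `deduce_schema(input_data=..., schema_map=...)` returning a `(schema_map, error_logs)` pair and `flatten_schema(schema_map)` producing the final schema, as I recall them from the project README; check these against the version you have installed. The helper names `deduce_schema_from_files` and `make_json_generator` are made up for this example.

```python
from typing import Iterable


def deduce_schema_from_files(generator, paths: Iterable[str]):
    """Feed each file to the generator, carrying the schema map forward."""
    schema_map = None
    seen_any = False
    for path in paths:
        seen_any = True
        with open(path) as f:
            # Assumption: passing the previous schema_map makes the
            # generator extend it instead of starting from scratch.
            # error_logs is ignored here; inspect it in real use.
            schema_map, error_logs = generator.deduce_schema(
                input_data=f, schema_map=schema_map)
    if not seen_any:
        raise ValueError("no input files given")
    return generator.flatten_schema(schema_map)


def make_json_generator():
    """Create a generator for newline-delimited JSON input."""
    # Imported lazily so the helper above can be tried without the
    # bigquery-schema-generator package installed.
    from bigquery_schema_generator.generate_schema import SchemaGenerator
    return SchemaGenerator(input_format="json")
```

With the package installed, `deduce_schema_from_files(make_json_generator(), ["file1.json", "file2.json"])` would return the schema as a list of field definitions that can be written out with `json.dump`.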

Lots of options.

Answer selected by tc5613213