bigquery.tables.create Permission required on appending data to a BigQuery Table #1277

Open
smic-datalabs-jdcastro opened this issue Aug 6, 2024 · 4 comments

smic-datalabs-jdcastro commented Aug 6, 2024

I am trying to restrict the permissions of a service account so that it can only execute DML statements (e.g., INSERT, UPDATE, and DELETE queries) against a BigQuery table.

I have created a custom IAM role derived from the BigQuery Data Editor predefined role and removed the permissions I considered unnecessary, including bigquery.tables.create.

I have assigned this custom role to the service account, but upon execution it fails with the error: "Permission bigquery.tables.create denied on dataset..."
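
For context, the kind of statement this service account is meant to run looks roughly like the following when issued through the google-cloud-bigquery client (a minimal sketch; the project, dataset, table, and column names are placeholders, not our real ones):

from google.cloud import bigquery

# Minimal sketch of the DML the restricted service account should be allowed to run.
# Project, dataset, table, and column names below are placeholders.
client = bigquery.Client(project="my-project")

dml = """
UPDATE `my-project.my_dataset.my_table`
SET status = 'processed'
WHERE id = 1
"""

job = client.query(dml)  # running a query job also needs bigquery.jobs.create
job.result()             # wait for the DML statement to finish
print(f"Rows affected: {job.num_dml_affected_rows}")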

Here is the code snippet showing how I append data to the table:

save_df_stream = (
    df_stream.writeStream
        .outputMode("append")
        .format("bigquery")
        .options(**options_config)
        .trigger(availableNow=True)
        .start()
)

Does outputMode("append") really create the table before loading the data into it?


isha97 commented Aug 12, 2024

Hi @smic-datalabs-jdcastro ,

Can you please share the options_config that you are using?


smic-datalabs-jdcastro commented Aug 13, 2024

Hi @isha97,

Just a bunch of custom fields:

{
  "partitionType": ...,
  "partitionField": ...,
  "temporaryGcsBucket": ...,
  "project": ...,
  "dataset": ...,
  "table": ...,
  "checkpointLocation": ...,
  "allowFieldAddition": True
}
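
For illustration, the dict has roughly this shape (all values below are hypothetical placeholders, not our real config):

options_config = {
    # All values here are hypothetical placeholders; the real config points at our own
    # project, bucket, dataset, table, and checkpoint path.
    "partitionType": "DAY",
    "partitionField": "data_ingest_timestamp",
    "temporaryGcsBucket": "my-temp-bucket",
    "project": "my-project",
    "dataset": "my_dataset",
    "table": "my_table",
    "checkpointLocation": "/mnt/checkpoints/my_table",
    "allowFieldAddition": True,
}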

@vishalkarve15

@smic-datalabs-jdcastro Can you please share how df_stream is created? It might give some insight into debugging this issue.


smic-datalabs-jdcastro commented Oct 30, 2024

Hi @vishalkarve15, kindly see the code snippet below for your reference:

stream_config = {
    "cloudFiles.format": file_format,
    "cloudFiles.validateOptions": "true",
    "cloudFiles.inferColumnTypes": "false",
    "cloudFiles.schemaEvolutionMode": "rescue",
    "cloudFiles.schemaLocation": "<path_to_schema>",
    "ignoreMissingFiles": "true",
    "ignoreLeadingWhiteSpace": "false",
    "ignoreTrailingWhiteSpace": "false",
    "readerCaseSensitive": "false"
}

from pyspark.sql.functions import lit

df_stream = (
    spark.readStream
         .format("cloudFiles")
         .options(**stream_config)
         .load("/mnt/gcs_bucket/path/to/object")
         .withColumn("data_ingest_timestamp", lit(ingestion_time).cast("timestamp"))
         .withColumn("raw_file_path", lit("<path_to_filename>"))  # wrapped in lit() so withColumn receives a Column
)

Thank you
