Using the --defer flag for models does not consistently namespace the ml model location #60

Open
switzer opened this issue Sep 8, 2024 · 0 comments

switzer commented Sep 8, 2024

Our development process uses the --defer flag so that dbt references production tables when they do not exist in the developer's namespace in BigQuery. A sample command to build an ML model is as follows:

dbt run --select my_ml_model --defer --state prod-manifest --target dev --vars "{start_date: '2024-08-30', end_date: '2024-08-30'}"
10:18:28  Running with dbt=1.8.3
10:18:28  Registered adapter: bigquery=1.8.2
10:18:28  Unable to do partial parsing because saved manifest not found. Starting full parse.
10:18:35  [WARNING]: Found patch for macro "test_accepted_values_extended" which was not found
10:18:38  Found xxx models, x snapshots, xx analyses, xxx data tests, xx seeds, xx operations, xx sources, xx exposures, xxxx macros
10:18:38  
10:18:39  
10:18:39  Running 2 on-run-start hooks
10:18:40  1 of 2 START hook: ozone.on-run-start.0 ........................................ [RUN]
10:18:40  1 of 2 OK hook: ozone.on-run-start.0 ........................................... [OK in 0.00s]
10:18:40  2 of 2 START hook: ozone.on-run-start.1 ........................................ [RUN]
10:18:40  2 of 2 OK hook: ozone.on-run-start.1 ........................................... [OK in 0.00s]
10:18:40  
10:18:40  Concurrency: 3 threads (target='dev')
10:18:40  
10:18:40  1 of 1 START sql model model dbt_dev_ml.my_ml_model ......... [RUN]
10:24:41  1 of 1 OK created sql model model dbt_dev_ml.my_ml_model .... [None (2.2 GiB processed) in 30.37s]
10:24:41  
10:24:41  Finished running 1 model model, 2 project hooks in 0 hours 2 minutes and 2.87 seconds (36.87s).

Note that this model was built in the dbt_dev_ml dataset, which is (I think) correct.

When referencing the model, e.g. when building a downstream dbt model that uses dbt_ml.predict(ref('my_ml_model'), 'source_data'), I get the following error:

 Runtime Error in model my_model_prediction (models/transformations/my_model_prediction.sql)
  404 Not found: Dataset analytics:prod_ml was not found in location EU; reason: notFound, message: Not found: Dataset analytics:prod_ml was not found in location EU

It seems that, with --defer, the ML model reference is not checked against the dev namespace first; it resolves straight to the production namespace, even though the model exists in dev.

Interestingly, if I build both models (i.e. the ML model and the downstream prediction model) in the same dbt run command, then it works.
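
To make the expected behaviour concrete, here is a minimal Python sketch of the resolution order I would expect for a ref under --defer. This is not dbt's or dbt_ml's actual code: the function `resolve_ref`, the `Relation` type, and all parameter names are hypothetical, and the dataset names are simply the ones from the log and error above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    dataset: str
    name: str


def resolve_ref(name, selected, dev_existing, dev_dataset, prod_dataset):
    """Return the relation a ref() is expected to point at under --defer.

    Hypothetical sketch of the expected order, not dbt's implementation.
    """
    # Nodes built in this invocation resolve to the dev dataset.
    if name in selected:
        return Relation(dev_dataset, name)
    # Unselected nodes that already exist in dev should also resolve to dev.
    if name in dev_existing:
        return Relation(dev_dataset, name)
    # Only fall back to the production manifest when dev has nothing.
    return Relation(prod_dataset, name)


# Case from this issue: only the prediction model is selected,
# but my_ml_model was already built in dbt_dev_ml by an earlier run.
expected = resolve_ref(
    "my_ml_model",
    selected={"my_model_prediction"},
    dev_existing={"my_ml_model"},
    dev_dataset="dbt_dev_ml",
    prod_dataset="prod_ml",
)
```

In this sketch `expected` points at dbt_dev_ml, whereas the error above shows the reference going to prod_ml. The first branch corresponds to the case where both models are selected in one dbt run command, which is the case that works.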
