Unable to use git remote hook with url.scheme in ("git+ssh", "ssh") #121

Open
jrmidkiff opened this issue Jun 14, 2023 · 0 comments
Hello, this relates to the discussion I created.

I am unable to use DbtRunOperator with a private git repository accessed over ssh. I am not certain my syntax is correct, but the error I am encountering leads me to believe the problem is not in how I am using the operator.

We are running:

from airflow_dbt_python.operators.dbt import DbtRunOperator

dbt_run = DbtRunOperator(
    dbt_conn_id="dbt-projects-github",  # Airflow connection to private dbt-airflow github repository
    task_id="dbt_run",
    project_dir="git+ssh://github.com/OrganizationName/dbt-airflow",
    # project_conn_id=db_conn,
    select=["+tag:daily"],
    exclude=["tag:deprecated"],
    target="db_conn",  # Airflow Connection to data warehouse
    # profile="my-project",
)

which results in the following error:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 325, in dbt_directory
    store_profiles_dir,
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 369, in prepare_directory
    tmp_dir,
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 182, in download_dbt_project
    return remote.download_dbt_project(project_dir, destination)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/remote.py", line 73, in download_dbt_project
    self.download(source_url, destination_url)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/git.py", line 154, in download
    client, path = self.get_git_client_path(source)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/git.py", line 187, in get_git_client_path
    path = f"{url.netloc.split(':')[1]}/{str(url.path)}"
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/operators/dbt.py", line 173, in execute
    **vars(self),
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 234, in run_dbt_task
    env_vars=env_vars,
  File "/usr/local/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 330, in dbt_directory
    ) from e
airflow.exceptions.AirflowException: Failed to prepare temporary directory for dbt execution

Here url.netloc is github.com, which contains no colon, so url.netloc.split(':') returns a single-element list and indexing it with [1] raises the IndexError. Notably, if we had passed project_dir a GitHub repository URL with a git or http/https scheme, the hook would have run path = str(url.path) instead of path = f"{url.netloc.split(':')[1]}/{str(url.path)}", and the latter appears to be the cause of the error.
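
The failure can be reproduced outside Airflow with a short sketch. It assumes the URL is parsed roughly the way the standard library's urllib.parse.urlsplit would parse it, which the traceback does not confirm. The function names original_path and tolerant_path are hypothetical and are not part of airflow_dbt_python; only the f-string expression is copied from the traceback.

```python
from urllib.parse import urlsplit


def original_path(raw_url: str) -> str:
    """Mimic the expression from git.py line 187 in the traceback."""
    url = urlsplit(raw_url)
    # Raises IndexError when the netloc contains no ':'.
    return f"{url.netloc.split(':')[1]}/{str(url.path)}"


def tolerant_path(raw_url: str) -> str:
    """Hypothetical variant that falls back to url.path when no ':' is present.

    Assumption: a ':' in the netloc separates the host from the first path
    segment (scp-like form). A numeric port such as ':22' is not handled here.
    """
    url = urlsplit(raw_url)
    _host, colon, first_segment = url.netloc.partition(":")
    if colon:
        return f"{first_segment}/{url.path.lstrip('/')}"
    return url.path


if __name__ == "__main__":
    reported = "git+ssh://github.com/OrganizationName/dbt-airflow"
    print(urlsplit(reported).netloc)
    try:
        original_path(reported)
    except IndexError as exc:
        print(f"IndexError: {exc}")
    print(tolerant_path(reported))
```

Whether falling back to url.path is the right behaviour for the hook is a design question for the maintainers; the sketch only shows that the reported URL never reaches a usable path with the current expression.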

Are you able to provide any assistance with this? While we work through these errors, it would also be great to receive some feedback on the discussion I opened about this topic.

Thank you very much!
