【Question title】: AWS MWAA issues running dbt
【Posted】: 2022-10-05 22:40:01
【Question】:

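I am following the AWS tutorial here on running dbt with MWAA. I copied dbt-starter-project to S3 (my-bucket/dags/dbt/dbt-starter-project) and added the two DAGs from the tutorial to the my-bucket/dags folder.

The first DAG, which checks whether the installation is correct, is: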

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

with DAG(dag_id="dbt-installation-test", schedule_interval=None, catchup=False, start_date=days_ago(1)) as dag:
    cli_command = BashOperator(
        task_id="bash_command",
        bash_command="/usr/local/airflow/.local/bin/dbt --version"
    )

But the DAG fails:

[2022-10-01, 10:10:38 UTC] {{taskinstance.py:1262}} INFO - Executing <Task(BashOperator): bash_command> on 2022-10-01 10:10:37.699795+00:00
[2022-10-01, 10:10:38 UTC] {{standard_task_runner.py:52}} INFO - Started process 515 to run task
[2022-10-01, 10:10:38 UTC] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'dbt-installation-test', 'bash_command', 'manual__2022-10-01T10:10:37.699795+00:00', '--job-id', '20', '--raw', '--subdir', 'DAGS_FOLDER/dag_check_dbt.py', '--cfg-path', '/tmp/tmpw5qjhl4p', '--error-file', '/tmp/tmpanvrgrxj']
[2022-10-01, 10:10:38 UTC] {{standard_task_runner.py:77}} INFO - Job 20: Subtask bash_command
[2022-10-01, 10:10:38 UTC] {{logging_mixin.py:109}} INFO - Running <TaskInstance: dbt-installation-test.bash_command manual__2022-10-01T10:10:37.699795+00:00 [running]> on host ip-172-27-4-81.eu-west-1.compute.internal
[2022-10-01, 10:10:38 UTC] {{taskinstance.py:1429}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dbt-installation-test
AIRFLOW_CTX_TASK_ID=bash_command
AIRFLOW_CTX_EXECUTION_DATE=2022-10-01T10:10:37.699795+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-01T10:10:37.699795+00:00
[2022-10-01, 10:10:38 UTC] {{subprocess.py:62}} INFO - Tmp dir root location: 
 /tmp
[2022-10-01, 10:10:38 UTC] {{subprocess.py:74}} INFO - Running command: ['bash', '-c', '/usr/local/airflow/.local/bin/dbt --version']
[2022-10-01, 10:10:38 UTC] {{subprocess.py:85}} INFO - Output:
[2022-10-01, 10:10:38 UTC] {{subprocess.py:89}} INFO - bash: /usr/local/airflow/.local/bin/dbt: No such file or directory
[2022-10-01, 10:10:38 UTC] {{subprocess.py:93}} INFO - Command exited with return code 127
[2022-10-01, 10:10:38 UTC] {{taskinstance.py:1703}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/bash.py", line 188, in execute
    f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 127.
[2022-10-01, 10:10:38 UTC] {{taskinstance.py:1280}} INFO - Marking task as FAILED. dag_id=dbt-installation-test, task_id=bash_command, execution_date=20221001T101037, start_date=20221001T101038, end_date=20221001T101038
[2022-10-01, 10:10:39 UTC] {{standard_task_runner.py:91}} ERROR - Failed to execute job 20 for task bash_command
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
    error_file=args.error_file,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/bash.py", line 188, in execute
    f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 127.
[2022-10-01, 10:10:39 UTC] {{local_task_job.py:154}} INFO - Task exited with return code 1
[2022-10-01, 10:10:39 UTC] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check

Does this mean the installation was unsuccessful? AWS provides no troubleshooting guidance for the case where this test fails.

Contents of requirements.txt:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.7.txt"
apache-airflow[postgres,mysql,google]==2.2.2
SQLAlchemy==1.3.24
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
sentry-sdk==1.4.3
google-ads
tableauserverclient
facebook-business
openpyxl
airflow-provider-great-expectations==0.1.1
XlsxWriter

json-rpc==1.13.0
minimal-snowplow-tracker==0.0.2
packaging==20.9
networkx==2.6.3 
mashumaro==2.5
sqlparse==0.4.2

logbook==1.5.3
agate==1.6.1
dbt-extractor==0.4.0

pyparsing==2.4.7 
msgpack==1.0.2
parsedatetime==2.6
pytimeparse==1.1.8
leather==0.3.4
pyyaml==5.4.1

# Airflow constraints are jsonschema==3.2.0
jsonschema==3.1.1
hologram==0.0.14
dbt-core==0.21.1

psycopg2-binary==2.8.6
dbt-postgres==0.21.1
dbt-redshift==0.21.1
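
The comment in the file above notes that the Airflow constraints pin jsonschema==3.2.0 while the file itself asks for jsonschema==3.1.1. A pin that disagrees with the constraint file is a typical reason for pip to reject an install. Below is a minimal sketch, in plain Python, of checking a requirements file against a constraints file before uploading it. The helper names (`parse_pins`, `find_conflicts`) and the sample texts are hypothetical; only the jsonschema values come from the file above, and the constraints sample is hand-written rather than downloaded.

```python
import re

# Matches "name==version" at the start of a line, ignoring extras like [postgres].
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def parse_pins(text):
    """Return {normalized_name: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            # Treat '-', '_' and '.' as equivalent and ignore case in names.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def find_conflicts(requirements_text, constraints_text):
    """Return {name: (required, constrained)} where the two pins differ."""
    required = parse_pins(requirements_text)
    constrained = parse_pins(constraints_text)
    return {
        name: (version, constrained[name])
        for name, version in required.items()
        if name in constrained and constrained[name] != version
    }


# Hand-written samples for illustration only.
sample_requirements = """\
# Airflow constraints are jsonschema==3.2.0
jsonschema==3.1.1
SQLAlchemy==1.3.24
dbt-core==0.21.1
"""
sample_constraints = """\
jsonschema==3.2.0
SQLAlchemy==1.3.24
"""

conflicts = find_conflicts(sample_requirements, sample_constraints)
for name, (required, constrained) in conflicts.items():
    print(f"{name}: requirements.txt pins {required}, constraints pin {constrained}")
```

With the real files, `constraints_text` would be the content of the URL on the `--constraint` line. Packages that are absent from the constraints file are not reported by this check.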

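    Tags: amazon-web-services airflow dbt mwaa


    【Answer 1】:

    Your installation path seems to be slightly different. You are trying to access dbt at /usr/local/airflow/.local/bin/dbt, which implies the Python environment is /usr/local/airflow/.local.

    However, your Airflow runs from /usr/local/lib/python3.7/site-packages/airflow, which implies the Python environment is installed under /usr/local/.

    Try changing the bash command to: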

    /usr/local/bin/dbt --version
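
Rather than guessing between the two locations, one option is to search the directories where pip usually places console scripts. The following is a minimal sketch using only the standard library; `candidate_bin_dirs` and `find_executable` are hypothetical helper names, and the `bin` subdirectory of the user base assumes a POSIX layout.

```python
import os
import shutil
import site
import sysconfig


def candidate_bin_dirs():
    """Directories where pip may have placed console scripts such as dbt."""
    dirs = [
        sysconfig.get_path("scripts"),            # system-wide or virtualenv install
        os.path.join(site.getuserbase(), "bin"),  # `pip install --user` (POSIX layout assumed)
    ]
    # Also search whatever is already on PATH for the current process.
    dirs += os.environ.get("PATH", "").split(os.pathsep)
    return dirs


def find_executable(name, extra_dirs=()):
    """Return the full path of `name` in the candidate directories, or None."""
    ordered = []
    for directory in list(extra_dirs) + candidate_bin_dirs():
        if directory and directory not in ordered:
            ordered.append(directory)
    return shutil.which(name, path=os.pathsep.join(ordered))


if __name__ == "__main__":
    print(find_executable("dbt"))
```

Called from a task on the worker (for example inside a Python callable), this prints the path to pass to the bash command, or `None` if dbt is not present in any of the searched directories, which would point to a failed install rather than a wrong path.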
    

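    【Comments】:

    • Hi Jorrick, unfortunately that did not help. Logs below: [2022-10-01, 19:00:39 UTC] {{subprocess.py:74}} INFO - Running command: ['bash', '-c', '/usr/local/bin/dbt --version'] [2022-10-01, 19:00:39 UTC] {{subprocess.py:85}} INFO - Output: [2022-10-01, 19:00:39 UTC] {{subprocess.py:89}} INFO - bash: /usr/local/bin/dbt: No such file or directory [2022-10-01, 19:00:39 UTC] {{subprocess.py:93}} INFO - Command exited with return code 127 [2022-10-01, 19:00:39 UTC] {{taskinstance.py:1703}} ERROR - Task failed with exception
    • Hmm, are you sure dbt is actually installed? I have a feeling your requirements.txt may have been updated but not installed.
    • It was not installed, @Jorrick Sleijster. It looks like the provided requirements.txt file does not do what it is supposed to do. For example: ERROR: Could not find a version that satisfies the requirement json-rpc==1.13.0 (from versions: none)
    • On the other hand, I see: WARNING: The script dbt is installed in '/usr/local/airflow/.local/bin' which is not on PATH.
    • I can't; that is not possible because the environment is managed. But the problem is solved: there was indeed a library conflict. The AWS example works fine. Thanks for your help.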
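
The two pip messages quoted in the comments above are the kind of lines worth searching for in the requirements install log. A minimal sketch of pulling them out of a log dump follows; `summarize_pip_log` is a hypothetical helper, the sample log is assembled from the two quoted messages only, and the patterns assume pip's usual English wording for a single unresolved requirement and a single off-PATH script.

```python
import re

# Requirement that pip could not resolve at all.
UNRESOLVED_RE = re.compile(
    r"ERROR: Could not find a version that satisfies the requirement (\S+)"
)
# Console script installed into a directory that is not on PATH.
OFF_PATH_RE = re.compile(
    r"WARNING: The script (\S+) is installed in '([^']+)' which is not on PATH"
)


def summarize_pip_log(log_text):
    """Return (unresolved_requirements, [(script, directory), ...])."""
    unresolved = UNRESOLVED_RE.findall(log_text)
    off_path = OFF_PATH_RE.findall(log_text)
    return unresolved, off_path


# Sample built from the two messages quoted in the comments.
sample_log = (
    "ERROR: Could not find a version that satisfies the requirement "
    "json-rpc==1.13.0 (from versions: none)\n"
    "WARNING: The script dbt is installed in '/usr/local/airflow/.local/bin' "
    "which is not on PATH.\n"
)

unresolved, off_path = summarize_pip_log(sample_log)
print("Unresolved requirements:", unresolved)
print("Scripts installed off PATH:", off_path)
```

An unresolved requirement suggests the install did not complete, while an off-PATH warning suggests the script exists and only needs to be called by its full path.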
    【Answer 2】:

    Can you explain how you solved it? Which libraries had the problem (the conflict)?

    【Comments】:
