【Question title】: Airflow: Why are the Rendered Template and the Log query different?
【Posted】: 2020-12-15 19:30:57
【Question description】:

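I am running a SQL query with an Airflow DAG (GCP Composer). The task runs without any errors, but it executes a previous version of the query. Interestingly, "Task Instance Details" and "Rendered Template" both show the current, correct version, while the "Log" prints the previous version of the query.

Thanks in advance for your help.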

Example:

**Sample task in the DAG:**

# pro_id and data_pipe are variables defined elsewhere in the DAG file.
table_3 = BigQueryOperator(
    task_id='create_table_3',
    sql='/sql/table_three.sql',  # query file kept in the DAG's sql folder
    params=dict(
        project_id=pro_id,
        dataset_id=data_pipe
    ),
    destination_dataset_table=f'{data_pipe}.Table_three',
    use_legacy_sql=False,
    allow_large_results=True,
    write_disposition='WRITE_TRUNCATE'
)

Dummy part of the rendered query:

T3 AS (
    SELECT 
         DateTime
        ,NAME
        ,ID
        ,Input
        ,output
    FROM T2
    WHERE ID IS NOT NULL
)

,T3_intertable as (
SELECT 
     DateTime
    ,NAME
    ,Input AS Input_1
    ,output AS Output_1
FROM T3
order by DateTime
)

Dummy part of the query as printed in the log:

[2020-12-15 05:23:40,689] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3 T3 AS (
[2020-12-15 05:23:40,689] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  SELECT 
[2020-12-15 05:23:40,689] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3       DateTime_UTC
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3      ,NAME
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3      ,TRAIN
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3      ,ID
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3      ,Input
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  FROM T2 
[2020-12-15 05:23:40,690] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  WHERE ID IS NOT NULL 
[2020-12-15 05:23:40,691] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3 )
[2020-12-15 05:23:40,691] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3 SELECT 
[2020-12-15 05:23:40,691] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3   DateTime_UTC
[2020-12-15 05:23:40,692] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  ,NAME
[2020-12-15 05:23:40,692] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  ,TRAIN
[2020-12-15 05:23:40,692] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3  ,Input AS Input_1  
[2020-12-15 05:23:40,692] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3 FROM T3
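
To see exactly where the two versions diverge, the SQL can be pulled out of the log lines and diffed against the rendered query. The following is only a minimal sketch: the names `LOG_PREFIX`, `sql_from_log`, and `diff_queries` are hypothetical helpers, and the regular expression assumes the log prefix has the same shape as in the excerpt above.

```python
import difflib
import re

# Assumed prefix shape, taken from the log excerpt above, e.g.
# "[2020-12-15 05:23:40,689] {base_task_runner.py:101} INFO - Job 2663300: Subtask create_table_3 "
LOG_PREFIX = re.compile(r"^\[[^\]]+\] \{[^}]+\} \w+ - Job \d+: Subtask \S+ ?")


def sql_from_log(log_text):
    """Strip the task-runner prefix from each log line, keeping only the SQL."""
    return [LOG_PREFIX.sub("", line).strip() for line in log_text.splitlines()]


def diff_queries(rendered_sql, log_text):
    """Return unified-diff lines between the rendered SQL and the logged SQL."""
    rendered = [line.strip() for line in rendered_sql.splitlines()]
    logged = sql_from_log(log_text)
    return list(
        difflib.unified_diff(rendered, logged, "rendered", "log", lineterm="")
    )
```

Calling `diff_queries(rendered_text, log_text)` with the text copied from the "Rendered Template" tab and from the task log returns lines prefixed with `-` (only in the rendered query) and `+` (only in the log), which makes differences such as `DateTime` versus `DateTime_UTC` easy to spot.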

【Comments on the question】:

Tags: airflow google-cloud-composer


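【Solution 1】:

The problem is solved. It might be an Airflow glitch. I had been overwriting the previous query file with the updated one, replacing it in place rather than deleting it first. This time I deleted the previous query file and then uploaded the new (updated) query to the DAG's sql folder.

That solved the problem.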
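
One way to check whether the environment is still reading a stale copy of the file is to compare a fingerprint of the local SQL file with a fingerprint of the copy the DAG sees. This is only a sketch under stated assumptions: `sql_fingerprint` is a hypothetical helper, and where the deployed file lives depends on the environment. It detects a mismatch but does not explain why one occurred.

```python
import hashlib
from pathlib import Path


def sql_fingerprint(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

For example, the value of `sql_fingerprint(...)` could be logged from inside the DAG for the deployed `table_three.sql` and compared with the value computed on the machine where the file was edited. If the two digests differ, the deployed file is not the version that was uploaded.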

【Comments on the solution】:
