【Question title】: Python Google BigQuery Parameterized SELECT
【Posted】: 2018-08-30 19:29:17
【Question】:

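I'm having trouble with BigQuery query parameters. I pass a start date, an end date, and an array of fields that may exist in the database to a function. The start and end dates are formatted as "yyyymmdd".

The goal is to pass in a pair of dates and a set of fields, and to collect the data for those fields between the two dates.

The date handling works as expected.

The field array is passed like this, for example: ["user_pseudo_id", "event_name", "event_timestamp"] (other entries in the array are possible).

Ideally, I would like to parameterize the query further so that it looks something like the query below, where @search_params replaces the individual column names in the SELECT clause. The aim is to make the fields array more flexible, from a single entry to many.

From my searching, I believe ArrayQueryParameter (in place of ScalarQueryParameter) could solve this, but I haven't found much documentation on how to use it.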

query_job = client.query("""
    SELECT @search_params, _TABLE_SUFFIX AS suffix
    FROM `analytics_180354243.events_*` 
    WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
    BETWEEN @start_date AND @end_date
    """, job_config=job_config)

The full function is below:

from google.cloud import bigquery
from google.oauth2 import service_account


def query_awe(start_date, end_date, fields):
    credentials = service_account.Credentials.from_service_account_file('auth.json')

    project_id = 'my-project-id'

    client = bigquery.Client(credentials=credentials, project=project_id)

    # Join the field names into a comma-separated list, e.g. "a, b, c".
    search_params = ", ".join(fields)

    query_params = [
        bigquery.ScalarQueryParameter('start_date', 'STRING', start_date),
        bigquery.ScalarQueryParameter('end_date', 'STRING', end_date),
        bigquery.ScalarQueryParameter('search_params', 'STRING', search_params),

    ]
    # bigquery.ArrayQueryParameter might be the right tool here, but I am not sure how to use it.

    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = False
    job_config.query_parameters = query_params

    query_job = client.query("""
        SELECT user_pseudo_id, event_name, _TABLE_SUFFIX AS suffix
        FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
        WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
        BETWEEN @start_date AND @end_date
        ORDER BY user_pseudo_id DESC
        """, job_config=job_config)

    results = query_job.result()  # Waits for job to complete.

    for row in results:
        print(row)

【Comments】:

    Tags: python sql google-bigquery


    【Solution 1】:

    How about just using string formatting?

    from google.cloud import bigquery
    from google.oauth2 import service_account


    def query_awe(start_date, end_date, fields):
        credentials = service_account.Credentials.from_service_account_file('auth.json')
    
        project_id = 'my-project-id'
    
        client = bigquery.Client(credentials=credentials, project=project_id)
    
        job_config = bigquery.QueryJobConfig()
        job_config.use_legacy_sql = False
    
        my_query = """
            SELECT {0}, _TABLE_SUFFIX AS suffix
            FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
            WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
            BETWEEN '{1}' AND '{2}' #Quoted, since REGEXP_EXTRACT returns a STRING
            ORDER BY user_pseudo_id DESC
            """
    
        my_query = my_query.format(', '.join(fields), start_date, end_date)
        query_job = client.query(my_query, job_config=job_config)
        results = query_job.result()  # Waits for job to complete.
    
        for row in results:
            print(row)
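
String formatting places the field names directly into the SQL text, so it may be worth checking them before they are joined in. Below is a minimal sketch of that idea, not a definitive implementation: `ALLOWED_FIELDS` and `build_select` are hypothetical names introduced here, and the allowlist simply reuses the three columns from the question. The dates are left as `@start_date` and `@end_date` so they can still be bound as query parameters, as in the question's code.

```python
import re

# Hypothetical allowlist (an assumption for this sketch): the columns a caller may select.
ALLOWED_FIELDS = frozenset({"user_pseudo_id", "event_name", "event_timestamp"})

# Plain identifier: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select(fields, allowed=ALLOWED_FIELDS):
    """Return the query text with a validated column list formatted in."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        # Reject anything that is not a simple, expected column name.
        if not _IDENTIFIER.fullmatch(name) or name not in allowed:
            raise ValueError(f"unexpected field name: {name!r}")
    columns = ", ".join(fields)
    # The dates stay as named parameters; only the column list is formatted in.
    return (
        f"SELECT {columns}, _TABLE_SUFFIX AS suffix\n"
        "FROM `analytics_180354243.events_*`\n"
        "WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')\n"
        "BETWEEN @start_date AND @end_date"
    )
```

The returned string can then be passed to `client.query(...)` together with a `job_config` whose `query_parameters` hold `start_date` and `end_date`.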
    

    【Comments】:

      【Solution 2】:

      I simply use a plain string replace to do what you're asking.

      myQuery = """SELECT <var_search_params>, _TABLE_SUFFIX AS suffix
      FROM `analytics_180354243.events_*` 
      WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
      BETWEEN @start_date AND @end_date
      """
      # str.replace returns a new string, so the result has to be assigned back.
      myQuery = myQuery.replace("<var_search_params>", "Foo, Bar")
      
      query_job = client.query(myQuery, job_config=job_config)
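
On the ArrayQueryParameter mentioned in the question: BigQuery query parameters bind values and cannot stand in for column names or other identifiers, which is why the SELECT list has to be built as text. An array parameter fits a filter instead. The sketch below assumes a hypothetical filter on `event_name` that is not part of the original question; `QUERY` and `make_job_config` are names made up for illustration.

```python
# Hypothetical query: the IN UNNEST(...) filter is an assumption added for this sketch.
QUERY = """
    SELECT user_pseudo_id, event_name, _TABLE_SUFFIX AS suffix
    FROM `analytics_180354243.events_*`
    WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
    BETWEEN @start_date AND @end_date
    AND event_name IN UNNEST(@event_names)
"""


def make_job_config(start_date, end_date, event_names):
    """Build a job config that binds two scalar values and one array value."""
    # Imported here so this sketch can be loaded without the client library installed.
    from google.cloud import bigquery

    return bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "STRING", start_date),
            bigquery.ScalarQueryParameter("end_date", "STRING", end_date),
            # The array parameter carries a list of values, not column names.
            bigquery.ArrayQueryParameter("event_names", "STRING", event_names),
        ]
    )
```

It would be used as `client.query(QUERY, job_config=make_job_config("20180801", "20180830", ["first_open"]))`, where the dates and the event name are placeholder values.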
      

      【Comments】:
