【Question Title】: How to run a BigQuery query in Python
【Posted】: 2017-12-13 16:52:36
【Question】:

This is the query I have been running in BigQuery and that I now want to run from my Python script. What would I have to change, or add, to get it to run in Python?

#standardSQL
SELECT
  Serial,
  MAX(createdAt) AS Latest_Use,
  SUM(ConnectionTime/3600) as Total_Hours,
  COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;

From my research it looks like I cannot save this query as a permanent table using Python. Is that true? And if it is, is it still possible to export a temporary table?

【Comments】:

    Tags: python google-bigquery


    【Solution 1】:

    You need to use the BigQuery Python client lib; with it, something like the following should get you up and running:

    from google.cloud import bigquery
    client = bigquery.Client(project='PROJECT_ID')
    query = "SELECT...."
    dataset = client.dataset('dataset')
    table = dataset.table(name='table')
    job = client.run_async_query('my-job', query)
    job.destination = table
    job.write_disposition = 'WRITE_TRUNCATE'
    job.begin()
    

    https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html

    Also check out the current BigQuery Python client tutorial for the up-to-date API.

    【Discussion】:

    • Since this is the accepted answer, I will add this here: you need to set use_legacy_sql to False in the job_config to run the OP's query, because it defaults to True. For example: `job_config = bigquery.QueryJobConfig(); job_config.use_legacy_sql = False; client.query(query, job_config=job_config)`
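Relatedly, the `#standardSQL` line at the top of the OP's query is a dialect prefix written inside the query text itself. A tiny helper like the one below (purely illustrative; `ensure_standard_sql` is not part of any library) shows how such a prefix can be added defensively to query strings that might be missing it:

```python
def ensure_standard_sql(sql):
    """Prepend the #standardSQL dialect prefix unless one is already present."""
    stripped = sql.lstrip()
    if stripped.startswith("#standardSQL") or stripped.startswith("#legacySQL"):
        return sql
    return "#standardSQL\n" + sql


print(ensure_standard_sql("SELECT 1").splitlines()[0])  # → #standardSQL
```

Setting use_legacy_sql=False in the job config, as the comment above says, is the explicit route in the client library; the query-text prefix is simply what the OP's query already relies on.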
    【Solution 2】:

    Here is a good guide to its usage: https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html

    To simply run a query and write its results to a table:

    # from google.cloud import bigquery
    # client = bigquery.Client()
    # dataset_id = 'your_dataset_id'
    
    job_config = bigquery.QueryJobConfig()
    # Set the destination table
    table_ref = client.dataset(dataset_id).table("your_table_id")
    job_config.destination = table_ref
    sql = """
        SELECT corpus
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY corpus;
    """
    
    # Start the query, passing in the extra configuration.
    query_job = client.query(
        sql,
        # Location must match that of the dataset(s) referenced in the query
        # and of the destination table.
        location="US",
        job_config=job_config,
    )  # API request - starts the query
    
    query_job.result()  # Waits for the query to finish
    print("Query results loaded to table {}".format(table_ref.path))
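
The destination table written by the snippet above is a permanent table, which also covers the OP's concern about saving results from Python. If you then want the data out of BigQuery entirely, the client can export a finished table to Cloud Storage. A sketch under the assumption that you have google-cloud-bigquery installed, working credentials, and an existing GCS bucket (all names below are placeholders):

```python
def gcs_destination(bucket_name, filename):
    """Build the gs:// URI that an extract job writes to."""
    return "gs://{}/{}".format(bucket_name, filename)


def export_table_to_gcs(project, dataset_id, table_id, bucket_name):
    """Export a permanent BigQuery table to a CSV file in Cloud Storage."""
    from google.cloud import bigquery  # deferred: only needed when actually exporting

    client = bigquery.Client(project=project)
    table_ref = client.dataset(dataset_id).table(table_id)
    destination_uri = gcs_destination(bucket_name, "{}.csv".format(table_id))
    extract_job = client.extract_table(table_ref, destination_uri, location="US")
    extract_job.result()  # block until the export job finishes
    return destination_uri


# export_table_to_gcs('PROJECT_ID', 'your_dataset_id', 'your_table_id', 'your-bucket')
```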
    

    【Discussion】:

      【Solution 3】:

      Personally, I prefer to query with pandas:

      # BQ authentication
      import pandas as pd
      import pydata_google_auth

      SCOPES = [
          'https://www.googleapis.com/auth/cloud-platform',
          'https://www.googleapis.com/auth/drive',
      ]

      credentials = pydata_google_auth.get_user_credentials(
          SCOPES,
          # Set auth_local_webserver to True to have a slightly more convenient
          # authorization flow. Note, this doesn't work if you're running from a
          # notebook on a remote server, such as over SSH or with Google Colab.
          auth_local_webserver=True,
      )

      MY_PROJECT_ID = 'your-project-id'
      query = "SELECT * FROM my_table"

      data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
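
The pandas route can also cover the permanent-table part of the question: pandas-gbq's `to_gbq` writes a DataFrame back to a BigQuery table. A sketch, assuming the same credentials as above; the wrapper function and all table names are placeholders of my own, not part of any library:

```python
def save_frame_to_bigquery(df, destination_table, project_id, credentials=None):
    """Write a DataFrame to a permanent BigQuery table via pandas-gbq.

    destination_table must be in 'dataset.table' form.
    """
    if "." not in destination_table:
        raise ValueError("destination_table must look like 'dataset.table'")
    import pandas_gbq  # deferred: requires the pandas-gbq package

    pandas_gbq.to_gbq(
        df,
        destination_table,
        project_id=project_id,
        credentials=credentials,
        if_exists="replace",  # overwrite the table if it already exists
    )


# save_frame_to_bigquery(data, 'FirebaseArchive.query_results', MY_PROJECT_ID, credentials)
```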
      

      【Discussion】:

        【Solution 4】:

        The pythonbq package is very simple to use and a great place to start. It uses python-gbq under the hood.

        To get started, you need to generate a BQ JSON key for external app access. You can generate your key here.

        Your code will look something like this:

        from pythonbq import pythonbq
        
        myProject=pythonbq(
          bq_key_path='path/to/bq/key.json',
          project_id='myGoogleProjectID'
        )
        SQL_CODE="""
        SELECT
          Serial,
          MAX(createdAt) AS Latest_Use,
          SUM(ConnectionTime/3600) as Total_Hours,
          COUNT(DISTINCT DeviceID) AS Devices_Connected
        FROM `dataworks-356fa.FirebaseArchive.testf`
        WHERE Model = "BlueBox-pH"
        GROUP BY Serial
        ORDER BY Serial
        LIMIT 1000;
        """
        output=myProject.query(sql=SQL_CODE)
        
        

        【Discussion】:

          【Solution 5】:

          Here is another way, using a JSON key file for the service account:

          >>> from google.cloud import bigquery
          >>>
          >>> CREDS = 'test_service_account.json'
          >>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
          >>> job = client.query('select * from dataset1.mytable')
          >>> for row in job.result():
          ...     print(row)
          

          【Discussion】:
