【Question title】: Is it possible to invoke BigQuery procedures in the Python client?
【Posted】: 2020-02-18 00:07:41
【Question】:

BigQuery scripting and stored procedures have just been released in beta. Is it possible to call a procedure using the BigQuery Python client?

I tried:

query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.results())
print(job.ddl_operation_performed)

print(job._properties) but that didn't give me the result set from the procedure. Is it possible to get the results?

Thanks!

Edit: the stored procedure I'm calling:

CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
  IF EXISTS (SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate = accessDate and customerId = accessId)
  THEN
    SET saleExists = (SELECT 1);
  ELSE
    INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
    SET saleExists = (SELECT 0);
  END IF;
END;

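【Comments】:

  • Which statement's output are you trying to capture? SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate=accessDate and customerId = accessId? Why doesn't the current procedure work for you?
  • Or just saleExists 1/0, which is essentially the same as SELECT 1 FROM dataset.table?
  • Updated my answer; it also simplifies your procedure body.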

Tags: google-bigquery google-api-python-client


【Answer 1】:

It works if your procedure contains a SELECT. For example, given this procedure:

create or replace procedure dataset.proc_output() BEGIN
  SELECT t FROM UNNEST(['1','2','3']) t;
END;

Code:

from google.cloud import bigquery

client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
    print(result)

The output is:

Row(('1',), {'t': 0})
Row(('2',), {'t': 0})
Row(('3',), {'t': 0})
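Printing a Row only shows its repr. To work with the values, a minimal sketch like the one below turns each row of the final result set into a plain dict keyed by column name. The helper name call_procedure_rows is hypothetical, and the client is assumed to be a google.cloud.bigquery.Client (any object with a compatible query() method works).

```python
def call_procedure_rows(client, procedure_ref):
    """Run CALL on a procedure and return its last result set as dicts.

    Hypothetical helper. `client` is assumed to be a
    google.cloud.bigquery.Client; `procedure_ref` is a trusted identifier
    such as "dataset.proc_output" and is pasted into the SQL as-is.
    """
    job = client.query("CALL {}()".format(procedure_ref))
    # result() waits for the script to finish and yields the rows of the
    # last statement that produced a result set.
    return [dict(row.items()) for row in job.result()]
```

For the procedure above, call_procedure_rows(bigquery.Client(), "dataset.proc_output") should give one dict per row, each with the single key "t".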

However, if a procedure contains more than one SELECT, this approach only returns the last result set.
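One way around the last-result-set limitation is to list the script's child jobs and read each SELECT separately. This is a sketch under assumptions: select_result_sets is a hypothetical helper, client is assumed to be a google.cloud.bigquery.Client, and it assumes the jobs returned by client.list_jobs(parent_job=...) carry their statement_type and created timestamp. It sorts by created because the listing is not guaranteed to be in execution order.

```python
def select_result_sets(client, script):
    """Return the rows of every SELECT child job of a script, in run order.

    Hypothetical helper; assumes `client` is a google.cloud.bigquery.Client.
    A CALL runs as a script, so each statement inside the procedure becomes
    its own child job of the parent job.
    """
    parent_job = client.query(script)
    parent_job.result()  # wait for the whole script to finish

    child_jobs = list(client.list_jobs(parent_job=parent_job))
    # Order child jobs by creation time so result sets follow execution order.
    child_jobs.sort(key=lambda job: job.created)

    # Keep only SELECT statements; INSERT/SET child jobs return no rows anyway.
    return [
        list(job.result())
        for job in child_jobs
        if getattr(job, "statement_type", None) == "SELECT"
    ]
```

Each element of the returned list is one result set, so a procedure with an INSERT followed by two SELECTs would yield two lists of rows.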

Update

See the example below:

CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
  SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
    SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
  IF saleExists = 0  THEN
    INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate) VALUES (country, accessDate);
  END IF;
END;
BEGIN
  DECLARE saleExists INT64;
  CALL zyun.exists("US", DATE "2019-2-1", saleExists);
  SELECT saleExists;
END
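The DECLARE / CALL / SELECT pattern above can be wrapped in a small Python helper. This is a minimal sketch: fetch_out_param and the out_value variable name are hypothetical, the client is assumed to be a google.cloud.bigquery.Client, and the IN arguments are pasted into the script as SQL literals, so they must come from trusted input.

```python
def fetch_out_param(client, procedure_ref, in_args_sql, out_type):
    """Call a procedure whose last parameter is OUT and return its value.

    Hypothetical helper. `in_args_sql` is a list of SQL literal strings,
    e.g. ['"US"', 'DATE "2019-2-1"']; they are inserted into the script
    verbatim, so only pass trusted values. `out_type` is the BigQuery type
    of the OUT parameter, e.g. "INT64" or "ARRAY<STRING>".
    """
    args = ", ".join(list(in_args_sql) + ["out_value"])
    script = (
        "DECLARE out_value {out_type};\n"
        "CALL {proc}({args});\n"
        "SELECT out_value;"
    ).format(out_type=out_type, proc=procedure_ref, args=args)
    rows = list(client.query(script).result())
    # The final SELECT yields exactly one row with one column.
    return rows[0][0]
```

For the script above this would be fetch_out_param(bigquery.Client(), "zyun.exists", ['"US"', 'DATE "2019-2-1"'], "INT64"). The Python client returns ARRAY values as lists and STRUCT values as dicts, so an ARRAY<STRUCT<...>> OUT parameter would come back as a list of dicts.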

By the way, your example would be better written as a single MERGE statement instead of a script.

【Comments】:

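  • My procedure has a SELECT, but also an INSERT statement. Does that not work?
  • I can't get any results at all. There is an IF THEN statement that checks the result of the SELECT; if the SELECT returns false, it inserts a row and then returns the SELECT.
  • @WIT, if the output you expect can be represented as an array or an array of structs, I'd suggest using an OUT parameter so the output becomes part of the procedure's interface.
  • The output isn't the problem. The problem is that the script returns 2 or 3 result sets and I can't access them from the BigQuery client.
  • The procedure runs, but I can't access its output from the client.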

【Answer 2】:

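If you follow the CALL statement with a SELECT statement, you can get the procedure's OUT parameter value as a result set. For example, I created the following stored procedure: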

CREATE OR REPLACE PROCEDURE `my-project.test_dataset.top_names`(OUT top_shakespeare_names ARRAY<STRING>)
BEGIN
  -- Build an array of the top 100 names from the year 2017.
  DECLARE top_names ARRAY<STRING>;
  SET top_names = (
    SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year = 2017
  );
  -- Which names appear as words in Shakespeare's plays?
  SET top_shakespeare_names = (
    SELECT ARRAY_AGG(name)
    FROM UNNEST(top_names) AS name
    WHERE name IN (
      SELECT word
      FROM `bigquery-public-data.samples.shakespeare`
    )
  );
END

Running the following query returns the procedure's output value as the top-level result set.

DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;

In Python:

from google.cloud import bigquery

client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)

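Related: if the stored procedure contains SELECT statements, you can iterate over the script's child jobs to get each result, even when the SELECT is not the last statement in the procedure.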

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;

-- Build an array of the top 100 names from the year 2000.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);

-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)

# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))

# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))

# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
    child_rows = list(child_job.result())
    print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
