【Question Title】: Google BigQuery API (Python Client Library) > Querying Data (Asynchronous)
【Posted】: 2017-07-31 01:49:39
【Question Description】:

I am following Python Client Libraries for the Google BigQuery API - https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#jobs > Querying data (asynchronous).

When it comes to retrieving the results, executing this code:

rows, total_count, token = query.fetch_data()  # API request

always raises ValueError: too many values to unpack (expected 3). (By the way, I think there is a typo there: it should be results.fetch_data()!)

However, the following code works fine:

results = job.results()      # job is the completed asynchronous query job
rows = results.fetch_data()  # returns an iterator over the result rows
tbl = [x for x in rows]      # consuming the iterator pulls down every row

All the rows of the table come back in tbl in a single shot (as a list of tuples), >225K rows!

Does anyone know why I get this error, or what is wrong in the documentation?

And how can I still retrieve the results in batches (iterating page by page)?

Many thanks in advance!

【Question Discussion】:

    Tags: python google-bigquery google-cloud-platform


    【Solution 1】:

    A while ago I opened this issue asking for the documentation to be updated, but as you can see from the answers there, the change still has to wait for an official release.

    In the meantime, please refer to the code base itself, which has better docstrings (in this case, specifically the Iterator class):

    """Iterators for paging through API responses.
    These iterators simplify the process of paging through API responses
    where the response is a list of results with a ``nextPageToken``.
    To make an iterator work, you'll need to provide a way to convert a JSON
    item returned from the API into the object of your choice (via
    ``item_to_value``). You also may need to specify a custom ``items_key`` so
    that a given response (containing a page of results) can be parsed into an
    iterable page of the actual objects you want. You then can use this to get
    **all** the results from a resource::
        >>> def item_to_value(iterator, item):
        ...     my_item = MyItemClass(iterator.client, other_arg=True)
        ...     my_item._set_properties(item)
        ...     return my_item
        ...
        >>> iterator = Iterator(..., items_key='blocks',
        ...                     item_to_value=item_to_value)
        >>> list(iterator)  # Convert to a list (consumes all values).
    Or you can walk your way through items and call off the search early if
    you find what you're looking for (resulting in possibly fewer
    requests)::
        >>> for my_item in Iterator(...):
        ...     print(my_item.name)
        ...     if not my_item.is_valid:
        ...         break
    At any point, you may check the number of items consumed by referencing the
    ``num_results`` property of the iterator::
        >>> my_iterator = Iterator(...)
        >>> for my_item in my_iterator:
        ...     if my_iterator.num_results >= 10:
        ...         break
    When iterating, not every new item will send a request to the server.
    To iterate based on each page of items (where a page corresponds to
    a request)::
        >>> iterator = Iterator(...)
        >>> for page in iterator.pages:
        ...     print('=' * 20)
        ...     print('    Page number: %d' % (iterator.page_number,))
        ...     print('  Items in page: %d' % (page.num_items,))
        ...     print('     First item: %r' % (next(page),))
        ...     print('Items remaining: %d' % (page.remaining,))
        ...     print('Next page token: %s' % (iterator.next_page_token,))
        ====================
            Page number: 1
          Items in page: 1
             First item: <MyItemClass at 0x7f1d3cccf690>
        Items remaining: 0
        Next page token: eav1OzQB0OM8rLdGXOEsyQWSG
        ====================
            Page number: 2
          Items in page: 19
             First item: <MyItemClass at 0x7f1d3cccffd0>
        Items remaining: 18
        Next page token: None
    To consume an entire page::
        >>> list(page)
        [
            <MyItemClass at 0x7fd64a098ad0>,
            <MyItemClass at 0x7fd64a098ed0>,
            <MyItemClass at 0x7fd64a098e90>,
        ]
    
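    Applied to the BigQuery job from the question, page-by-page iteration might look like the sketch below. This is a minimal, hedged example assuming a google-cloud-bigquery release (0.23+) in which fetch_data() returns such an Iterator rather than a 3-tuple; do_something_with() is a hypothetical placeholder for your own row processing.

        results = job.results()            # the completed asynchronous query job
        iterator = results.fetch_data()    # 0.23+: returns an Iterator, not a tuple

        for page in iterator.pages:        # each page corresponds to one API request
            rows = list(page)              # consume only this page of rows
            do_something_with(rows)        # hypothetical placeholder

        print('Total rows consumed: %d' % iterator.num_results)

    Because rows are fetched one page (one request) at a time, this avoids materializing all >225K rows at once the way tbl = [x for x in rows] does.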

    【Discussion】:

    • Thanks a lot, mate! That really cleared things up :)
    【Solution 2】:

    Yes, you are right about the documentation; there is a typo. The corrected snippet is:

        results = job.results()
        rows, total_count, token = results.fetch_data()  # API request

        while True:
            do_something_with(rows)
            if token is None:
                break
            # API request for the next page of rows
            rows, total_count, token = results.fetch_data(page_token=token)
    

    For large datasets, this is how we query hourly to fetch the data for our daily jobs.
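    Note that the 3-tuple shown above is the older return value of fetch_data(); later releases (0.23+, if I recall the boundary correctly) return an iterator instead, which is exactly what produces the ValueError from the question. A hedged compatibility sketch, assuming only those two behaviours; check your installed google-cloud-bigquery version:

        fetched = job.results().fetch_data()
        if isinstance(fetched, tuple):       # older releases: (rows, total_count, token)
            rows, total_count, token = fetched
        else:                                # newer releases: a lazy iterator over rows
            rows = list(fetched)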

    【Discussion】:
