【Question title】: Dask agg functions pickle error
【Posted】: 2017-11-10 09:29:35
【Question】:

I have a dask dataframe with the following dtypes:

@timestamp                        datetime64[ns]
@version                                  object
dst                                       object
dst_port                                  object
host                                      object
http_req_header_contentlength             object
http_req_header_host                      object
http_req_header_referer                   object
http_req_header_useragent                 object
http_req_method                           object
http_req_secondleveldomain                object
http_req_url                              object
http_req_version                          object
http_resp_code                            object
http_resp_header_contentlength            object
http_resp_header_contenttype              object
http_user                                 object
local_time                                object
path                                      object
src                                       object
src_port                                  object
tags                                      object
type                                       int64
dtype: object

I'm trying to perform a groupby operation:

grouped_by_df = df.groupby(['http_user', 'src'])['@timestamp'].agg(['min', 'max']).reset_index()
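For reference, here is a minimal sketch of what this aggregation returns, run eagerly in pandas (whose groupby API dask.dataframe follows) on hypothetical toy data: one row per (http_user, src) pair with the earliest and latest @timestamp. Pandas does no pickling here, so this only illustrates the expected result, not the error.

```python
import pandas as pd

# Hypothetical toy rows; only the columns used by the groupby are included.
df = pd.DataFrame({
    "http_user": ["alice", "alice", "bob"],
    "src": ["10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "@timestamp": pd.to_datetime(
        ["2017-11-10 09:00", "2017-11-10 09:30", "2017-11-10 10:00"]
    ),
})

# One row per (http_user, src) pair with its earliest and latest @timestamp.
grouped_by_df = (
    df.groupby(["http_user", "src"])["@timestamp"].agg(["min", "max"]).reset_index()
)
print(grouped_by_df)

# count() then reports the number of non-null values in each column.
print(grouped_by_df.count())
```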

When I run `grouped_by_df.count().compute()`, I get the following error:

Traceback (most recent call last):
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-62-9acb48b4ac67>", line 1, in <module>
    user_host_map.count().compute()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/dask/base.py", line 98, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/dask/base.py", line 205, in compute
    results = get(dsk, keys, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1893, in get
    results = self.gather(packed)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1355, in gather
    direct=direct, local_worker=local_worker)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 531, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/utils.py", line 234, in sync
    six.reraise(*error[0])
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/utils.py", line 223, in f
    result[0] = yield make_coro()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1235, in _gather
    traceback)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
TypeError: itemgetter expected 1 arguments, got 0

I'm using dask version 0.15.1 with a LocalCluster client. What could be causing the problem?
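The error message names `itemgetter`, and the last frame is `pickle.loads`, which suggests that an `operator.itemgetter` object sent back from a worker could not be unpickled. As an assumption not confirmed in this thread: on Python 2.7, `operator.itemgetter` has no pickle support of its own (later Python 3 releases added it), so a protocol-2 pickle of it can be written but not read back, which would be consistent with this `TypeError`. The helper `survives_pickle` below is hypothetical; it is a minimal sketch for checking whether a given callable survives a pickle round trip in your interpreter.

```python
import operator
import pickle


def survives_pickle(obj, sample, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: pickle obj, unpickle it, and check that the copy
    returns the same result as the original when called on sample."""
    try:
        restored = pickle.loads(pickle.dumps(obj, protocol))
    except Exception as exc:
        # Assumption: on Python 2.7 an itemgetter would fail here, at load time,
        # with "itemgetter expected 1 arguments, got 0".
        print("round-trip failed: %r" % (exc,))
        return False
    return restored(sample) == obj(sample)


getter = operator.itemgetter("min", "max")
row = {"min": 1, "max": 5}
print(survives_pickle(getter, row))  # True on modern Python 3
```

If this prints `False` on the interpreter running the cluster, upgrading Python (or dask/distributed) may be worth trying before restructuring the groupby.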

【Question comments】:

  • A couple of questions: 1. Are "http_user" and "src" strings or complex objects? 2. Have you tried removing the "@" from "@timestamp"?

Tags: python pickle dask


【Answer 1】:

We just ran into a similar error. We were running something of the form:

df[['col1','col2']].groupby('col1').agg("count")

and ended up with a similar error:

    return pickle.loads(x)
TypeError: itemgetter expected 1 arguments, got 0

But when we rewrote the groupby in the following form:

df.groupby('col1')['col2'].count()

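we no longer got the error. We've reproduced this several times now, and it doesn't seem to be a fluke. We have no idea why this happens, but it's worth a try if anyone runs into the same issue.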
