【Question title】:NaN values not replaced into dask dataframe
【Posted】:2021-11-26 20:26:54
【Question】:

I am trying to convert a dask dataframe to a pandas dataframe with the following code:

import dask.dataframe as dd
uri = "mysql+pymysql://myUser:myPassword@myHost:myPort/myDatabase"
dataframe = dd.read_sql_table("myTable", uri, "id", columns=["id", "name", "type_id"])
df = dataframe.fillna(0)
print(len(df.index))

But I get the following error:

Traceback (most recent call last):
  File "tmp.py", line 5, in <module>
    print(len(df.index))
  File "/home/user/.local/lib/python3.7/site-packages/dask/dataframe/core.py", line 593, in __len__
    len, np.sum, token="len", meta=int, split_every=False
  File "/home/user/.local/lib/python3.7/site-packages/dask/base.py", line 288, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/threaded.py", line 87, in get
    **kwargs
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 517, in get_async
    raise_exception(exc, tb)
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 325, in reraise
    raise exc
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/home/user/.local/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/user/.local/lib/python3.7/site-packages/dask/utils.py", line 35, in apply
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/dataframe/io/sql.py", line 232, in _read_sql_chunk
    return df.astype(meta.dtypes.to_dict(), copy=False)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5683, in astype
    col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5698, in astype
    new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 582, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 625, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 868, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer

The table I am using has the following structure (retrieved with pandas only):

id    name      type_id
-------------------------
2     name_2    3.0
3     name_3    3.0
4     name_4    1.0
6     name_6    NaN
7     name_7    2.0
...

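I tried the same code without retrieving the "type_id" column, and it works as expected.

I don't understand why the NaN values are not replaced by 0, since I call fillna(0) before trying to convert the dataframe.

If I look at my database with phpMyAdmin, the values that pandas shows as NaN are NULL.

Why are the NaN values not replaced by 0?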

【Question discussion】:

    Tags: python pandas dataframe dask nan


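    【Solution 1】:

    By using df = dataframe.fillna(0), you ask for NaNs to be filled in every column, which can be problematic. Explicitly targeting the column that contains NaNs might work: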

    df = dataframe.copy()
    df["type_id"] = df["type_id"].astype('float').fillna(0)
    

    Another option is to try dd.to_numeric:

    df["type_id"] = dd.to_numeric(df["type_id"], errors="coerce").fillna(0)
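One possible reading of the traceback in the question is that the exception is raised in `_read_sql_chunk`, where dask casts each loaded chunk to the dtypes it inferred up front, so it happens before any `fillna` or `to_numeric` call gets to run. The sketch below reproduces that cast with pandas only. The `chunk` frame and the `inferred` dtype mapping are made-up stand-ins for what dask might hold internally, not values taken from the asker's database.

```python
import numpy as np
import pandas as pd

# Hypothetical chunk, mimicking a partition that contains a NULL type_id.
chunk = pd.DataFrame(
    {"name": ["name_4", "name_6", "name_7"], "type_id": [1.0, np.nan, 2.0]},
    index=pd.Index([4, 6, 7], name="id"),
)

# Assumed dtypes, as they might be inferred from a few leading rows
# that happen to contain no NULLs.
inferred = {"name": "object", "type_id": "int64"}

# Casting a column that holds NaN to int64 fails, so a later fillna(0)
# would never be reached.
try:
    chunk.astype(inferred)
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

print(cast_error)

# With a float dtype the cast succeeds and fillna can do its job.
filled = chunk.astype({"name": "object", "type_id": "float64"}).fillna(0)
print(filled["type_id"].tolist())
```

If this reading is right, the fix has to change the dtype used at load time rather than the operations applied afterwards.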
    

    【Discussion】:

    • I tried it, but unfortunately I get the same error
    • Still the same error
    • Try again...
    • Damn, that doesn't work either :/
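Since the comments report the same error for both suggestions, another direction that may be worth trying is telling dask the dtypes up front through the `meta` argument of `dd.read_sql_table`, so that `type_id` is loaded as float64 and NULLs can survive the cast. This is only a sketch under assumptions: `load_table` is a hypothetical helper, `npartitions=4` is an arbitrary choice (as far as I know, an explicit `meta` also requires `npartitions` or `divisions`), and the exact signature of `read_sql_table` differs between dask versions.

```python
import pandas as pd

# Empty frame describing the dtypes dask should use instead of inferring
# them from the first rows. The index column "id" becomes the index.
meta = pd.DataFrame(
    {
        "id": pd.Series(dtype="int64"),
        "name": pd.Series(dtype="object"),
        "type_id": pd.Series(dtype="float64"),  # float so NULL can become NaN
    }
).set_index("id")


def load_table(uri, npartitions=4):
    """Hypothetical helper: read myTable with explicit dtypes, then fill NaN."""
    # Imported here so the meta frame above can be inspected without dask.
    import dask.dataframe as dd

    ddf = dd.read_sql_table(
        "myTable",
        uri,
        "id",
        columns=["id", "name", "type_id"],
        npartitions=npartitions,
        meta=meta,
    )
    # Only the nullable column is filled; other columns are left untouched.
    return ddf.assign(type_id=ddf["type_id"].fillna(0))
```

A call would look like `df = load_table(uri)` followed by `print(len(df.index))`, with `uri` being the connection string from the question. Raising `head_rows` so that the sampled rows include a NULL might have a similar effect, but that depends on where the NULLs sit in the table.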