【Published】: 2021-11-26 20:26:54
【Question】:
I am trying to convert a dask DataFrame to a pandas DataFrame with the following code:
import dask.dataframe as dd
uri = "mysql+pymysql://myUser:myPassword@myHost:myPort/myDatabase"
dataframe = dd.read_sql_table("myTable", uri, "id", columns=["id", "name", "type_id"])
df = dataframe.fillna(0)
print(len(df.index))
But I get the following error:
Traceback (most recent call last):
  File "tmp.py", line 5, in <module>
    print(len(df.index))
  File "/home/user/.local/lib/python3.7/site-packages/dask/dataframe/core.py", line 593, in __len__
    len, np.sum, token="len", meta=int, split_every=False
  File "/home/user/.local/lib/python3.7/site-packages/dask/base.py", line 288, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/threaded.py", line 87, in get
    **kwargs
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 517, in get_async
    raise_exception(exc, tb)
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 325, in reraise
    raise exc
  File "/home/user/.local/lib/python3.7/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/home/user/.local/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/user/.local/lib/python3.7/site-packages/dask/utils.py", line 35, in apply
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/dask/dataframe/io/sql.py", line 232, in _read_sql_chunk
    return df.astype(meta.dtypes.to_dict(), copy=False)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5683, in astype
    col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5698, in astype
    new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 582, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 625, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/home/user/.local/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 868, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
The table I am working with has the following structure (retrieved with plain pandas):
id   name     type_id
---------------------
2    name_2   3.0
3    name_3   3.0
4    name_4   1.0
6    name_6   NaN
7    name_7   2.0
...
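One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: `read_sql_table` infers the column dtypes (its `meta`) from a small sample of rows (`head_rows`, 5 by default). Each chunk is then cast to those dtypes by the `df.astype(meta.dtypes.to_dict(), copy=False)` call in `_read_sql_chunk`. If the sampled rows happen to contain no NULL `type_id`, pandas infers `int64` for that column. A later chunk containing NULL arrives as `float64` with `NaN`, and the cast fails before `fillna(0)` ever runs. The sketch below reproduces only that casting step in plain pandas. The frames `head` and `chunk` are hypothetical stand-ins modeled on the table above, and `cast_like_meta` is a hypothetical helper, not part of dask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the first rows: no NULL type_id, so pandas infers int64.
# (The "name" column is left out because it plays no part in the failure.)
head = pd.DataFrame({"id": [2, 3, 4], "type_id": [3, 3, 1]})
meta = head.iloc[:0]  # empty frame that only records the inferred dtypes

# Hypothetical later chunk containing a NULL: read back as float64 with NaN.
chunk = pd.DataFrame({"id": [6, 7], "type_id": [np.nan, 2.0]})


def cast_like_meta(df: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Cast a chunk to meta's dtypes, mimicking the astype call in the traceback."""
    return df.astype(meta.dtypes.to_dict())


try:
    cast_like_meta(chunk, meta)  # NaN cannot become int64, so this raises ValueError
    cast_failed = False
except ValueError:
    cast_failed = True

# In the question, fillna(0) is applied only after this cast, so it never gets a chance.
# Filling first (possible here only because we control the order) lets the cast succeed.
fixed = cast_like_meta(chunk.fillna(0), meta)
print(cast_failed, fixed["type_id"].tolist())
```

This is also consistent with the observation that the query works once `type_id` is dropped: the remaining columns have no NULLs that could collide with an inferred integer dtype.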
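I tried the same code without retrieving the "type_id" column, and it works as expected.
I don't understand why the NaN values are not replaced with 0, since I call fillna(0) before trying to convert the DataFrame.
When I look at the database in phpMyAdmin, the values that pandas shows as NaN are NULL.
Why aren't the NaN values being replaced with 0?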
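Assume the failure comes from dask inferring `type_id` as an integer from its row sample. This is an assumption, but the traceback does show each chunk being cast to `meta.dtypes` before `fillna(0)` runs. Under that assumption, one workaround is to stop dask from guessing. Pass an explicit `meta` in which `type_id` is `float64`, so that chunks with NULLs survive the cast, and only then fill and convert to integers. The sketch makes several assumptions:

- `dd.read_sql_table` accepts `meta` and `npartitions` keyword arguments. Check your installed version; dask may require `divisions` or `npartitions` whenever `meta` is supplied.
- The index column `id` becomes the index of each chunk.
- `npartitions=4` is an arbitrary illustrative value.

`build_meta` and `read_table` are hypothetical helper names.

```python
import pandas as pd


def build_meta() -> pd.DataFrame:
    """Empty frame describing the expected chunk layout (an assumption about myTable).

    "id" is the index because it is the index_col, and type_id is float64 so that
    NULLs are held as NaN instead of breaking an int64 cast.
    """
    return pd.DataFrame(
        {
            "name": pd.Series(dtype=object),
            "type_id": pd.Series(dtype="float64"),
        },
        index=pd.Index([], dtype="int64", name="id"),
    )


def read_table(uri: str, npartitions: int = 4):
    """Hypothetical helper: read myTable with explicit dtypes, then fill and cast."""
    import dask.dataframe as dd  # imported lazily so build_meta() works without dask

    ddf = dd.read_sql_table(
        "myTable", uri, "id",
        columns=["id", "name", "type_id"],
        meta=build_meta(),          # skip dtype inference from the first rows
        npartitions=npartitions,    # assumed to be required alongside meta
    )
    # type_id is now float64 in every chunk, so fillna(0) runs before the integer cast.
    return ddf.fillna(0).astype({"type_id": "int64"})
```

Raising `head_rows` is another option, but it only helps if the larger sample happens to include a NULL `type_id`.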
【Discussion】:
Tags: python pandas dataframe dask nan