Pandas Dataframe object type: answers

【Question title】: Pandas Dataframe object type
【Posted】: 2017-02-14 12:48:34
【Question description】:

I have a large DataFrame with about 1 million rows and 9 columns, and some rows are missing data in several columns.

dat = pd.read_table( 'file path', delimiter = ';')

I        z        Sp   S        B        B/T     r        gf      k
0        0.0303   2    0.606    0.31     0.04    0.23     0.03    0.38   
1        0.0779   2             0.00     0.00    0.05     0.01    0.00

The first few columns are read in as strings and the last few are read in as NaN, even where a numeric value is present. When I include dtype = 'float64', I get:

ValueError: could not convert string to float:

Any help with solving this?

【Comments】:

Is every value a float?
@Ika8 Yes
Try using dtype = object
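
One way to act on the dtype = object suggestion is to read every field as text and then list the cells that fail numeric conversion. The sketch below is only an illustration: the inline sample `raw` and the names `failed` and `offenders` are invented for the example and are not part of the original question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the S value in the second row is blank.
raw = "I;z;Sp;S;B\n0;0.0303;2;0.606;0.31\n1;0.0779;2;   ;0.00\n"

# Read every field as text and disable NaN detection so nothing is hidden.
dat = pd.read_csv(io.StringIO(raw), sep=";", dtype=str, keep_default_na=False)

# True wherever a cell cannot be parsed as a number.
failed = dat.apply(lambda col: pd.to_numeric(col, errors="coerce").isna())

# repr() makes stray whitespace visible in the offending cells.
offenders = {
    col: [repr(v) for v in dat.loc[failed[col], col]]
    for col in dat.columns
    if failed[col].any()
}
print(offenders)
```

Seeing the raw text of the failing cells usually shows whether the problem is blank fields, padded numbers, or genuinely non-numeric strings.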

Tags: python pandas dataframe object-type

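【Solution 1】:

You can use replace with a regular expression to convert one or more whitespace characters to NaN, and then cast to float.

Empty strings in the data are already converted to NaN by read_table.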

df = df.replace({r'\s+':np.nan}, regex=True).astype(float)
print (df)
     I       z   Sp      S     B   B/T     r    gf     k
0  0.0  0.0303  2.0  0.606  0.31  0.04  0.23  0.03  0.38
1  1.0  0.0779  2.0    NaN  0.00  0.00  0.05  0.01  0.00
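
A possible pitfall with the pattern above: when the replacement value is not a string, pandas replaces the whole cell if the pattern matches anywhere inside it, so a number padded with spaces is turned into NaN as well. The sketch below contrasts this with an anchored pattern. The sample data and the names `loose` and `strict` are invented for illustration, and the anchored pattern is a suggested variation, not the answer author's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one blank cell and two numbers padded with spaces.
df = pd.DataFrame({"S": ["0.606", "   "], "k": [" 0.38", "0.00 "]}, dtype=object)

# Unanchored: any cell that contains whitespace anywhere is replaced.
loose = df.replace({r"\s+": np.nan}, regex=True)

# Anchored: only cells that are whitespace from start to end are replaced.
strict = df.replace({r"^\s+$": np.nan}, regex=True)

print(loose.isna())
print(strict.isna())
```

With the anchored pattern the padded numbers survive as strings and can still be converted afterwards.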

If the data contains other strings that need to be replaced with NaN, you can use to_numeric together with apply:

df = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
print (df)
   I       z  Sp      S     B   B/T     r    gf     k
0  0  0.0303   2  0.606  0.31  0.04  0.23  0.03  0.38
1  1  0.0779   2    NaN  0.00  0.00  0.05  0.01  0.00
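
Because errors='coerce' discards unparseable values silently, it can help to count how many cells each column lost. A minimal sketch, assuming a small made-up frame; the column contents and the name `newly_nan` are hypothetical:

```python
import pandas as pd

# Hypothetical sample: a blank cell in S and a stray word in gf.
df = pd.DataFrame(
    {
        "z": ["0.0303", "0.0779"],
        "S": ["0.606", "   "],
        "gf": ["0.03", "missing"],
    }
)

# Coerce each column; anything unparseable becomes NaN.
converted = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))

# Cells that were not NaN before the conversion but are NaN after it.
newly_nan = converted.isna().sum() - df.isna().sum()
print(newly_nan)
```

A column with an unexpectedly large count is a hint that its values need cleaning before conversion instead of being coerced away.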

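【Discussion】:

When the last three columns all have values, they are all read as NaN. It works for the first 6, though.
Did you use the df.replace({r'\s+':np.nan}, regex=True).astype(float) solution or the to_numeric one? Is the data in the last 3 columns numeric?
The first solution does not work, and the second returns NaN for the last three columns, which are numeric.
OK, do some of the numbers in the last 3 columns start or end with whitespace?
If so, this should work: df = df.apply(lambda x: pd.to_numeric(x.astype(str).str.strip(), errors='coerce'))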
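
The strip-then-convert suggestion can be tried on a small frame first. This is a sketch under assumed data: the padded values, the blank cell, and the name `cleaned` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: padded numbers, one blank cell, and one real NaN.
df = pd.DataFrame(
    {"r": [" 0.23", "0.05 "], "gf": ["0.03", "   "], "k": [0.38, np.nan]},
    dtype=object,
)

# Cast to str so the .str accessor works on every column, trim the padding,
# then coerce whatever still fails to parse into NaN.
cleaned = df.apply(
    lambda col: pd.to_numeric(col.astype(str).str.strip(), errors="coerce")
)
print(cleaned)
```

The astype(str) step is there because the .str accessor raises on columns that are already numeric; missing values still end up as NaN after the conversion.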