DataFrame object type column to int or float error: answers

【Question title】: DataFrame object type column to int or float error
【Posted】: 2020-04-13 04:52:02
【Question】:

I have the following DataFrame:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
Borough        20 non-null object
Indian         20 non-null object
Pakistani      20 non-null object
Bangladeshi    20 non-null object
Chinese        20 non-null object
Other_Asian    20 non-null object
Total_Asian    20 non-null object
dtypes: object(7)

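Only the 'Borough' column is a string; the others should be int or float. I'm trying to convert them with astype(int). I have tried every option I could find online, but I still get an error.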

df_LondonEthnicity['Indian'] = df_LondonEthnicity['Indian'].astype(int)

The error is:

ValueError: invalid literal for int() with base 10:

I also tried:

df_LondonEthnicity['Indian'] = df_LondonEthnicity.astype({'Indian': int}).dtypes

And I tried:

cols = ['Indian', 'Pakistani', 'Bangladeshi', 'Chinese', 'Other_Asian', 'Total_Asian']  

for col in cols:  # Iterate over chosen columns
  df_LondonEthnicity[col] = pd.to_numeric(df_LondonEthnicity[col])

I also tried converting the strings to float.

I'd appreciate some help with this. Thanks.

【Question comments】:

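Can you provide some sample data from df_LondonEthnicity?
stackoverflow.com/questions/18434208/…. pd.to_numeric is the right tool for this conversion. If pandas did not parse the column as a numeric dtype in the first place, astype will not help. With errors='coerce', to_numeric converts every non-numeric value to NaN.
What is in df_LondonEthnicity['Indian']? If it contains empty strings, or strings with text instead of digits, it can't be converted.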
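One way to answer that question is to list exactly which values fail to parse. A minimal sketch, assuming a small hypothetical frame (the real column contents are not shown in the question): coerce with pd.to_numeric, then keep the original strings wherever the result became NaN.

```python
import pandas as pd

# Hypothetical sample data; the real df_LondonEthnicity values are not shown
df = pd.DataFrame({"Borough": ["A", "B", "C"],
                   "Indian": ["1200", "12,456", ""]})

# Unparseable strings become NaN under errors="coerce"
parsed = pd.to_numeric(df["Indian"], errors="coerce")

# Keep only the original strings whose parsed value is NaN
bad = df.loc[parsed.isna(), "Indian"]
print(bad.tolist())  # ['12,456', '']
```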

Tags: python pandas dataframe

【Solution 1】:

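As pointed out in the comments, you need to use the to_numeric function.

The error means that a value you are trying to convert contains characters other than the digits 0-9 (base 10), for example a '.' or a ','.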
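To see this directly, a minimal sketch that runs astype(int) on a few hypothetical strings, one at a time; the helper name converts_with_astype_int is made up for illustration.

```python
import pandas as pd

# Hypothetical values: "200.1" and "12,456" contain characters besides 0-9
values = ["123", "200.1", "12,456"]

def converts_with_astype_int(text):
    """Hypothetical helper: True if a one-element object Series survives astype(int)."""
    try:
        pd.Series([text], dtype=object).astype(int)
    except ValueError:
        return False
    return True

for v in values:
    print(v, converts_with_astype_int(v))
# 123 True
# 200.1 False
# 12,456 False
```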

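So you can either use pd.to_numeric and turn every non-conforming value into NaN, or transform the values in some way first.

Suppose you have a DataFrame like this.

Using pd.to_numeric gives output like this. Note that the values are floats, because a NumPy integer column cannot hold NaN.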

>>> pd.to_numeric(df.X, errors='coerce')
0    123.0
1      NaN
2    200.0
3    200.1
Name: X, dtype: float64
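If you need integers and can live with missing values, one option is pandas' nullable "Int64" extension dtype. A minimal sketch, assuming a hypothetical column whose parseable entries are all whole numbers (a value like 200.1 would not cast safely to Int64):

```python
import pandas as pd

# Hypothetical column: one unparseable entry, the rest whole numbers
s = pd.Series(["123", "abc", "200"])

as_float = pd.to_numeric(s, errors="coerce")  # float64, since NaN is a float
as_int = as_float.astype("Int64")             # nullable integer; NaN becomes <NA>

print(as_float.dtype)  # float64
print(as_int.dtype)    # Int64
```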

Another option is to extract the first run of digits from each value and convert that:

>>> df.X.str.extract(r'([\d]+)').astype(int)
     0
0  123
1  123
2  200
3  200

【Comments】:

Thanks. Every column except 'Borough' contains numbers, so they should be int or float so that I can do some calculations. I'm scraping the information from a web page:
url = 'en.wikipedia.org/wiki/Ethnic_groups_in_London'
# Connect to the URL
response = requests.get(url)
# Parse the HTML and save it to a BeautifulSoup object (holds the entire parsed HTML text)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find_all('table')[4]
I tried your method and it converted all my data to NaN.
Got it. My numbers were formatted like 12,456. I removed the commas and now it works. Thanks for your help.
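The fix described in that comment (strip the thousands separators, then parse) can be applied to every numeric column at once. A minimal sketch, assuming a hypothetical scraped frame with placeholder values rather than the real Wikipedia figures:

```python
import pandas as pd

# Hypothetical scraped data: numbers arrive as strings with ',' separators
df = pd.DataFrame({
    "Borough": ["Borough A", "Borough B"],
    "Indian": ["12,456", "1,200"],
    "Total_Asian": ["20,000", "3,500"],
})

numeric_cols = df.columns.drop("Borough")

# Remove the commas, then parse; to_numeric's default errors="raise"
# still flags any other non-numeric text instead of hiding it as NaN
df[numeric_cols] = df[numeric_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)
print(df.dtypes)
```

If the table is loaded with pandas.read_html instead of BeautifulSoup, its thousands parameter (',' by default) handles the separators during parsing.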