Data type conversion in a DataFrame: answers

[Question title]: Data type conversion in a DataFrame
[Posted]: 2021-05-18 15:04:45
[Question description]:

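I have a CSV file with a column named population. In this CSV file the values of that column appear as decimals (floats), for example 12345.00. I converted the whole file to the ttl RDF format, and the population literals appear the same way there, i.e. 12345.0 in the ttl file. I would like them to appear as integers, i.e. 12345. Do I need to convert the data type of this column, or what should I do? I would also like to ask how to check the data type of a DataFrame column in Python. (Python beginner.) Thanks.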

[Question comments]:

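When you read the column, you can specify the dtype you want by passing a dictionary to pd.read_csv: dtype={'your_col_name': 'int64'}. However, this will fail if your data is messy (for example, if values are missing), so you may need to do something else to coerce the bad values first and then try a different type.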
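The question also asks how to check a column's data type. A minimal sketch of that check follows, assuming a small in-memory CSV (the `csv_text` content and the `city` column are made up for illustration) in place of the asker's file, and assuming the column has no missing values:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the asker's file.
csv_text = "city,population\nA,12345.00\nB,678.00\n"

df = pd.read_csv(io.StringIO(csv_text))

# dtypes of every column, then of one column only.
print(df.dtypes)
print(df["population"].dtype)
dtype_before = str(df["population"].dtype)

# With no missing values, a whole-number float column converts cleanly.
df["population"] = df["population"].astype("int64")
dtype_after = str(df["population"].dtype)
print(df["population"].tolist())
```

`df.dtypes` lists the type of each column, while `df["population"].dtype` reports a single one.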

Tags: python pandas dataframe type-conversion

[Solution 1]:

You can first try changing the column's data type. For example:

import pandas as pd
df = pd.DataFrame([1.0, 2.0, 3.0, 4.0], columns=['A'])
df['A']

0    1.0
1    2.0
2    3.0
3    4.0
Name: A, dtype: float64

Now

# The integer width is platform-dependent: int64 on most systems, int32 on some Windows setups.
df['A'] = df['A'].astype(int)
df['A']

0    1
1    2
2    3
3    4
Name: A, dtype: int32

If the column contains some np.NaN values, you can try

df = df.astype('Int64')

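This gives you a column in which the missing entries are shown as <NA>, the Int64 equivalent of np.NaN. It is important to know that np.NaN is a float, and that <NA> is not yet widely used and is not optimized for memory or performance. You can read more about this here: https://pandas.pydata.org/docs/user_guide/missing_data.html#integer-dtypes-and-missing-data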
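As an illustration of the nullable Int64 route, here is a small sketch with a made-up column that has one missing value. It assumes a pandas version that supports the nullable integer dtype, and it spells the missing value `np.nan`, which works across NumPy versions:

```python
import numpy as np
import pandas as pd

# Made-up data: whole-number floats with one missing entry.
df = pd.DataFrame({"A": [1.0, 2.0, np.nan, 4.0]})

# A plain NumPy integer column cannot hold NaN, so this conversion raises.
plain_int_failed = False
try:
    df["A"].astype("int64")
except ValueError as exc:
    plain_int_failed = True
    print("astype('int64') failed:", exc)

# The nullable 'Int64' dtype (capital I) keeps the missing entry as <NA>.
nullable = df["A"].astype("Int64")
print(nullable)
print("missing entries:", int(nullable.isna().sum()))
```

The values print as integers and the missing entry is kept rather than dropped or replaced.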

[Comments]:

@Allolz and @Evert Acosta -- I tried

csv_data = pd.read_csv("~/Documents/the_file.csv")
csv_data['theColName'] = csv_data['theColName'].astype(int)

but it raised ValueError: Cannot convert non-finite values (NA or inf) to integer. I think this is because some values are missing. I also do not want to drop those missing values, because they matter to me.

[Solution 2]:

csv_data['theColName'] = csv_data['theColName'].fillna(0)
csv_data['theColName'] = csv_data['theColName'].astype('int64')

This worked, and the column was successfully converted to int64. Thanks, everyone.
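One thing to keep in mind with this approach: fillna(0) replaces each missing value with 0, so afterwards a missing entry cannot be told apart from a real zero. A short sketch with made-up data (the values below are illustrative, not taken from the asker's file):

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
csv_data = pd.DataFrame({"theColName": [12345.0, np.nan, 678.0]})

missing_before = int(csv_data["theColName"].isna().sum())

# Same two steps as above: fill the gaps with 0, then cast to int64.
filled = csv_data["theColName"].fillna(0).astype("int64")
missing_after = int(filled.isna().sum())

print(filled.tolist())
print("missing before:", missing_before, "after:", missing_after)
```

If the missing values need to stay visible, as the earlier comment suggests, the nullable 'Int64' dtype from Solution 1 may be the better fit.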

[Comments]: