【Posted】: 2021-05-12 22:14:51
【Question】:
I am using a Mac with Anaconda installed, and I work with data in Jupyter Notebook 6.1.4. For learning purposes I am using the Kaggle SF Salaries dataset (https://www.kaggle.com/kaggle/sf-salaries). After importing the file in Jupyter Notebook and running df.info(), I get the following summary:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148654 entries, 0 to 148653
Data columns (total 13 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   Id                148654 non-null  int64
 1   EmployeeName      148654 non-null  object
 2   JobTitle          148654 non-null  object
 3   BasePay           148049 non-null  object
 4   OvertimePay       148654 non-null  object
 5   OtherPay          148654 non-null  object
 6   Benefits          112495 non-null  object
 7   TotalPay          148654 non-null  float64
 8   TotalPayBenefits  148654 non-null  float64
 9   Year              148654 non-null  int64
 10  Notes             0 non-null       float64
 11  Agency            148654 non-null  object
 12  Status            38119 non-null   object
dtypes: float64(3), int64(2), object(8)
memory usage: 14.7+ MB
In Google Colab, the same dataset produces a different summary: fewer rows and different dtypes.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 116475 entries, 0 to 116474
Data columns (total 13 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   Id                116475 non-null  int64
 1   EmployeeName      116475 non-null  object
 2   JobTitle          116475 non-null  object
 3   BasePay           115870 non-null  float64
 4   OvertimePay       116474 non-null  float64
 5   OtherPay          116474 non-null  float64
 6   Benefits          80315 non-null   float64
 7   TotalPay          116474 non-null  float64
 8   TotalPayBenefits  116474 non-null  float64
 9   Year              116474 non-null  float64
 10  Notes             0 non-null       float64
 11  Agency            116474 non-null  object
 12  Status            5943 non-null    object
dtypes: float64(8), int64(1), object(4)
memory usage: 11.6+ MB
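One possible reading of the two summaries, offered as an assumption rather than a confirmed diagnosis: pandas reads a column as non-numeric when it mixes numbers with text, which would explain the `object` pay columns in the first output, while the lower row count in Colab would be consistent with an incomplete copy of the file. The sketch below uses a tiny made-up CSV (the values, including the `Not Provided` placeholder, are invented for illustration and not taken from the dataset) to show the dtype effect and how `pd.to_numeric` with `errors="coerce"` can normalize such a column.

```python
import io

import pandas as pd

# Hypothetical miniature of a salary file; all values are invented.
csv_text = (
    "Id,BasePay,OvertimePay\n"
    "1,100.5,0\n"
    "2,Not Provided,12.5\n"
    "3,,7\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# BasePay contains a text entry, so pandas cannot infer a numeric dtype for it.
raw_is_numeric = pd.api.types.is_numeric_dtype(df["BasePay"])
print(df.dtypes)

# Coerce to numeric: entries that cannot be parsed become NaN.
df["BasePay"] = pd.to_numeric(df["BasePay"], errors="coerce")
print(df.dtypes)
print(df["BasePay"].isna().sum(), "missing values after coercion")
```

Applying the same coercion to the pay columns in both environments would make the dtypes comparable, although it would not explain or fix the difference in row counts.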
【Comments】:
-
Please add the code you used to load the data in each environment so that we can spot the difference.
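Alongside the loading code, it may help to report a fingerprint of the file from each environment, since a different size or hash would point to the two copies not being identical. The following is a minimal sketch under that assumption; `describe_csv` is a hypothetical helper name, and the demo runs on a throwaway file rather than the real dataset.

```python
import hashlib
import os
import tempfile

import pandas as pd


def describe_csv(path):
    """Hypothetical helper: summarize a file so two copies can be compared."""
    sha256 = hashlib.sha256()
    line_count = 0
    with open(path, "rb") as fh:
        for line in fh:
            sha256.update(line)
            line_count += 1
    return {
        "pandas_version": pd.__version__,
        "size_bytes": os.path.getsize(path),
        "line_count": line_count,
        "sha256": sha256.hexdigest(),
    }


# Demo on a temporary file; in practice, pass the path given to pd.read_csv.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"Id,BasePay\n1,100.5\n2,200.0\n")
    demo_path = tmp.name

summary = describe_csv(demo_path)
print(summary)
os.remove(demo_path)
```

Running this in both Jupyter and Colab and comparing the dictionaries shows whether the files match. Note that the physical line count can differ from the DataFrame row count if quoted fields contain line breaks.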
Tags: python pandas macos jupyter-notebook google-colaboratory