【Posted】: 2018-05-18 00:21:03
【Question】:
I have two identical DataFrames (the only difference is the column names; the index and the values match).
df1
Out[300]:
                         C1 2018-05-17 P1 2018-05-17
Symbol YYYY MM DD Strike
AA     2018 05 18 29.0               0             0
                  30.0               0             0
df2
Out[301]:
                         C 2018-05-17 P 2018-05-17
Symbol YYYY MM DD Strike
AA     2018 05 18 29.0              0            0
                  30.0              0            0
When I try to join them, pandas does not match up the index entries:
df1.join(df2,how='outer')
Out[302]:
                         C1 2018-05-17 P1 2018-05-17 C 2018-05-17 P 2018-05-17
Symbol YYYY MM DD Strike
AA     2018 05 18 29.0               0             0          NaN          NaN
                  30.0               0             0          NaN          NaN
                  29.0             NaN           NaN            0            0
                  30.0             NaN           NaN            0            0
It seems that the "Strike" level is not being recognized as matching. How can I find out what differs here?
df1.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2 entries, (AA, 2018, 05, 18, 29.0) to (AA, 2018, 05, 18, 30.0)
Data columns (total 2 columns):
C1 2018-05-17 2 non-null object
P1 2018-05-17 2 non-null object
dtypes: object(2)
memory usage: 48.3+ KB
df2.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2 entries, (AA, 2018, 05, 18, 29.0) to (AA, 2018, 05, 18, 30.0)
Data columns (total 2 columns):
C 2018-05-17 2 non-null object
P 2018-05-17 2 non-null object
dtypes: object(2)
memory usage: 7.5+ KB
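The `info()` output above prints both indexes identically, so it does not reveal a dtype mismatch. One way to look closer is to inspect the dtype of each index level directly. The following is a minimal sketch on toy data, not the original frames: `left`, `right`, and the helper `level_dtypes` are hypothetical names, and the string-valued `Strike` level in `right` is an assumption about what the mismatch might look like.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def level_dtypes(df):
    """Return a dict mapping each index level name to the dtype of its values."""
    return {name: df.index.get_level_values(name).dtype for name in df.index.names}


# Toy stand-ins for df1 and df2: same labels on screen, different Strike types.
idx_float = pd.MultiIndex.from_arrays(
    [["AA", "AA"], [29.0, 30.0]], names=["Symbol", "Strike"]
)
idx_text = pd.MultiIndex.from_arrays(
    [["AA", "AA"], ["29.0", "30.0"]], names=["Symbol", "Strike"]
)
left = pd.DataFrame({"C1": [0, 0]}, index=idx_float)
right = pd.DataFrame({"C": [0, 0]}, index=idx_text)

# Both frames print the same Strike labels, but the level dtypes differ.
print(level_dtypes(left))
print(level_dtypes(right))
```

Comparing the two dictionaries level by level points at the level whose values cannot match during the join.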
Update:
I found that one of the Strike columns has dtype float:
df1 = df1.reset_index()
df2 = df2.reset_index()
df1.dtypes
Out[346]:
Symbol object
YYYY object
MM object
DD object
Strike float64
C1 2018-05-17 object
P1 2018-05-17 object
dtype: object
df2.dtypes
Out[347]:
Symbol object
YYYY object
MM object
DD object
Strike object
C 2018-05-17 object
P 2018-05-17 object
dtype: object
However, even after I change the dtype to object:
df1 = df1.reset_index()
df1.Strike = df1.Strike.astype('object')
df1.dtypes
Out[360]:
level_0 int64
index object
Symbol object
YYYY object
MM object
DD object
Strike object
C1 2018-05-17 object
P1 2018-05-17 object
dtype: object
If I set it back as the index, it reverts to float:
df1.set_index(['Symbol','YYYY','MM','DD','Strike']).reset_index().dtypes
Out[373]:
Symbol object
YYYY object
MM object
DD object
Strike float64
C1 2018-05-17 object
P1 2018-05-17 object
dtype: object
How can I stop it from changing back?
【Discussion】:
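Rather than forcing the float level to object, it may be simpler to go the other way and make `Strike` numeric in both frames before joining. The sketch below assumes the object-typed `Strike` holds numeric strings such as "29.0", which the post does not confirm. `left`, `right`, and `with_float_strike` are hypothetical names on toy data.

```python
import pandas as pd

keys = ["Symbol", "YYYY", "MM", "DD", "Strike"]

# Toy stand-ins: Strike is float on one side and text on the other (assumed).
left = pd.DataFrame({
    "Symbol": ["AA", "AA"], "YYYY": ["2018", "2018"],
    "MM": ["05", "05"], "DD": ["18", "18"],
    "Strike": [29.0, 30.0], "C1 2018-05-17": [0, 0],
}).set_index(keys)
right = pd.DataFrame({
    "Symbol": ["AA", "AA"], "YYYY": ["2018", "2018"],
    "MM": ["05", "05"], "DD": ["18", "18"],
    "Strike": ["29.0", "30.0"], "C 2018-05-17": [0, 0],
}).set_index(keys)


def with_float_strike(df, level="Strike"):
    """Rebuild the index with the given level converted to a numeric dtype."""
    names = list(df.index.names)
    flat = df.reset_index()
    flat[level] = pd.to_numeric(flat[level])
    return flat.set_index(names)


# With matching level dtypes, the outer join aligns the rows.
joined = with_float_strike(left).join(with_float_strike(right), how="outer")
print(joined)
```

If the strikes have to stay non-numeric, casting `Strike` to `str` in both frames before `set_index` is another option to try, since text labels are not inferred back to float.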