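【Posted】: 2014-10-21 05:16:41
【Question】:
I'm trying to use the slicers recently added to pandas to index a MultiIndex. I'm having trouble with some slicing problems that seem fairly simple, so I thought I'd ask the group here for help.
Below is a code sample with some examples that work and some that don't: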
import pandas as pd
# Displays: '0.14.1'
pd.__version__
df = pd.DataFrame({'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5,
                   'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3,
                   'DATE': ["2013-06-11",
                            "2013-07-02",
                            "2013-07-09",
                            "2013-07-30",
                            "2013-08-06",
                            "2013-06-11",
                            "2013-07-02",
                            "2013-07-09",
                            "2013-07-30",
                            "2013-08-06",
                            "2013-09-03",
                            "2013-10-01",
                            "2013-07-09",
                            "2013-08-06",
                            "2013-09-03"],
                   'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2]})
# Parse the date strings into datetime64 values
df['DATE'] = pd.to_datetime(df['DATE'])
df1 = df.set_index(['A', 'B', 'DATE'])
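# Sort the MultiIndex; slicers require a sorted index.
# (sortlevel() was removed in later pandas versions; use df1.sort_index() there.)
df1 = df1.sortlevel()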
df2 = df.set_index('DATE')
# A1 - Works - Get all values under "A0" and "A1"
df1.loc[(slice('A1')),:]
# A2 - Works - Get all values from the start to "A2"
df1.loc[(slice('A2')),:]
# A3 - Works - Get all values under "B1" or "B2"
df1.loc[(slice(None),slice('B1','B2')),:]
# A4 - Works - Get all values between 2013-07-02 and 2013-07-09
df1.loc[(slice(None),slice(None),slice('20130702','20130709')),:]
##############################################
# These do not work and I'm wondering why... #
##############################################
# B1 - Does not work - Get all values in B0 that are also under A0, A1 and A2
df1.loc[(slice('A2'),slice('B0')),:]
# B2 - Does not work - Get all values in B0, B1 and B2 (similar to what A2 does for the A level)
df1.loc[(slice(None),slice('B2')),:]
# B3 - Does not work - Get all values from B1 to B2 and up to 2013-08-06
df1.loc[(slice(None),slice('B1','B2'),slice('2013-08-06')),:]
# B4 - Does not work - Same as A4, but the start of the date slice is not a key.
# I would have expected behavior similar to df2['20130701':],
# i.e. date indexing that allows the slice to start at a date not present in the index
df1.loc[(slice(None),slice(None),slice('20130701','20130709')),:]
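For comparison, here is a minimal sketch of B1 to B4 as they can be written in a recent pandas. This is an assumption: it was not checked against 0.14.1. It differs from the code above only in using sort_index() in place of sortlevel() and ISO date strings. The df2 line is the single-index comparison mentioned in the B4 comment. It uses .loc on a sorted index because recent pandas only allows a partial-date slice that starts at a missing date when the DatetimeIndex is sorted.

```python
import pandas as pd

# Same 15-row frame as in the question, with the dates parsed up front
df = pd.DataFrame({
    'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5,
    'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3,
    'DATE': pd.to_datetime([
        '2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
        '2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
        '2013-09-03', '2013-10-01', '2013-07-09', '2013-08-06', '2013-09-03']),
    'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2],
})

# sort_index() replaces the old sortlevel(); slicers need a sorted MultiIndex
df1 = df.set_index(['A', 'B', 'DATE']).sort_index()
df2 = df.set_index('DATE').sort_index()

# B1: A up to 'A2', B up to 'B0' -> the B0 rows of every A (6 rows)
b1 = df1.loc[(slice('A2'), slice('B0')), :]
# B2: every B from the start of the level up to 'B2' (all 15 rows)
b2 = df1.loc[(slice(None), slice('B2')), :]
# B3: B in 'B1'..'B2' and DATE up to and including 2013-08-06 (8 rows)
b3 = df1.loc[(slice(None), slice('B1', 'B2'), slice('2013-08-06')), :]
# B4: start date 2013-07-01 is not in the index (5 rows)
b4 = df1.loc[(slice(None), slice(None), slice('2013-07-01', '2013-07-09')), :]

# Single-level comparison: partial string slicing on a sorted DatetimeIndex (5 rows)
d4 = df2.loc['2013-07-01':'2013-07-09']

for name, frame in [('B1', b1), ('B2', b2), ('B3', b3), ('B4', b4), ('df2', d4)]:
    print(name, len(frame))
```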
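While there are certainly other, simpler ways to get this data, I'd like to understand the answers to these specific examples so I can use that knowledge as building blocks for more complex MultiIndex slicing down the road.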
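One of those "other, simpler ways" is a plain boolean mask built from index.get_level_values(). This is an illustration added here, not something from the post, and it assumes a recent pandas. It reproduces the B3 selection. Because it only compares values row by row, it does not depend on the index being sorted.

```python
import pandas as pd

dates = ['2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
         '2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
         '2013-09-03', '2013-10-01', '2013-07-09', '2013-08-06', '2013-09-03']
df1 = pd.DataFrame({
    'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5,
    'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3,
    'DATE': pd.to_datetime(dates),
    'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2],
}).set_index(['A', 'B', 'DATE'])

# Pull each level out as a flat array aligned with the rows
b = df1.index.get_level_values('B')
date = df1.index.get_level_values('DATE')

# B3 as a boolean mask: B between 'B1' and 'B2', DATE on or before 2013-08-06
mask = (b >= 'B1') & (b <= 'B2') & (date <= pd.Timestamp('2013-08-06'))
b3 = df1[mask]
print(b3)
```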
Thanks in advance for your help!
【Comments】:
-
Please show your pandas version; 0.14.1 fixed several bugs in the MultiIndex slicers that were present in 0.14.0 (and master has fixed more).
-
You might consider using the idx = pd.IndexSlice syntax to make these easier to read. pandas.pydata.org/pandas-docs/stable/…
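Here is a minimal sketch of the IndexSlice form suggested above, assuming a recent pandas. idx[...] only builds the tuple of slice objects that .loc already accepts. The IndexSlice spelling and the explicit slice(...) spelling therefore select the same rows.

```python
import pandas as pd

dates = ['2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
         '2013-06-11', '2013-07-02', '2013-07-09', '2013-07-30', '2013-08-06',
         '2013-09-03', '2013-10-01', '2013-07-09', '2013-08-06', '2013-09-03']
df1 = pd.DataFrame({
    'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5,
    'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3,
    'DATE': pd.to_datetime(dates),
    'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2],
}).set_index(['A', 'B', 'DATE']).sort_index()

idx = pd.IndexSlice

# idx[:'A2', :'B0'] is the same key as (slice(None, 'A2'), slice(None, 'B0'))
print(idx[:'A2', :'B0'])

# A3 and B3 from the question, written with IndexSlice
a3 = df1.loc[idx[:, 'B1':'B2'], :]
b3 = df1.loc[idx[:, 'B1':'B2', :'2013-08-06'], :]

# Same rows as the explicit slice(...) spelling
same = a3.equals(df1.loc[(slice(None), slice('B1', 'B2')), :])
print(len(a3), len(b3), same)
```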
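-
You also need to call .sortlevel() or otherwise make sure the MultiIndex is sorted, or this won't work. It probably doesn't raise an error (it tries to work even when the index isn't sorted, which is probably a bug).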
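Here is a sketch of checking sortedness before slicing, assuming a recent pandas. MultiIndex.is_monotonic_increasing reports whether the rows are in sorted order, and sort_index() is the current replacement for sortlevel(). The two-level frame below is a made-up example, not data from the question. In recent versions, slicing a level of an unsorted index like this one typically raises pandas.errors.UnsortedIndexError rather than silently trying.

```python
import pandas as pd

# Hypothetical frame with a deliberately unsorted two-level index
df = pd.DataFrame({
    'A': ['A1', 'A0', 'A1', 'A0'],
    'B': ['B1', 'B0', 'B0', 'B1'],
    'VALUES': [1, 2, 3, 4],
}).set_index(['A', 'B'])

print(df.index.is_monotonic_increasing)  # False: rows are not in (A, B) order

# sort_index() is the modern replacement for sortlevel()
df_sorted = df.sort_index()
print(df_sorted.index.is_monotonic_increasing)  # True

# Slicing on the second level is safe on the sorted frame
sel = df_sorted.loc[(slice(None), slice('B0', 'B0')), :]
print(sel)
```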
-
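Sorry, I just added a couple of lines to show the version and to sort the DataFrame. I still get the same errors even after sorting (on version 0.14.1). Just curious, Jeff: does the sample code work for you? Thanks for the suggestion, chrisb; I'll look into that too.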
-
I'll have to look at these in more detail tomorrow; I'll let you know.
Tags: python pandas slice multi-index