【Question title】: Selecting a row with a MultiIndex
【Posted】: 2016-10-07 09:03:00
【Question】:

I have the following DataFrame. How do I select a single row by retailer name, store number and visit date?

category_name                                   Blades & Razors & Foam  Diaper
retailer_name store_number visit_date                                         
ABest         1177         2016-03-01 06:10:49                      86     191
                           2016-03-24 08:59:33                     129     222
                           2016-03-29 04:34:36                     114     323
                           2016-04-12 10:56:26                     225     235
              1182         2016-03-02 08:54:00                     161     217

For example, I want to get the row for 'ABest', 1182, 2016-03-02 08:54:00.

Thanks

【Question discussion】:

  • Do you really need to use pandas?

Tags: python pandas dataframe multi-index


【Solution 1】:

I think you can use loc with slicers (see the "Using slicers" section of the pandas MultiIndex documentation):

import pandas as pd

df = pd.DataFrame({'Blades & Razors & Foam': {('ABest', 1177, pd.Timestamp('2016-03-01 06:10:49')): 86, ('ABest', 1177, pd.Timestamp('2016-03-29 04:34:36')): 114, ('ABest', 1177, pd.Timestamp('2016-03-24 08:59:33')): 129, ('ABest', 1177, pd.Timestamp('2016-04-12 10:56:26')): 225, ('ABest', 1182, pd.Timestamp('2016-03-02 08:54:00')): 161}, 'Diaper': {('ABest', 1177, pd.Timestamp('2016-03-01 06:10:49')): 191, ('ABest', 1177, pd.Timestamp('2016-03-29 04:34:36')): 323, ('ABest', 1177, pd.Timestamp('2016-03-24 08:59:33')): 222, ('ABest', 1177, pd.Timestamp('2016-04-12 10:56:26')): 235, ('ABest', 1182, pd.Timestamp('2016-03-02 08:54:00')): 217}})
df.columns.names=[u'category_name'] 
df.index.names=[u'retailer_name', u'store_number', u'visit_date'] 
print (df)
category_name                                   Blades & Razors & Foam  Diaper
retailer_name store_number visit_date                                         
ABest         1177         2016-03-01 06:10:49                      86     191
                           2016-03-24 08:59:33                     129     222
                           2016-03-29 04:34:36                     114     323
                           2016-04-12 10:56:26                     225     235
              1182         2016-03-02 08:54:00                     161     217

idx = pd.IndexSlice
print (df.loc[idx['ABest',1182,'2016-03-02 08:54:00'],:])
category_name                                   Blades & Razors & Foam  Diaper
retailer_name store_number visit_date                                         
ABest         1182         2016-03-02 08:54:00                     161     217
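A minimal sketch of two related lookups, assuming the same sample data rebuilt with pd.MultiIndex.from_tuples (the construction differs from the dict above, but the rows are the same). A complete tuple of labels passed to loc returns the single matching row as a Series, while IndexSlice with : on the last level keeps every visit_date for one store. The variable names row and store_1177 are illustrative only.

```python
import pandas as pd

# Rebuild the question's sample data with a datetime visit_date level
index = pd.MultiIndex.from_tuples(
    [
        ("ABest", 1177, pd.Timestamp("2016-03-01 06:10:49")),
        ("ABest", 1177, pd.Timestamp("2016-03-24 08:59:33")),
        ("ABest", 1177, pd.Timestamp("2016-03-29 04:34:36")),
        ("ABest", 1177, pd.Timestamp("2016-04-12 10:56:26")),
        ("ABest", 1182, pd.Timestamp("2016-03-02 08:54:00")),
    ],
    names=["retailer_name", "store_number", "visit_date"],
)
df = pd.DataFrame(
    {
        "Blades & Razors & Foam": [86, 129, 114, 225, 161],
        "Diaper": [191, 222, 323, 235, 217],
    },
    index=index,
)

idx = pd.IndexSlice

# A full key (one label per level) selects exactly one row, returned as a Series
row = df.loc[("ABest", 1182, pd.Timestamp("2016-03-02 08:54:00"))]

# ':' on the last level keeps all visit dates for store 1177;
# slicing a MultiIndex is safest on a sorted index, hence sort_index()
store_1177 = df.sort_index().loc[idx["ABest", 1177, :], :]

print(row)
print(store_1177)
```

Passing Timestamps rather than strings for visit_date keeps the lookup independent of how the level happens to be stored, which is the issue raised in the comments below.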

Edit:

If you need to convert the visit_date level to datetime with to_datetime, you can use:

import pandas as pd

df = pd.DataFrame({'Blades & Razors & Foam': {('ABest', 1182, '2016-03-02 08:54:00'): 161, ('ABest', 1177, '2016-04-12 10:56:26'): 225, ('ABest', 1177, '2016-03-01 06:10:49'): 86, ('ABest', 1177, '2016-03-24 08:59:33'): 129, ('ABest', 1177, '2016-03-29 04:34:36'): 114}, 'Diaper': {('ABest', 1182, '2016-03-02 08:54:00'): 217, ('ABest', 1177, '2016-04-12 10:56:26'): 235, ('ABest', 1177, '2016-03-01 06:10:49'): 191, ('ABest', 1177, '2016-03-24 08:59:33'): 222, ('ABest', 1177, '2016-03-29 04:34:36'): 323}})
df.index.names=[u'retailer_name', u'store_number', u'visit_date'] 
print (df)
                                                Blades & Razors & Foam  Diaper
retailer_name store_number visit_date                                         
ABest         1177         2016-03-01 06:10:49                      86     191
                           2016-03-24 08:59:33                     129     222
                           2016-03-29 04:34:36                     114     323
                           2016-04-12 10:56:26                     225     235
              1182         2016-03-02 08:54:00                     161     217

df.reset_index(inplace=True)
df['visit_date'] = pd.to_datetime(df['visit_date'])
df.set_index(['retailer_name','store_number','visit_date'], inplace=True)


idx = pd.IndexSlice
print (df.loc[idx['ABest',1182,'2016-03-02 08:54:00'],:])
                                                Blades & Razors & Foam  Diaper
retailer_name store_number visit_date                                         
ABest         1182         2016-03-02 08:54:00                     161     217
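An alternative sketch, not part of the original answer: convert only the visit_date level with MultiIndex.set_levels instead of resetting and rebuilding the whole index. It assumes visit_date is level 2, as in the question, and uses a trimmed two-row sample for brevity.

```python
import pandas as pd

# Sample frame whose visit_date level holds strings, as in the question
index = pd.MultiIndex.from_tuples(
    [
        ("ABest", 1177, "2016-03-01 06:10:49"),
        ("ABest", 1182, "2016-03-02 08:54:00"),
    ],
    names=["retailer_name", "store_number", "visit_date"],
)
df = pd.DataFrame({"Diaper": [191, 217]}, index=index)

# Replace only the visit_date level values (level 2 is assumed here);
# the integer codes are unchanged, so every row keeps its position
df.index = df.index.set_levels(
    pd.to_datetime(df.index.levels[2]), level="visit_date"
)

# The level is now datetime, so a Timestamp in the key matches
row = df.loc[("ABest", 1182, pd.Timestamp("2016-03-02 08:54:00"))]
print(row["Diaper"])
```

Because set_levels rewrites only the unique level values, it avoids the round trip through ordinary columns that reset_index and set_index require.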

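【Discussion】:

  • I get KeyError: ('ABest', 1182, Timestamp('2016-03-02 08:54:00'))
  • Hmm, I don't have a datetime index, give me a moment.
  • Interesting, it works fine for me. What is your pandas version? print pd.__version__
  • But is there a problem with the real data?
  • By the way, print (df.loc[('ABest', 1182, '2016-03-02 08:54:00'),:]) also works for me, but I don't know if that is the best-practice solution.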
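A small sketch of a likely cause of the KeyError above, assuming the real index stores visit_date as strings (object dtype): a full tuple key is matched level by level, so a Timestamp never equals a string label, while the same key written as a string is found. The one-row frame is illustrative.

```python
import pandas as pd

# visit_date stored as plain strings, so the level has object dtype
index = pd.MultiIndex.from_tuples(
    [("ABest", 1182, "2016-03-02 08:54:00")],
    names=["retailer_name", "store_number", "visit_date"],
)
df = pd.DataFrame({"Diaper": [217]}, index=index)

key_str = ("ABest", 1182, "2016-03-02 08:54:00")
key_ts = ("ABest", 1182, pd.Timestamp("2016-03-02 08:54:00"))

print(df.index.levels[2].dtype)  # the level holds strings
print(df.loc[key_str]["Diaper"])  # string key matches the string label

raised = False
try:
    df.loc[key_ts]  # Timestamp does not equal the string label
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

Either converting the level to datetime or passing the date as a string that matches the stored label avoids the error.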