【Question title】: Advanced Pandas: Create a MultiIndex pandas DataFrame from certain row indices
【Posted】: 2018-08-31 21:00:47
【Question】:

I have a dataset like this:

    m   n   o  
0   2   22  42  
1   3   23  43  
2   4   24  44  
3   5   25  45  
4   6   26  46  
5   7   27  47  
6   8   28  48  
7   9   29  49  
8   10  30  50  
9   11  31  51  

How can we convert it into a multi-index DataFrame like this:

Index   m   n   o
  A
     0  2   22  42
     1  3   23  43
     2  4   24  44
  B
     4  6   26  46
  C
     6  8   28  48
     7  9   29  49
     8  10  30  50

**My attempt**

import numpy as np 
import pandas as pd

df = pd.DataFrame({'m': np.arange(2, 12),
                   'n': np.arange(22, 32),
                   'o': np.arange(42, 52)})

df

**groupby approach**

# Required index and their names
idx = [3,5,9]  # A is 0,1,2 B is 4 and C is 6,7,8
idx_orig = idx.copy()
idx_names = ['A','B','C']

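# Attempt: work out how many rows each name has to cover
idx_diff = np.diff(idx)                     # gaps between separator rows: [2, 4]
idx_diff = np.hstack((idx[0]+1,idx_diff))   # prepend the first block: [4, 2, 4]
idx_diff = idx_diff - 1                     # exclude the separator rows: [3, 1, 3]
idx_names = np.repeat(idx_names,idx_diff)   # A A A B C C C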

# Drop rows with given indices
df = df.drop(df.index[idx_orig])

# Assign new col
df['Names'] = idx_names
#df.groupby('Names').count()
df

Output

    m   n   o   Names
0   2   22  42  A
1   3   23  43  A
2   4   24  44  A
4   6   26  46  B
6   8   28  48  C
7   9   29  49  C
8   10  30  50  C

Here I want a multi-index DataFrame that keeps all of the rows, but groupby only gives me the counts.

【Comments】:

    Tags: python pandas multi-index


    【Solution 1】:

    Use:

    idx = [3,5,9] 
    
    idx_names = ['A','B','C']
    d = dict(enumerate(idx_names))
    
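    #boolean mask of the idx rows, used for the cumulative sum and to filter those rows out
    #(df here is the original 10-row DataFrame, before any rows are dropped)
    mask = df.index.isin(idx)
    df['g'] = mask.cumsum()
    #map the group numbers to names with the dictionary
    df['g'] = df['g'].map(d)
    #create the MultiIndex and swap the order of the levels
    df = df[~mask].set_index('g', append=True).swaplevel(0,1)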
    print (df)
          m   n   o
    g              
    A 0   2  22  42
      1   3  23  43
      2   4  24  44
    B 4   6  26  46
    C 6   8  28  48
      7   9  29  49
      8  10  30  50
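
The steps in this answer can also be wrapped into a small helper. The following is only a sketch under stated assumptions: `label_blocks`, `separators`, `names` and `level_name` are illustrative names that do not come from the answer or from pandas, the separator values are assumed to be index labels, and no rows are assumed to follow the last separator (otherwise there would be more blocks than names). It builds the index with `pd.MultiIndex.from_arrays` rather than `set_index(...).swaplevel(...)`, which should give the same layout here.

```python
import numpy as np
import pandas as pd


def label_blocks(df, separators, names, level_name="g"):
    """Drop the separator rows and label the blocks of rows between them.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True at the separator rows (matched against the index labels).
    mask = df.index.isin(separators)
    # Every separator increases the running count, so the rows of
    # block k (0-based) all carry the integer k.
    block_id = mask.cumsum()[~mask]
    # Translate block numbers into names; assumes len(names) covers every block.
    labels = np.asarray(names)[block_id]

    kept = df[~mask].copy()
    kept.index = pd.MultiIndex.from_arrays(
        [labels, kept.index], names=[level_name, kept.index.name]
    )
    return kept


# Usage with the data from the question.
df = pd.DataFrame({"m": np.arange(2, 12),
                   "n": np.arange(22, 32),
                   "o": np.arange(42, 52)})
out = label_blocks(df, separators=[3, 5, 9], names=["A", "B", "C"])
print(out)
```

Unlike the snippet above, this version leaves the original `df` untouched and does not add a temporary `g` column to it.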
    

    【Discussion】:

    • How do I get the values for 'A', something like df['A']?
    • @astro123 - use df.loc['A']
    • print (df.xs('A')) - xs
    • .loc or .xs, which is better? Is .xs deprecated?
    • @astro123 - No, it is ix that is deprecated - ix-indexer-is-deprecated
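
To make the `.loc` versus `.xs` exchange concrete, here is a minimal sketch that rebuilds the answer's result by hand and selects group 'A' both ways. The variable names `a_loc`, `a_xs` and `a_keep` are illustrative only. On the outer level both calls are expected to return the same rows; `xs` additionally accepts `level` and `drop_level` arguments.

```python
import pandas as pd

# Rebuild the MultiIndex result shown in the answer.
index = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 1), ("A", 2), ("B", 4), ("C", 6), ("C", 7), ("C", 8)],
    names=["g", None],
)
df = pd.DataFrame(
    {"m": [2, 3, 4, 6, 8, 9, 10],
     "n": [22, 23, 24, 26, 28, 29, 30],
     "o": [42, 43, 44, 46, 48, 49, 50]},
    index=index,
)

# Both select the rows of group 'A' and drop the 'g' level.
a_loc = df.loc["A"]
a_xs = df.xs("A")

# xs can also keep the selected level in the result.
a_keep = df.xs("A", level="g", drop_level=False)

print(a_loc)
print(a_keep)
```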