【Question title】: pandas: reshape multi key-value DataFrame columns to rows
【Posted】: 2019-08-14 11:53:21
【Question】:

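I have a DataFrame like the one constructed below.

How can I store the column pairs (0 => 1, 2 => 3) as records, i.e. as two columns metric_name and metric_value with multiple rows (observations)?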

import pandas as pd

pandas_dict = {0: {0: 'Model:',
  1: 'Dependent Variable:',
  2: 'Date:',
  3: 'No. Observations:',
  4: 'Df Model:',
  5: 'Df Residuals:',
  6: 'Converged:',
  7: 'No. Iterations:'},
 1: {0: 'Logit',
  1: 'sick_percentage',
  2: '2019-08-14 13:32',
  3: '28',
  4: '2',
  5: '25',
  6: '0.0000',
  7: '35.0000'},
 2: {0: 'Pseudo R-squared:',
  1: 'AIC:',
  2: 'BIC:',
  3: 'Log-Likelihood:',
  4: 'LL-Null:',
  5: 'LLR p-value:',
  6: 'Scale:',
  7: ''},
 3: {0: 'inf',
  1: '6.0798',
  2: '10.0764',
  3: '-0.039902',
  4: '0.0000',
  5: '1.0000',
  6: '1.0000',
  7: ''}}
df = pd.DataFrame(pandas_dict)

【Question discussion】:

    Tags: python pandas reshape


    【Solution 1】:

    If there are only 4 columns, you can flatten the values and build the DataFrame with the constructor:

    a = df[[0, 2]].values.ravel()
    b = df[[1, 3]].values.ravel()
    
    df = pd.DataFrame({'A':a, 'B':b})
    print (df)
                          A                 B
    0                Model:             Logit
    1     Pseudo R-squared:               inf
    2   Dependent Variable:   sick_percentage
    3                  AIC:            6.0798
    4                 Date:  2019-08-14 13:32
    5                  BIC:           10.0764
    6     No. Observations:                28
    7       Log-Likelihood:         -0.039902
    8             Df Model:                 2
    9              LL-Null:            0.0000
    10        Df Residuals:                25
    11         LLR p-value:            1.0000
    12           Converged:            0.0000
    13               Scale:            1.0000
    14      No. Iterations:           35.0000
    15                                       
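The flattened result above can be taken one step further toward the layout the question asks for. The following is only a sketch: the column names `metric_name` and `metric_value` come from the question, while dropping the empty padding row and stripping the trailing colon are assumptions about what a clean result should look like, not part of the original answer.

```python
import pandas as pd

# Compact rebuild of the question's frame: columns 0 and 2 hold names, 1 and 3 hold values.
df = pd.DataFrame({
    0: ['Model:', 'Dependent Variable:', 'Date:', 'No. Observations:',
        'Df Model:', 'Df Residuals:', 'Converged:', 'No. Iterations:'],
    1: ['Logit', 'sick_percentage', '2019-08-14 13:32', '28',
        '2', '25', '0.0000', '35.0000'],
    2: ['Pseudo R-squared:', 'AIC:', 'BIC:', 'Log-Likelihood:',
        'LL-Null:', 'LLR p-value:', 'Scale:', ''],
    3: ['inf', '6.0798', '10.0764', '-0.039902',
        '0.0000', '1.0000', '1.0000', ''],
})

# Same flattening as in the answer, with the column names requested in the question.
records = pd.DataFrame({
    'metric_name': df[[0, 2]].to_numpy().ravel(),
    'metric_value': df[[1, 3]].to_numpy().ravel(),
})

# Assumption: a row with an empty name is padding and can be dropped.
records = records[records['metric_name'] != ''].reset_index(drop=True)

# Optional cosmetic step (assumption): remove the trailing colon from each name.
records['metric_name'] = records['metric_name'].str.rstrip(':')

print(records)
```

All values stay strings here; converting `metric_value` to numbers would need separate handling, because entries such as `Logit` and the date are not numeric.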
    

    Or, as a general solution: build a MultiIndex in the columns with modulo and integer division, then reshape with DataFrame.stack:

    df.columns = [df.columns % 2, df.columns // 2]
    df = df.stack().reset_index(drop=True)
    print (df)
                          0                 1
    0                Model:             Logit
    1     Pseudo R-squared:               inf
    2   Dependent Variable:   sick_percentage
    3                  AIC:            6.0798
    4                 Date:  2019-08-14 13:32
    5                  BIC:           10.0764
    6     No. Observations:                28
    7       Log-Likelihood:         -0.039902
    8             Df Model:                 2
    9              LL-Null:            0.0000
    10        Df Residuals:                25
    11         LLR p-value:            1.0000
    12           Converged:            0.0000
    13               Scale:            1.0000
    14      No. Iterations:           35.0000
    15                                       
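For any even number of columns, the same pairing can also be sketched with a plain NumPy reshape instead of `stack`. This is a different technique from the answer's, shown only for comparison; the helper name `pairs_to_records` and its `names` parameter are hypothetical, and it assumes the columns are laid out as (name, value), (name, value), ... from left to right.

```python
import pandas as pd


def pairs_to_records(frame, names=('metric_name', 'metric_value')):
    """Hypothetical helper: treat columns (0, 1), (2, 3), ... as name/value pairs."""
    if frame.shape[1] % 2:
        raise ValueError('expected an even number of columns')
    # Row-major reshape: every source row contributes its pairs left to right,
    # which gives the same row order as the stack-based result above.
    values = frame.to_numpy().reshape(-1, 2)
    return pd.DataFrame(values, columns=list(names))


# Small illustrative frame with three name/value pairs per row.
wide = pd.DataFrame([
    ['a', '1', 'b', '2', 'c', '3'],
    ['d', '4', 'e', '5', 'f', '6'],
])

records = pairs_to_records(wide)
print(records)
```

Unlike the `stack` version, this sketch ignores the column labels entirely and relies only on column position.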
    

    【Discussion】:

      【Solution 2】:

      If you are looking for a faster solution, you can also use np.concatenate here:

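      import numpy as np

      # Stack the (0, 1) block on top of the (2, 3) block and name the two columns.
      df = pd.DataFrame(
          np.concatenate((df.iloc[:, [0, 1]].values, df.iloc[:, [2, 3]].values), axis=0),
          columns=['Metric Name', 'Metric Value'])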
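One difference from Solution 1 is row order: concatenating along `axis=0` places every pair from columns 0 and 1 first, followed by every pair from columns 2 and 3, whereas `ravel` interleaves them row by row. The sketch below uses a small two-row frame (an illustrative subset of the question's data) to show that both hold the same records.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: ['Model:', 'Date:'], 1: ['Logit', '2019-08-14 13:32'],
                   2: ['AIC:', 'BIC:'], 3: ['6.0798', '10.0764']})
cols = ['Metric Name', 'Metric Value']

# Block order: the (0, 1) pairs first, then the (2, 3) pairs.
blocks = pd.DataFrame(
    np.concatenate((df.iloc[:, [0, 1]].to_numpy(), df.iloc[:, [2, 3]].to_numpy()), axis=0),
    columns=cols)

# Interleaved order: pairs are emitted row by row, as in Solution 1.
interleaved = pd.DataFrame({cols[0]: df[[0, 2]].to_numpy().ravel(),
                            cols[1]: df[[1, 3]].to_numpy().ravel()})

print(blocks[cols[0]].tolist())       # ['Model:', 'Date:', 'AIC:', 'BIC:']
print(interleaved[cols[0]].tolist())  # ['Model:', 'AIC:', 'Date:', 'BIC:']

# Same set of (name, value) records once order is ignored.
same_records = (sorted(map(tuple, blocks.to_numpy()))
                == sorted(map(tuple, interleaved.to_numpy())))
print(same_records)
```

If the original row grouping matters downstream, the interleaved form keeps the metrics that shared a source row next to each other.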
      

      If you would rather use pandas functionality, you can use:

      1) pandas.DataFrame.merge

      df = (df.iloc[:, [0, 1]]
              .rename(columns={0: 'Metric Name', 1: 'Metric Value'})
              .merge(df.iloc[:, [2, 3]].rename(columns={2: 'Metric Name', 3: 'Metric Value'}),
                     how='outer'))
      

      2) pandas.concat

      df = pd.concat(
          (df.iloc[:, [0, 1]].rename(columns={0: 'Metric Name', 1: 'Metric Value'}),
           df.iloc[:, [2, 3]].rename(columns={2: 'Metric Name', 3: 'Metric Value'})),
          ignore_index=True)
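With more than two name/value pairs, writing each `rename` by hand gets repetitive. A possible generalization of the `pd.concat` approach is sketched below; it assumes the pairs sit in adjacent columns (0, 1), (2, 3), ... and uses a small illustrative frame rather than the full data from the question.

```python
import pandas as pd

df = pd.DataFrame({0: ['Model:', 'Date:'], 1: ['Logit', '2019-08-14 13:32'],
                   2: ['AIC:', 'BIC:'], 3: ['6.0798', '10.0764']})
names = ['Metric Name', 'Metric Value']

# One two-column piece per adjacent column pair, all relabelled to the same names.
pieces = [
    df.iloc[:, [i, i + 1]].set_axis(names, axis=1)
    for i in range(0, df.shape[1], 2)
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating the source index.
long_df = pd.concat(pieces, ignore_index=True)
print(long_df)
```

Selecting by position with `iloc` means the sketch does not depend on the columns being labelled 0 to 3.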
      

      【Discussion】:
