【Question title】: Pandas: divide a DataFrame by its index values
【Posted】: 2016-12-10 09:42:55
【Question】:

I am trying to divide every column of a DataFrame by the DataFrame's index (1221 rows, 1000 columns).

           5000058004097  5000058022936  5000058036940  5000058036827  \

91.0        3.667246e+10   3.731947e+12   2.792220e+14   2.691262e+13   
94.0        9.869027e+10   1.004314e+13   7.514220e+14   7.242529e+13   
96.0        2.536914e+11   2.581673e+13   1.931592e+15   1.861752e+14
...

This is the code I tried:

A = SHIGH.divide(SHIGH.index, axis =1) 

I get this error:

ValueError: operands could not be broadcast together with shapes (1221,1000) (1221,) 

I also tried

A = SHIGH.divide(SHIGH.index.values.tolist(), axis =1)

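I also tried resetting the index and dividing by the resulting column, and got the same error.

If anyone can point out my mistake, I would appreciate it.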

【Comments】:

    Tags: python pandas indexing dataframe


    【Answer 1】:

    You need to convert the Index object to a Series and divide along axis=0 (row by row), not axis=1:

    df.div(df.index.to_series(), axis=0)
    

    Example:

    In [118]:
    df = pd.DataFrame(np.random.randn(5,3))
    df
    
    Out[118]:
              0         1         2
    0  0.828540 -0.574005 -0.535122
    1 -0.126242  2.152599 -1.356933
    2  0.289270 -0.663178 -0.374691
    3 -0.016866 -0.760110 -1.696402
    4  0.130580 -1.043561  0.789491
    
    In [124]:
    df.div(df.index.to_series(), axis=0)
    
    Out[124]:
              0         1         2
    0       inf      -inf      -inf
    1 -0.126242  2.152599 -1.356933
    2  0.144635 -0.331589 -0.187345
    3 -0.005622 -0.253370 -0.565467
    4  0.032645 -0.260890  0.197373
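
The first row above becomes inf because its index label is 0. As a minimal sketch of one way to handle that, assuming you would rather have NaN than infinities (the frame, the column names `x` and `y`, and the NaN replacement are illustrative choices, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a zero label.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [2.0, 8.0, 18.0]}, index=[0, 2, 3])

# axis=0 aligns the divisor with the row labels, so each row is divided
# by its own index value.
result = df.div(df.index.to_series(), axis=0)

# Dividing a non-zero value by the 0 label yields +/-inf; replace those with NaN.
cleaned = result.replace([np.inf, -np.inf], np.nan)

print(result)
print(cleaned)
```

Whether NaN is the right replacement depends on what the zero label means in your data.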
    

    【Comments】:

      【Answer 2】:

      Another way is

      df.div(df.index.values, axis=0)
      

      Example:

      In [7]: df = pd.DataFrame({'a': range(5), 'b': range(1, 6), 'c': range(2, 7)}).set_index('a')
      
      In [8]: df.divide(df.index.values, axis=0)
      Out[8]: 
                b         c
      a                    
      0       inf       inf
      1  2.000000  3.000000
      2  1.500000  2.000000
      3  1.333333  1.666667
      4  1.250000  1.500000
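
A small sketch of why this works, assuming a made-up 4x3 frame: with axis=0 the length-n array is matched to the rows, which is what plain NumPy does once the divisor is reshaped from shape (n,) to (n, 1). The frame and the variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with a non-zero float index.
df = pd.DataFrame(
    np.arange(1.0, 13.0).reshape(4, 3),
    index=[1.0, 2.0, 4.0, 8.0],
    columns=list("abc"),
)

# Two pandas spellings of the row-wise division.
via_values = df.div(df.index.values, axis=0)
via_series = df.div(df.index.to_series(), axis=0)

# The same computation in plain NumPy: reshape the divisor to a column vector
# so that it broadcasts across the columns of each row.
via_numpy = df.to_numpy() / df.index.to_numpy()[:, None]

print(via_values)
print(np.allclose(via_values.to_numpy(), via_numpy))
```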
      

      【Comments】:

        【Answer 3】:

        You need to convert the index with to_series and then divide with div (divide is the same method), using axis=0:

        print (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
              5000058004097  5000058022936  5000058036940  5000058036827
        91.0   4.029941e+08   4.101041e+10   3.068374e+12   2.957431e+11
        94.0   1.049896e+09   1.068419e+11   7.993851e+12   7.704818e+11
        96.0   2.642619e+09   2.689243e+11   2.012075e+13   1.939325e+12
        

        The timings of the two solutions are practically the same:

        SHIGH = pd.DataFrame({'5000058022936': {96.0: 25816730000000.0, 91.0: 3731947000000.0, 94.0: 10043140000000.0}, 
                         '5000058036940': {96.0: 1931592000000000.0, 91.0: 279222000000000.0, 94.0: 751422000000000.0}, 
                         '5000058036827': {96.0: 186175200000000.0, 91.0: 26912620000000.0, 94.0: 72425290000000.0}, 
                         '5000058004097': {96.0: 253691400000.0, 91.0: 36672460000.0, 94.0: 98690270000.0}})
        
        
        print (SHIGH)
              5000058004097  5000058022936  5000058036827  5000058036940
        91.0   3.667246e+10   3.731947e+12   2.691262e+13   2.792220e+14
        94.0   9.869027e+10   1.004314e+13   7.242529e+13   7.514220e+14
        96.0   2.536914e+11   2.581673e+13   1.861752e+14   1.931592e+15
        
        #[1200 rows x 1000 columns] in sample DataFrame
        SHIGH = pd.concat([SHIGH]*400).reset_index(drop=True)
        SHIGH = pd.concat([SHIGH]*250, axis=1)
        
        In [212]: %timeit (SHIGH.divide(SHIGH.index.values, axis = 0))
        100 loops, best of 3: 14.8 ms per loop
        
        In [213]: %timeit (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
        100 loops, best of 3: 14.9 ms per loop
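
Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The random 1200x1000 frame, the seed, and the repeat count are assumptions made for illustration; absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1200 x 1000 sample frame, with a non-zero index.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((1200, 1000)), index=np.arange(1.0, 1201.0))

n_calls = 10

# Total wall-clock time for n_calls row-wise divisions with each divisor type.
t_values = timeit.timeit(lambda: frame.divide(frame.index.values, axis=0), number=n_calls)
t_series = timeit.timeit(lambda: frame.divide(frame.index.to_series(), axis=0), number=n_calls)

print(f"index.values:      {t_values / n_calls * 1e3:.2f} ms per call")
print(f"index.to_series(): {t_series / n_calls * 1e3:.2f} ms per call")
```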
        

        【Comments】:

          【Answer 4】:
          SHIGH / SHIGH.index
          

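          df.index gives you an array-like object holding the index values. Note that the / operator matches a one-dimensional operand against the columns, not the rows, so this form only works when the number of rows equals the number of columns; for the 1221 x 1000 frame in the question, use div(..., axis=0) as in the other answers.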
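
To see how the operator form behaves on a non-square frame, here is a hedged sketch on a made-up 3x2 frame. It assumes a reasonably recent pandas, where a length mismatch in `df / df.index` raises `ValueError`. The transpose workaround is one possible way to keep using `/`, not something the original answer proposed.

```python
import numpy as np
import pandas as pd

# Hypothetical non-square frame: 3 rows, 2 columns.
df = pd.DataFrame(
    {"a": [2.0, 4.0, 6.0], "b": [4.0, 8.0, 12.0]},
    index=[1.0, 2.0, 3.0],
)

# The operator form tries to match the 3 index values against the 2 columns.
try:
    _ = df / df.index
    operator_failed = False
except ValueError as exc:
    operator_failed = True
    print("df / df.index raised:", exc)

# Workaround that keeps the operator: transpose so the index labels become
# the columns, divide, then transpose back.
transposed = (df.T / df.index.to_numpy()).T

# Explicit row-wise division for comparison.
explicit = df.div(df.index.to_series(), axis=0)

print(np.allclose(transposed.to_numpy(), explicit.to_numpy()))
```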

          【Comments】:
