Pandas: divide a DataFrame by its index values (answers)

[Question Title]: Pandas divide dataframe by index values
[Posted]: 2016-12-10 09:42:55
[Question]:

I'm trying to divide every column of a DataFrame by the index (1221 rows, 1000 columns).

      5000058004097  5000058022936  5000058036940  5000058036827  ...
91.0   3.667246e+10   3.731947e+12   2.792220e+14   2.691262e+13  ...
94.0   9.869027e+10   1.004314e+13   7.514220e+14   7.242529e+13  ...
96.0   2.536914e+11   2.581673e+13   1.931592e+15   1.861752e+14  ...
...

Here is the code I tried:

A = SHIGH.divide(SHIGH.index, axis =1)

I get this error:

ValueError: operands could not be broadcast together with shapes (1221,1000) (1221,)

I also tried:

A = SHIGH.divide(SHIGH.index.values.tolist(), axis =1)

I also re-indexed and divided by a column instead, and got the same error.

I'd be grateful if anyone could point out my mistake.
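A minimal reproduction on a small hypothetical frame (`small` and its values are made up for illustration) suggests where the call goes wrong: with `axis=1`, pandas matches a 1-D divisor against the columns, so one divisor per row only lines up with `axis=0`. The exact error text differs between pandas versions, so the sketch only checks that a `ValueError` is raised; the last step shows that dividing by a column after `reset_index()` follows the same rule.

```python
import pandas as pd

# Hypothetical 3 x 2 frame with a float index, shaped like the question's data
small = pd.DataFrame(
    {"x": [91.0, 188.0, 288.0], "y": [182.0, 282.0, 384.0]},
    index=[91.0, 94.0, 96.0],
)

# axis=1 matches the 3 index values against the 2 columns -> ValueError
try:
    small.divide(small.index, axis=1)
    failed = False
except ValueError:
    failed = True

# axis=0 uses one divisor per row, which is what the question wants
fixed = small.divide(small.index.to_series(), axis=0)

# Dividing by a column after reset_index() follows the same axis rule
flat = small.reset_index()  # the unnamed index becomes a column called "index"
by_column = flat[["x", "y"]].divide(flat["index"], axis=0)

print(failed)  # True
print(fixed.to_numpy().tolist())  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
```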

Tags: python pandas indexing dataframe

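[Solution 1]:

Convert the Index object to a Series and divide with axis=0, so that each row is divided by its own index value (with axis=1, as in the question, pandas matches the 1221 index values against the 1000 columns, hence the shape error):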

df.div(df.index.to_series(), axis=0)

Example (with numpy imported as np and pandas as pd):

In [118]:
df = pd.DataFrame(np.random.randn(5,3))
df

Out[118]:
          0         1         2
0  0.828540 -0.574005 -0.535122
1 -0.126242  2.152599 -1.356933
2  0.289270 -0.663178 -0.374691
3 -0.016866 -0.760110 -1.696402
4  0.130580 -1.043561  0.789491

In [124]:
df.div(df.index.to_series(), axis=0)

Out[124]:
          0         1         2
0       inf      -inf      -inf
1 -0.126242  2.152599 -1.356933
2  0.144635 -0.331589 -0.187345
3 -0.005622 -0.253370 -0.565467
4  0.032645 -0.260890  0.197373

[Solution 2]:

Another option is to pass the index values as a NumPy array:

df.div(df.index.values, axis=0)

Example:

In [7]: df = pd.DataFrame({'a': range(5), 'b': range(1, 6), 'c': range(2, 7)}).set_index('a')

In [8]: df.divide(df.index.values, axis=0)
Out[8]: 
          b         c
a                    
0       inf       inf
1  2.000000  3.000000
2  1.500000  2.000000
3  1.333333  1.666667
4  1.250000  1.500000
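The same call can also be written with the string alias `axis='index'` (equivalent to `axis=0`) and `Index.to_numpy()`, which newer pandas documentation suggests in place of `.values`. A sketch that rebuilds the example frame above and checks that both spellings agree, including the `inf` row produced by dividing by the index value 0:

```python
import numpy as np
import pandas as pd

# Same frame as the example above: index "a" = 0..4, columns b and c
df = pd.DataFrame({'a': range(5), 'b': range(1, 6), 'c': range(2, 7)}).set_index('a')

# Spelling used in the answer
with_values = df.divide(df.index.values, axis=0)

# Equivalent spelling: axis='index' is an alias for axis=0
with_to_numpy = df.divide(df.index.to_numpy(), axis='index')

# Row a=0 is divided by 0, so both results hold inf there
print(with_values.equals(with_to_numpy))  # True
```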

[Solution 3]:

Convert the index with to_series, then call divide (an alias of div) with axis=0:

print (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
      5000058004097  5000058022936  5000058036940  5000058036827
91.0   4.029941e+08   4.101041e+10   3.068374e+12   2.957431e+11
94.0   1.049896e+09   1.068419e+11   7.993851e+12   7.704818e+11
96.0   2.642619e+09   2.689243e+11   2.012075e+13   1.939325e+12

Timings for the two approaches (index.values and index.to_series()) are essentially the same:

import pandas as pd

SHIGH = pd.DataFrame({'5000058022936': {96.0: 25816730000000.0, 91.0: 3731947000000.0, 94.0: 10043140000000.0},
                      '5000058036940': {96.0: 1931592000000000.0, 91.0: 279222000000000.0, 94.0: 751422000000000.0},
                      '5000058036827': {96.0: 186175200000000.0, 91.0: 26912620000000.0, 94.0: 72425290000000.0},
                      '5000058004097': {96.0: 253691400000.0, 91.0: 36672460000.0, 94.0: 98690270000.0}})


print (SHIGH)
      5000058004097  5000058022936  5000058036827  5000058036940
91.0   3.667246e+10   3.731947e+12   2.691262e+13   2.792220e+14
94.0   9.869027e+10   1.004314e+13   7.242529e+13   7.514220e+14
96.0   2.536914e+11   2.581673e+13   1.861752e+14   1.931592e+15

# Scale the sample up to 1200 rows x 1000 columns (the index becomes 0..1199)
SHIGH = pd.concat([SHIGH]*400).reset_index(drop=True)
SHIGH = pd.concat([SHIGH]*250, axis=1)

In [212]: %timeit (SHIGH.divide(SHIGH.index.values, axis = 0))
100 loops, best of 3: 14.8 ms per loop

In [213]: %timeit (SHIGH.divide(SHIGH.index.to_series(), axis = 0))
100 loops, best of 3: 14.9 ms per loop
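`%timeit` is an IPython magic. Outside IPython, a comparable check can be sketched with the standard-library `timeit` module. The frame here is a hypothetical stand-in of the same 1200 x 1000 shape, filled with random values and indexed from 1 so no row is divided by zero; absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1200 x 1000 benchmark frame;
# the index starts at 1 so no row is divided by zero
rng = np.random.default_rng(0)
shigh = pd.DataFrame(rng.random((1200, 1000)), index=np.arange(1, 1201))


def by_values():
    return shigh.divide(shigh.index.values, axis=0)


def by_series():
    return shigh.divide(shigh.index.to_series(), axis=0)


# Both approaches should give identical results before comparing speed
assert by_values().equals(by_series())

for func in (by_values, by_series):
    # best of 3 repeats, 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(func, repeat=3, number=10)) / 10
    print(f"{func.__name__}: {best * 1000:.2f} ms per call")
```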

[Solution 4]:

SHIGH / SHIGH.index

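df.index gives you an array-like structure holding the index values. Note, though, that the / operator matches a 1-D divisor against the columns (like axis=1), so this only runs when the frame has as many columns as rows, and even then it divides column-wise. For row-wise division of the 1221 x 1000 frame in the question, use SHIGH.div(SHIGH.index.values, axis=0).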
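A sketch on two hypothetical frames (`tall` and `square`; names and values are made up) of how the `/` operator treats the index as a divisor: it matches the values against the columns, so a 3 x 2 frame raises `ValueError`, while a 2 x 2 frame runs but divides each column, rather than each row, by one index value. `div(..., axis=0)` gives the row-wise result.

```python
import pandas as pd

# Hypothetical non-square frame: 3 rows, 2 columns
tall = pd.DataFrame({"x": [2.0, 4.0, 6.0], "y": [4.0, 8.0, 12.0]}, index=[1.0, 2.0, 3.0])

# "/" matches the 3 index values against the 2 columns, so it raises ValueError
try:
    tall / tall.index
    operator_failed = False
except ValueError:
    operator_failed = True

# Hypothetical square frame: "/" runs, but column j is divided by index[j]
square = pd.DataFrame([[2.0, 4.0], [6.0, 8.0]], index=[1.0, 2.0], columns=["x", "y"])
col_wise = square / square.index                    # x / 1.0, y / 2.0
row_wise = square.div(square.index.values, axis=0)  # row 0 / 1.0, row 1 / 2.0

print(operator_failed)  # True
print(col_wise.to_numpy().tolist())  # [[2.0, 2.0], [6.0, 4.0]]
print(row_wise.to_numpy().tolist())  # [[2.0, 4.0], [3.0, 4.0]]
```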
