Adjusting weights based on share prices for an investment portfolio using pandas

【Question title】: Adjusting weights based on share prices for an investment portfolio using pandas
【Posted】: 2017-03-08 07:35:53
【Question description】:

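I have the share prices of the companies in an investment portfolio. My goal is to create a new column, df['Final_weight'], while keeping the sum of the weights for each date and category the same in df['weight'] and df['Final_weight'].

On a given date, I want to give 0 weight to companies whose share price is in the bottom 30% of companies in the same category, and a higher weight to companies whose share price is above the 70th percentile of companies in the same category.

I have a dataframe with multiple dates and companies:

For example, a subset of df: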

Date        Category    Company       Price  weight
1/1/2007    Automative  Audi           1000   0.146
1/1/2007    Automative  Alfa Romeo      400   0.143
1/1/2007    Automative  Aston Martin    500   0.002
1/1/2007    Automative  Bentley        2000   0.025
1/1/2007    Automative  Mercedes       3000   0.063
1/1/2007    Automative  BMW              40   0.154
1/1/2007    Automative  Volvo          3000   0.163
1/1/2007    Automative  VW              200   0.003
1/1/2007    Technology  Apple           400   0.120
1/1/2007    Technology  Microsoft      5500   0.048
1/1/2007    Technology  Google          230   0.069
1/1/2007    Technology  Lenova           36   0.036
1/1/2007    Technology  IBM             250   0.016
1/1/2007    Technology  Sprint          231   0.013
1/1/2007    Technology  Sprint       231    0.013

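So far I have written some code that creates a new column giving each company's percentile rank per date and per category (Date is the index of df). The code looks like this:

df['Pctile'] = df.Price.groupby([df.index, df.Category]).rank(pct=True)

Output: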

            Category       Company  Price  weight    Pctile
Date                                                       
1/1/2007  Automative          Audi   1000   0.146  0.625000
1/1/2007  Automative    Alfa Romeo    400   0.143  0.375000
1/1/2007  Automative  Aston Martin    500   0.002  0.500000
1/1/2007  Automative       Bentley   2000   0.025  0.750000
1/1/2007  Automative      Mercedes   3000   0.063  0.937500
1/1/2007  Automative           BMW     40   0.154  0.125000
1/1/2007  Automative         Volvo   3000   0.163  0.937500
1/1/2007  Automative            VW    200   0.003  0.250000
1/1/2007  Technology         Apple    400   0.120  0.833333
1/1/2007  Technology     Microsoft   5500   0.048  1.000000
1/1/2007  Technology        Google    230   0.069  0.333333
1/1/2007  Technology        Lenova     36   0.036  0.166667
1/1/2007  Technology           IBM    250   0.016  0.666667
1/1/2007  Technology        Sprint    231   0.013  0.500000
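
As a minimal sketch of how these percentile ranks arise, the snippet below rebuilds only the Automative rows (with Date as the index, as the question's code assumes) and ranks prices within each date and category. It relies on the default `method="average"` of `rank`, which is why the two companies priced at 3000 share the value 0.9375.

```python
import pandas as pd

# Automative rows from the question, with Date as the index
df = pd.DataFrame(
    {
        "Category": ["Automative"] * 8,
        "Company": ["Audi", "Alfa Romeo", "Aston Martin", "Bentley",
                    "Mercedes", "BMW", "Volvo", "VW"],
        "Price": [1000, 400, 500, 2000, 3000, 40, 3000, 200],
    },
    index=pd.Index(["1/1/2007"] * 8, name="Date"),
)

# Rank prices within each (Date, Category) group as a fraction of group size.
# With the default method="average", tied prices share the mean of their ranks:
# Mercedes and Volvo occupy ranks 7 and 8, so both get 7.5 / 8.
df["Pctile"] = df.groupby(["Date", "Category"])["Price"].rank(pct=True)
print(df)
```

If ties should be treated differently, `rank` also accepts `method="min"`, `"max"`, `"first"` or `"dense"`; which one is appropriate depends on how tied prices should fall relative to the 30% and 70% cut-offs.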

Now I want a final column called df['Final_weight'].

For each date and category, I want to do these 3 things:

When df['Pctile'] is < 0.3, I want df['Final_weight'] = 0.
When df['Pctile'] is >= 0.3 and <= 0.7, then df['Final_weight'] = df['weight'].
When df['Pctile'] is > 0.7, then df['Final_weight'] = (weight / sum of weights above the 70th percentile) * (sum of weights above the 70th percentile + sum of weights below the 30th percentile).
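
The three rules can be written compactly as one piecewise formula. The symbols here are notation introduced for illustration and do not appear in the original post: $w_i$ is a company's weight, $p_i$ its percentile rank, and $U = \{j : p_j > 0.7\}$ and $L = \{j : p_j < 0.3\}$ are the sets of companies above and below the cut-offs within the same date and category.

```latex
w_i^{\text{final}} =
\begin{cases}
0, & p_i < 0.3 \\[4pt]
w_i, & 0.3 \le p_i \le 0.7 \\[4pt]
\dfrac{w_i}{\sum_{j \in U} w_j}
  \left( \sum_{j \in U} w_j + \sum_{j \in L} w_j \right), & p_i > 0.7
\end{cases}
```

Summing the third case over all $i \in U$ gives $\sum_{j \in U} w_j + \sum_{j \in L} w_j$, so the weight removed from $L$ is redistributed pro rata to $U$ and the group total is unchanged, provided $U$ is not empty.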

Here are some example outputs and example calculations:

For Automative on 1/1/2007:

1) sum of weights above the 70th percentile = 0.251
2) sum of weights below the 30th percentile = 0.157

Calculation for Bentley = 0.025 / 0.251 * (0.251 + 0.157) = 0.041

Calculation for Mercedes = 0.063 / 0.251 * (0.251 + 0.157) = 0.102

Calculation for Volvo = 0.163 / 0.251 * (0.251 + 0.157) = 0.265

Now the sums of weight and Final_weight for Automative on 1/1/2007 are the same. Both sum to 0.699.

For Technology on 1/1/2007:

1) sum of weights above the 70th percentile = 0.168
2) sum of weights below the 30th percentile = 0.036

Calculation for Apple = 0.120 / 0.168 * (0.168 + 0.036) = 0.146

Calculation for Microsoft = 0.048 / 0.168 * (0.168 + 0.036) = 0.058

Now the sums of weight and Final_weight for Technology on 1/1/2007 are the same. Both sum to 0.302. The total for that date is also unchanged, at approximately 1.

Example output:

            Category       Company  Price  weight    Pctile  Final_weight
Date                                                       
1/1/2007  Automative          Audi   1000   0.146  0.625000  0.146
1/1/2007  Automative    Alfa Romeo    400   0.143  0.375000  0.143
1/1/2007  Automative  Aston Martin    500   0.002  0.500000  0.002
1/1/2007  Automative       Bentley   2000   0.025  0.750000  0.041
1/1/2007  Automative      Mercedes   3000   0.063  0.937500  0.102
1/1/2007  Automative           BMW     40   0.154  0.125000  0.000
1/1/2007  Automative         Volvo   3000   0.163  0.937500  0.265
1/1/2007  Automative            VW    200   0.003  0.250000  0.000
1/1/2007  Technology         Apple    400   0.120  0.833333  0.146
1/1/2007  Technology     Microsoft   5500   0.048  1.000000  0.058
1/1/2007  Technology        Google    230   0.069  0.333333  0.069
1/1/2007  Technology        Lenova     36   0.036  0.166667  0.000
1/1/2007  Technology           IBM    250   0.016  0.666667  0.016
1/1/2007  Technology        Sprint    231   0.013  0.500000  0.013
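
One way to check the stated requirement (equal sums of weight and Final_weight per date and category) is sketched below. It simply copies the two weight columns from the expected output and compares grouped totals; the 0.005 tolerance is an assumption chosen to absorb the three-decimal rounding in the table.

```python
import pandas as pd

# weight and Final_weight columns copied from the expected output above
expected = pd.DataFrame({
    "Date": ["1/1/2007"] * 14,
    "Category": ["Automative"] * 8 + ["Technology"] * 6,
    "weight": [0.146, 0.143, 0.002, 0.025, 0.063, 0.154, 0.163, 0.003,
               0.120, 0.048, 0.069, 0.036, 0.016, 0.013],
    "Final_weight": [0.146, 0.143, 0.002, 0.041, 0.102, 0.000, 0.265, 0.000,
                     0.146, 0.058, 0.069, 0.000, 0.016, 0.013],
})

# Total of each weight column per (Date, Category) group
totals = expected.groupby(["Date", "Category"])[["weight", "Final_weight"]].sum()

# Final_weight values are rounded to 3 decimals, so compare with a tolerance
preserved = (totals["weight"] - totals["Final_weight"]).abs() < 0.005
print(totals)
print(preserved.all())
```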

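My data is large, with many categories, dates and companies, so I am hoping to see an efficient way to program this. Thanks for your help.

【Question discussion】:

Tags: python pandas group-by finance trading

【Solution 1】: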

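I would have liked this to be a groupby-of-a-groupby solution, but it isn't. It's a bit dirty. The reason I couldn't use a groupby solution is that, as far as I know, there is no way to use groupby to select columns and pass them into multiple-argument functions. Enough about what can't be done...

As I said, it's hacky, so try it on your dataset. I don't know how fast it will be on a large dataset, but let me know.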

import pandas as pd

# make a lazy example
date = ['1/1/2017'] * 10
category = ['car'] * 5 + ['tech'] * 5
company = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
price = [10, 300, 100, 400, 500, 230, 324, 543, 234, 124]
weight = [0.2, 0.1, 0.3, 0.2, 0.2, 0.15, 0.15, 0.4, 0.1, 0.2]

data = {'date': date, 'category': category, 'company': company, 'price': price, 'weight': weight}
df = pd.DataFrame(data)

# do your percentile thing
df['pctile'] = df.price.groupby([df.date, df.category]).rank(pct=True)

# label each row 'upper' (above the 70th percentile) or 'lower' (below the 30th);
# rows in between stay NaN
def seventy_thirty(df):
    s = pd.Series(index=df.index, dtype=object)
    s[df.pctile > 0.7] = 'upper'
    s[df.pctile < 0.3] = 'lower'
    return s

df['pctile_summary'] = seventy_thirty(df)

# create a dataframe of summed weights that we can merge back in as another column
# (.ix and Series.append were removed from pandas, so .loc is used throughout)
weighted = df.groupby(['date', 'category', 'pctile_summary'])[['weight']].sum()

# add lowers onto uppers as we'll need them in final_weight
add_lower = weighted.loc[weighted.index.get_level_values('pctile_summary') == 'lower', ['weight']].reset_index(level=2)
add_lower['pctile_summary'] = 'upper'
add_lower = add_lower.set_index('pctile_summary', append=True)
weighted = pd.merge(weighted, add_lower, how='left', left_index=True, right_index=True, suffixes=('', '_lower'))

# now add all new columns and calculate the final_weight
df1 = pd.merge(df, weighted.reset_index(), how='left', on=['date', 'category', 'pctile_summary'], suffixes=('', '_sum'))
df1.loc[df1.pctile_summary == 'lower', 'final_weight'] = 0
df1.loc[df1.pctile_summary.isnull(), 'final_weight'] = df1.weight
# fillna(0) covers groups that have 'upper' rows but no 'lower' rows
df1.loc[df1.pctile_summary == 'upper', 'final_weight'] = (df1.weight / df1.weight_sum) * (df1.weight_sum + df1.weight_lower.fillna(0))

# finally tidy up (delete all that hard work!)
df1 = df1.drop(['pctile_summary', 'weight_sum', 'weight_lower'], axis=1)
df1
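
On the point about passing several columns to one function: one possible way around it, sketched here rather than offered as the answer's method, is `groupby(...).apply`, which hands each group to the function as a whole DataFrame. The helper name `reweight`, its `low`/`high` parameters, and the choice to leave a group untouched when nothing is above the upper cut-off are all assumptions made for this illustration.

```python
import pandas as pd

# Same lazy example as in the answer above
df = pd.DataFrame({
    "date": ["1/1/2017"] * 10,
    "category": ["car"] * 5 + ["tech"] * 5,
    "company": list("abcdefghij"),
    "price": [10, 300, 100, 400, 500, 230, 324, 543, 234, 124],
    "weight": [0.2, 0.1, 0.3, 0.2, 0.2, 0.15, 0.15, 0.4, 0.1, 0.2],
})
df["pctile"] = df.groupby(["date", "category"])["price"].rank(pct=True)


def reweight(group, low=0.3, high=0.7):
    """Hypothetical helper: final weights for one (date, category) group."""
    upper = group["pctile"] > high
    lower = group["pctile"] < low
    upper_sum = group.loc[upper, "weight"].sum()
    lower_sum = group.loc[lower, "weight"].sum()

    final = group["weight"].copy()
    # Assumption: if no row is above `high`, leave the group unchanged so
    # that no weight is lost.
    if upper_sum > 0:
        final[lower] = 0.0
        final[upper] = final[upper] / upper_sum * (upper_sum + lower_sum)
    return final.to_frame("final_weight")


# Selecting the two needed columns keeps the grouping keys out of the frame
# passed to reweight; group_keys=False keeps the original row index.
result = (
    df.groupby(["date", "category"], group_keys=False)[["pctile", "weight"]]
    .apply(reweight)
)
df["final_weight"] = result["final_weight"]
print(df)
```

Because `apply` calls a Python function once per group, this is likely to be slow when there are many date and category combinations; it is mainly useful for readability or for checking another implementation.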

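【Discussion】:

It's late here; hopefully I can add more tomorrow.
Thanks for your answer. Yes, please do. I can't run it on my larger real data because it takes too long. Thanks again.
How big is the dataset?
800k rows, 25-odd columns.
Also, maybe my percentile rank within date and category isn't behaving as it should, but that shouldn't affect the output. Thanks.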
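
For a frame of roughly that size, a sketch that avoids the merges and works on whole columns may help. It is not from the thread: the helper name `add_final_weight`, the lowercase column names, the `keys`/`low`/`high` parameters, and the decision to leave a group unchanged when no row is above the upper cut-off are assumptions for illustration. The idea is to use `groupby(...).transform("sum")` to broadcast each group's upper and lower weight totals back onto every row, then pick the final weight with `numpy.select`. It has not been timed on the asker's data.

```python
import numpy as np
import pandas as pd


def add_final_weight(df, keys=("date", "category"), low=0.3, high=0.7):
    """Hypothetical helper: return a copy of df with 'pctile' and 'final_weight'."""
    keys = list(keys)
    out = df.copy()
    out["pctile"] = out.groupby(keys)["price"].rank(pct=True)
    is_upper = out["pctile"] > high
    is_lower = out["pctile"] < low

    # Each row's weight if it is in the upper/lower bucket, else 0, so that a
    # grouped sum yields the bucket totals broadcast back onto every row.
    out["_upper_w"] = out["weight"].where(is_upper, 0.0)
    out["_lower_w"] = out["weight"].where(is_lower, 0.0)
    sums = out.groupby(keys)[["_upper_w", "_lower_w"]].transform("sum")

    # Assumption: a group with nothing above `high` is left unchanged.
    has_upper = sums["_upper_w"] > 0
    # weight / upper_sum * (upper_sum + lower_sum), rearranged
    scaled = out["weight"] * (1.0 + sums["_lower_w"] / sums["_upper_w"])
    out["final_weight"] = np.select(
        [is_lower & has_upper, is_upper], [0.0, scaled], default=out["weight"]
    )
    return out.drop(columns=["_upper_w", "_lower_w"])


# The question's sample rows, with lowercase column names
sample = pd.DataFrame({
    "date": ["1/1/2007"] * 14,
    "category": ["Automative"] * 8 + ["Technology"] * 6,
    "company": ["Audi", "Alfa Romeo", "Aston Martin", "Bentley", "Mercedes",
                "BMW", "Volvo", "VW", "Apple", "Microsoft", "Google",
                "Lenova", "IBM", "Sprint"],
    "price": [1000, 400, 500, 2000, 3000, 40, 3000, 200,
              400, 5500, 230, 36, 250, 231],
    "weight": [0.146, 0.143, 0.002, 0.025, 0.063, 0.154, 0.163, 0.003,
               0.120, 0.048, 0.069, 0.036, 0.016, 0.013],
})
result = add_final_weight(sample)
print(result.round(3))
```

Everything here is a column-wise operation or a single grouped aggregation, so the cost should grow roughly with the number of rows rather than with the number of groups.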