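【Posted】: 2021-09-27 02:22:30
【Question】:
I'm currently trying to build a rolling-window time series for a principal component analysis (PCA) of stock returns in order to construct a backtest. I want to see whether setting the individual asset weights over time performs better (higher return) than a buy-and-hold portfolio. The problem is that building a time series of the component weights from the PCA is proving very difficult. I've come up with a few workarounds, but I can't seem to build a time series from this data. I'm also struggling to replace the keys of the dictionary with a datetime series. I've looked around, mostly on Stack Overflow, but to no avail.
Below is the code I came up with: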
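import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

np.random.seed(42)

# Build a 60 x 5 DataFrame of random numbers (one column per asset)
values_for_df = []
for i in range(1, 6):
    random_numbers = np.random.random(size=60)
    values_for_df.append(random_numbers)
df = pd.DataFrame(values_for_df).T

weights = {}
dates_1 = {}
# Stop one row early so that every window holds two rows
for i in range(1, len(df) - 1):
    pca = PCA()
    transf = pca.fit_transform(df.iloc[i:i+2])  # PCA on a two-row window
    weights[i] = pca.components_                # array of shape (2, 5)
    dates_1[i] = df.iloc[i].name                # index label of the window's first row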
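The output is a dictionary of 2-D arrays (lists of lists). As mentioned, I'm having a hard time converting it to a df using pd.DataFrame() and pd.concat().
Is there any way to convert this into a DataFrame in which the two rows of PCA component weights correspond to a single datetime?
The output of this code looks like this: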
{1: array([[ 0.50938649, 0.1163777 , -0.56213712, -0.5999693 , -0.2258768 ],
[-0.19623229, 0.68084356, 0.17894347, 0.05397575, -0.68044896]]),
2: array([[ 0.76101188, -0.39708989, -0.35225525, -0.01460473, -0.37267074],
[ 0.26603362, -0.44559758, 0.81939468, 0.00324688, 0.24341472]]),
3: array([[ 0.43735771, 0.07284643, -0.23807945, 0.46456192, -0.72863711],
[-0.84990214, -0.03839851, 0.14762177, 0.40008466, -0.30713514]]),
4: array([[-0.10002177, -0.12908589, 0.09697811, -0.54718954, 0.81517565],
[ 0.9291778 , 0.24487735, -0.15042811, -0.23197187, 0.01497085]]),
5: array([[ 0.43260558, -0.17245194, -0.15363331, 0.64845393, -0.58225171],
[-0.8998753 , -0.03170306, -0.04644508, 0.3319358 , -0.27727395]]),
6: array([[-0.66851419, 0.31545065, -0.26741055, -0.54749379, 0.28691779],
[ 0.3598592 , -0.05698951, -0.01176088, 0.02137245, 0.93094492]]),
7: array([[ 0.69949617, -0.46121291, 0.26456096, 0.47439289, 0.05428297],
[ 0.0671515 , -0.02046416, 0.07459749, -0.04681467, -0.99363751]]),
8: array([[ 0.76526418, -0.23880119, -0.57563869, -0.12170626, -0.10569961],
[-0.20948119, -0.96814145, 0.11706768, -0.02831197, 0.06567612]]),
9: array([[ 0.88308511, -0.18178186, 0.23418943, 0.05558346, 0.35941875],
[ 0.3864688 , -0.20776523, -0.30713553, -0.12458004, -0.83523832]]),
10: array([[ 0.02145911, 0.17212618, -0.34312327, -0.91962789, 0.08039307],
[-0.93784872, 0.14547558, 0.22919403, -0.09705987, -0.19319965]]),
11: array([[-0.28946201, -0.26603042, 0.62500451, -0.66932375, 0.08255082],
[-0.79432192, -0.0826848 , 0.20253363, 0.56666393, 0.0093821 ]]),
12: array([[ 0.4225668 , 0.63454067, -0.52748616, 0.37344672, -0.0330355 ],
[ 0.89717194, -0.19965603, 0.28582373, -0.27012438, 0.02361333]]),
13: array([[-0.09152907, 0.18236668, -0.43896889, 0.65056049, -0.5851856 ],
[ 0.19225542, 0.02507023, -0.12112356, -0.68443942, -0.69230131]]),
14: array([[ 0.52763656, 0.65909855, -0.10621454, -0.26420703, 0.45398444],
[-0.20903038, -0.39874697, 0.03275961, 0.0985442 , 0.88686132]]),
15: array([[-0.6376942 , -0.65434659, 0.23591625, -0.20141987, 0.26258372],
[-0.26207514, -0.31149866, -0.55752568, 0.44580663, -0.56983048]]),
16: array([[ 0.27907902, 0.33000177, -0.37818218, -0.21758258, -0.78920833],
[ 0.49977863, -0.51086522, 0.39388011, -0.57407273, -0.06735727]]),
17: array([[-0.07747888, -0.44363775, 0.72389959, 0.51430407, 0.09296923],
[-0.44632809, -0.37360701, -0.37433274, 0.2591633 , -0.67373468]]),
18: array([[-0.24853706, -0.28143494, -0.09349904, 0.91280228, 0.13066607],
[-0.90863048, -0.25882281, 0.08144532, -0.30767452, -0.07813102]]),
19: array([[-0.0499767 , -0.46808766, 0.81593976, 0.32495903, 0.08390597],
[-0.18009682, 0.19879004, -0.0864013 , 0.65630871, -0.69988667]]),
20: array([[-0.15978936, 0.40505628, 0.23403331, 0.27166524, 0.82572585],
[-0.82190218, -0.11639043, -0.04382051, -0.54840546, 0.09089168]]),
21: array([[-0.59793074, 0.36403396, 0.28523106, -0.56614702, 0.32875356],
[ 0.08787018, -0.09763207, 0.94862929, 0.27707493, -0.07796638]]),
22: array([[-0.04762231, -0.48706884, 0.45248363, 0.37215567, -0.64595262],
[ 0.44614193, 0.47456984, 0.55381454, -0.44821569, -0.26102299]]),
23: array([[-6.14200977e-01, -8.29742681e-02, 1.70228332e-01,
-7.64025699e-01, 5.62092342e-02],
[ 9.85466281e-02, -7.29776513e-01, -4.55585357e-04,
-4.97073250e-02, -6.74717553e-01]]),
When I tried to create the df, I got this:
weights_keys weights_values
0 1 [[0.5093864920875057, 0.11637769781544054, -0....
1 2 [[0.7610118804227364, -0.3970898897595845, -0....
2 3 [[0.43735770537072516, 0.07284642654346118, -0...
3 4 [[-0.100021766544103, -0.12908589345836016, 0....
4 5 [[0.43260557607788175, -0.17245193633756645, -...
5 6 [[-0.6685141891902584, 0.3154506469430627, -0....
6 7 [[0.6994961703309339, -0.4612129082876791, 0.2...
7 8 [[0.7652641817892236, -0.23880119387494167, -0...
8 9 [[0.8830851102283364, -0.18178185688401122, 0....
9 10 [[0.02145910731659373, 0.17212617677552292, -0...
10 11 [[-0.28946201366547714, -0.2660304245115253, 0...
11 12 [[0.42256679812505826, 0.6345406677421921, -0....
12 13 [[-0.09152906655393278, 0.1823666758882022, -0...
13 14 [[0.5276365649456491, 0.6590985509896493, -0.1...
14 15 [[-0.6376941956390323, -0.6543465915749572, 0....
15 16 [[0.27907901752772, 0.33000177354673366, -0.37...
16 17 [[-0.07747887772273652, -0.44363774912889514, ...
An example of the DataFrame I'm after is shown below:
USDJPY EURUSD GBPUSD AUDUSD GBPAUD
20210924 21:00:00 Component weights 1 1.618764e-09 -5.137869e-10 -7.915763e-10 -6.841845e-10 4.352906e-10
Component weights 2 -5.137869e-10 1.900899e-09 9.721030e-10 1.872090e-09 -4.564939e-10
Component weights 3 -7.915763e-10 9.721030e-10 3.363203e-09 3.988530e-09 9.450517e-10
Component weights 4 -6.841845e-10 1.872090e-09 3.988530e-09 1.277432e-08 -2.272119e-09
Component weights 5 4.352906e-10 -4.564939e-10 9.450517e-10 -2.272119e-09 7.960307e-09
... ... ... ... ... ... ...
20210924 21:59:00 Component weights 1 1.618764e-09 -5.137869e-10 -7.915763e-10 -6.841845e-10 4.352906e-10
Component weights 2 -5.137869e-10 1.900899e-09 9.721030e-10 1.872090e-09 -4.564939e-10
Component weights 3 -7.915763e-10 9.721030e-10 3.363203e-09 3.988530e-09 9.450517e-10
Component weights 4 -6.841845e-10 1.872090e-09 3.988530e-09 1.277432e-08 -2.272119e-09
Component weights 5 4.352906e-10 -4.564939e-10 9.450517e-10 -2.272119e-09 7.960307e-09
The df above is an example of a PCA created with n_components = 5.
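One possible way to reach a layout like the example above is to wrap each window's `pca.components_` in its own small DataFrame and let `pd.concat` build the two-level row index from the dictionary keys. This is only a sketch under assumptions: the one-minute `DatetimeIndex`, the column names, and the choice to key each window by its last timestamp are illustrative and not taken from the original data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical data: 60 one-minute bars for five instruments. The values are
# random, as in the question; the dates and column names are assumptions.
rng = np.random.default_rng(42)
columns = ["USDJPY", "EURUSD", "GBPUSD", "AUDUSD", "GBPAUD"]
index = pd.date_range("2021-09-24 21:00", periods=60, freq="min")
df = pd.DataFrame(rng.random((60, 5)), index=index, columns=columns)

window = 2  # rows per rolling window, as in the question
frames = {}
for start in range(len(df) - window + 1):
    chunk = df.iloc[start:start + window]
    pca = PCA().fit(chunk)
    # Key each window by the timestamp of its last row (an assumption; the
    # question's loop uses the first row instead).
    frames[chunk.index[-1]] = pd.DataFrame(
        pca.components_,
        index=[f"Component weights {k + 1}" for k in range(pca.n_components_)],
        columns=df.columns,
    )

# Concatenating a dict of DataFrames turns the dict keys into the outer index
# level, giving rows indexed by (timestamp, component label).
weights_df = pd.concat(frames, names=["date", "component"])
```

With this layout, `weights_df.xs(some_timestamp, level="date")` returns the component rows for one window. Keying by the last row of the window keeps each set of weights tied to data that was already available at that time, which may matter for a backtest.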
【Comments】:
- Can you give an example of what the final DataFrame should look like? Also, what are the dimensions of pca.components_?
- @ogdenkev I've revised the code above accordingly.
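On the dimensions asked about in the comment, a quick check on a two-row window can be sketched as follows. The random input is a stand-in for the real returns, so only the shape is meaningful here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical window: 2 observations (rows) of 5 assets (columns).
window = np.random.default_rng(0).random((2, 5))

pca = PCA().fit(window)
# components_ has shape (n_components, n_features). With n_components left
# unset, scikit-learn keeps min(n_samples, n_features) components.
shape = pca.components_.shape
print(shape)  # (2, 5)
```

This is why the two-row windows in the question give two weight rows per key. Getting five component rows per timestamp, as in the desired example, would need windows of at least five rows.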
Tags: python pandas machine-learning pca