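Forecasting based on historical figures: answers

【Question title】: Forecasting basis the historical figures
【Posted】: 2023-03-05 23:29:01
【Question】:

I want to forecast allocations based on historical figures.

Manual input provided by the user: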

year  month          x        y        z       k
2018  JAN    9,267,581  627,129  254,110  14,980
2018  FEB    7,771,691  738,041  217,027  17,363

Historical figures:

year  month  segment  pg  is_p        x        y        z        k
2018  JAN    A        p   Y         600      600      600      600
2018  JAN    A        p   N         200      200      200      200
2018  JAN    B        r   Y         400      400      400      400
2018  JAN    A        r   Y         400      400      400      400
2018  JAN    A        r   N         400      400      400      400
2018  JAN    B        r   N         300      300      300      300
2018  JAN    C        s   Y         200      200      200      200
2018  JAN    C        s   N          10       10       10       10
2018  JAN    C        t   Y          11       11       11       11
2018  JAN    C        t   N          12       12       12       12
2018  FEB    A        p   Y         789      789      789      789
2018  FEB    A        p   N     2093874  2093874  2093874  2093874

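I tried to work out the is_p allocation from the totals, for example by adding columns that hold the allocation percentages:

1. %ofx_segment = (600+200+400+400) / (600+200+400+400+400+300+200+10+11+12). This tells me how much of x the segment contributed (segment A here); the same applies to y, z and k.
2. I multiply the manual input 9,267,581 by %ofx_segment to get the value of segment_x.
3. Then I compute %_pg. For segment A in January 2018 and pg = 'p': %_pg = (600+200) / (600+200+400+400).
4. Then I multiply the value from step 2 by the %_pg from step 3, for 'p' in pg for segment A.
5. Finally, I compute the is_p percentage, i.e. % Y or % N for 'p' in pg for segment A: % Y = 600 / (600+200).
6. The value from step 5 must be multiplied by the output of step 4.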
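The six steps above can be sketched with `groupby(...).transform("sum")`, which broadcasts each level's total back onto every row. This is only an illustration under stated assumptions: it uses the January 2018 rows and the x column from the question, and the column names `pct_segment`, `pct_pg`, `pct_is_p`, `segment_x`, `pg_x` and `alloc_x` are made up for the example and do not come from the original post.

```python
import pandas as pd

# Historical figures for January 2018, copied from the question (x column only).
hist = pd.DataFrame({
    "year": [2018] * 10,
    "month": ["JAN"] * 10,
    "segment": ["A", "A", "B", "A", "A", "B", "C", "C", "C", "C"],
    "pg": ["p", "p", "r", "r", "r", "r", "s", "s", "t", "t"],
    "is_p": ["Y", "N", "Y", "Y", "N", "N", "Y", "N", "Y", "N"],
    "x": [600, 200, 400, 400, 400, 300, 200, 10, 11, 12],
})
manual_x = 9_267_581  # manual input for x, January 2018

month_keys = ["year", "month"]
seg_keys = month_keys + ["segment"]
pg_keys = seg_keys + ["pg"]

# Totals at each level, broadcast back onto every row.
month_total = hist.groupby(month_keys)["x"].transform("sum")
seg_total = hist.groupby(seg_keys)["x"].transform("sum")
pg_total = hist.groupby(pg_keys)["x"].transform("sum")

# Steps 1, 3 and 5: share of each level within its parent level.
hist["pct_segment"] = seg_total / month_total
hist["pct_pg"] = pg_total / seg_total
hist["pct_is_p"] = hist["x"] / pg_total

# Steps 2, 4 and 6: push the manual input down the hierarchy.
hist["segment_x"] = manual_x * hist["pct_segment"]
hist["pg_x"] = hist["segment_x"] * hist["pct_pg"]
hist["alloc_x"] = hist["pg_x"] * hist["pct_is_p"]

print(hist[["segment", "pg", "is_p", "pct_segment", "pct_pg", "pct_is_p", "alloc_x"]])
```

For the first row (segment A, pg 'p', is_p 'Y') the three shares are 1600/2533, 800/1600 and 600/800, matching the hand calculation in the steps. Repeating the same lines for y, z and k, or looping over the column names, would cover the remaining measures.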

import numpy as np
import pandas as pd

first = pd.read_csv('/Users/arork/Downloads/first.csv')
second = pd.read_csv('/Users/arork/Downloads/second.csv')
interested_columns = ['x', 'y', 'z', 'k']

# f, g and h are the asker's own helper functions; their definitions are not included in the post.
primeallocation = first.groupby(['year', 'month', 'pg', 'segment'])[['is_p'] + interested_columns].apply(f)
segmentallocation = first.groupby(['year', 'month'])[['segment'] + interested_columns].apply(g)
pgallocation = first.groupby(['year', 'month', 'segment'])[['pg'] + interested_columns].apply(h)
segmentallocation['%of allocation_segment x']
np.array(second)
func = lambda x: x * np.asarray(second['x'])
segmentallocation['%of allocation_segment x'].apply(func)

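【Comments】:

@AILearning: comments are welcome.
Tell us what you tried and what the error is. Also add the expected output.
interested_columns={'x','y','z','p','q','r'}
I started by getting the allocation for segment_allocation.
@AI_Learning - here, I can't figure out how to multiply the allocation percentages by the manual inputs given for x, y, z and k.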

Tags: python pandas forecasting

【Solution 1】:

You need to join the two data frames in order to multiply the two columns.

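merged_df = segmentallocation.merge(second, on=['year', 'month'], how='left', suffixes=('', '_second'))

# After the merge, columns coming from `second` carry the '_second' suffix:
# merged_df[c] is the historical segment total, and merged_df[c + '_second']
# is the manual input.
for c in interested_columns:
    merged_df['allocation' + str(c)] = merged_df['%of allocation' + str(c)] * merged_df[c]

merged_df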


   year  month  segment        x        y        z        k  %of allocationx  %of allocationy  %of allocationz  %of allocationk   x_second  y_second  z_second  k_second   allocationx   allocationy   allocationz   allocationk
0  2018    FEB        A  2094663  2094663  2094663  2094663         1.000000         1.000000         1.000000         1.000000  7,771,691   738,041   217,027    17,363  2.094663e+06  2.094663e+06  2.094663e+06  2.094663e+06
1  2018    JAN        A     1600     1600     1600     1600         0.631662         0.631662         0.631662         0.631662  9,267,581   627,129   254,110    14,980  1.010659e+03  1.010659e+03  1.010659e+03  1.010659e+03
2  2018    JAN        B      700      700      700      700         0.276352         0.276352         0.276352         0.276352  9,267,581   627,129   254,110    14,980  1.934465e+02  1.934465e+02  1.934465e+02  1.934465e+02
3  2018    JAN        C      233      233      233      233         0.091986         0.091986         0.091986         0.091986  9,267,581   627,129   254,110    14,980  2.143269e+01  2.143269e+01  2.143269e+01  2.143269e+01
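In the output above, the `allocation*` columns equal the share times the historical segment total (for example 0.631662 * 1600), and the `*_second` columns still contain thousands separators. If the goal is to scale the manual input instead, one possible variant is sketched below. It assumes the manual figures arrive as comma-formatted strings and rebuilds small stand-ins for `segmentallocation` and `second` from the numbers shown in this thread; only the x column is handled.

```python
import pandas as pd

# Stand-in for `segmentallocation`: segment totals of x taken from the output above.
segmentallocation = pd.DataFrame({
    "year": [2018, 2018, 2018, 2018],
    "month": ["FEB", "JAN", "JAN", "JAN"],
    "segment": ["A", "A", "B", "C"],
    "x": [2094663, 1600, 700, 233],
})
# Share of each segment within its (year, month).
month_total = segmentallocation.groupby(["year", "month"])["x"].transform("sum")
segmentallocation["%of allocationx"] = segmentallocation["x"] / month_total

# Stand-in for `second`: manual input, assumed to be read as strings with commas.
second = pd.DataFrame({
    "year": [2018, 2018],
    "month": ["JAN", "FEB"],
    "x": ["9,267,581", "7,771,691"],
})
# Strip the thousands separators so the column becomes numeric.
second["x"] = pd.to_numeric(second["x"].str.replace(",", "", regex=False))

merged_df = segmentallocation.merge(
    second, on=["year", "month"], how="left", suffixes=("", "_second")
)
# Multiply the share by the manual input (x_second), not by the segment total (x).
merged_df["allocationx"] = merged_df["%of allocationx"] * merged_df["x_second"]

print(merged_df)
```

When the manual input is loaded from a CSV file, passing `thousands=","` to `pd.read_csv` is another way to get numeric columns without the string clean-up step.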

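【Discussion】:

You need to define all the terms in your question. What are your functions g and h? I could only answer by guessing what these functions do.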
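By the way, we are grouping on is_p; I want to get the sum for the segment first, and then for prime (is_p).
Segment is further divided into pg, and pg is further divided by is_p.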
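Because segment, pg and is_p are nested as described, the intermediate totals cancel when the three shares are multiplied, so each row's overall share should equal its value divided by the month total. The check below is a small sketch of that observation using the January 2018 x values from the question; the names `nested` and `direct` are illustrative only.

```python
import numpy as np
import pandas as pd

# January 2018 rows from the question (x column only).
hist = pd.DataFrame({
    "segment": ["A", "A", "B", "A", "A", "B", "C", "C", "C", "C"],
    "pg": ["p", "p", "r", "r", "r", "r", "s", "s", "t", "t"],
    "is_p": ["Y", "N", "Y", "Y", "N", "N", "Y", "N", "Y", "N"],
    "x": [600, 200, 400, 400, 400, 300, 200, 10, 11, 12],
})

total = hist["x"].sum()
seg_total = hist.groupby("segment")["x"].transform("sum")
pg_total = hist.groupby(["segment", "pg"])["x"].transform("sum")

# Product of the three level-by-level shares.
nested = (seg_total / total) * (pg_total / seg_total) * (hist["x"] / pg_total)
# Direct share of each row in the month total.
direct = hist["x"] / total

print(np.allclose(nested, direct))
```

If only the final is_p-level allocation is needed, multiplying the manual input by `direct` gives the same numbers as the step-by-step calculation; the intermediate shares remain useful when the segment-level and pg-level figures are also wanted.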