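【Question title】: How to iterate through columns of the dataframe?
【Posted】: 2021-06-28 03:58:45
【Question description】:

I want to go through all the columns of a dataframe so that I can take specific values from each column and use them to compute values for another dataframe. I have: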

                         DP1         DP2         DP3         DP4         DP5         DP6         DP7         DP8        DP9       DP10       Total
OP1                  357848.0   1124788.0   1735330.0   2218270.0   2745596.0   3319994.0   3466336.0   3606286.0  3833515.0  3901463.0   3901463.0
OP2                  352118.0   1236139.0   2170033.0   3353322.0   3799067.0   4120063.0   4647867.0   4914039.0  5339085.0        NaN   5339085.0
OP3                  290507.0   1292306.0   2218525.0   3235179.0   3985995.0   4132918.0   4628910.0   4909315.0        NaN        NaN   4909315.0
OP4                  310608.0   1418858.0   2195047.0   3757447.0   4029929.0   4381982.0   4588268.0         NaN        NaN        NaN   4588268.0
OP5                  443160.0   1136350.0   2128333.0   2897821.0   3402672.0   3873311.0         NaN         NaN        NaN        NaN   3873311.0
OP6                  396132.0   1333217.0   2180715.0   2985752.0   3691712.0         NaN         NaN         NaN        NaN        NaN   3691712.0
OP7                  440832.0   1288463.0   2419861.0   3483130.0         NaN         NaN         NaN         NaN        NaN        NaN   3483130.0
OP8                  359480.0   1421128.0   2864498.0         NaN         NaN         NaN         NaN         NaN        NaN        NaN   2864498.0
OP9                  376686.0   1363294.0         NaN         NaN         NaN         NaN         NaN         NaN        NaN        NaN   1363294.0
OP10                 344014.0         NaN         NaN         NaN         NaN         NaN         NaN         NaN        NaN        NaN    344014.0
Total               3671385.0  11614543.0  17912342.0  21930921.0  21654971.0  19828268.0  17331381.0  13429640.0  9172600.0  3901463.0  34358090.0
Latest Observation   344014.0   1363294.0   2864498.0   3483130.0   3691712.0   3873311.0   4588268.0   4909315.0  5339085.0  3901463.0         NaN 

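From this table I want to apply the following formula: for column DP1, take Total minus Latest Observation, then divide the DP2 Total by that result. We have to do this for every column and save the results in another dataframe.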
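
Written out, the calculation appears to be the following. It is inferred from the attempted code and the expected row further down, and the symbols $T_k$, $L_k$ and $W_k$ are notation introduced here, not taken from the original post:

```latex
% T_k: Total of column DPk, L_k: Latest Observation of column DPk
W_k = \frac{T_{k+1}}{T_k - L_k}, \qquad k = 1, \dots, 9

% Worked example for DP1, using the values in the table above
W_1 = \frac{11614543}{3671385 - 344014} = \frac{11614543}{3327371} \approx 3.491
```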

We need a row like this:

Weighted Average     3.491   1.747   1.457   1.174   1.104   1.086   1.054   1.077   1.018 

This is the code we tried:

LDFTriangledf['Weighted Average'] =CumulativePaidTriangledf.loc['Total','DP2']/(CumulativePaidTriangledf.loc['Total','DP1'] - CumulativePaidTriangledf.loc['Latest Observation','DP1'])

【Question comments】:

    Tags: python pandas oracle dataframe triangle


    【Solution 1】:

    You can drop the column names from .loc and use shift(-1, axis=1) to get the next column's Total. This lets you apply the formula to all columns in one operation:

    CumulativePaidTriangledf.shift(-1, axis=1).loc['Total'] / (CumulativePaidTriangledf.loc['Total'] - CumulativePaidTriangledf.loc['Latest Observation'])
    
    # DP1      3.490607
    # DP2      1.747333
    # DP3      1.457413
    # DP4      1.173852
    # DP5      1.103824
    # DP6      1.086269
    # DP7      1.053874
    # DP8      1.076555
    # DP9      1.017725
    # DP10          inf
    # Total         NaN
    # dtype: float64
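
As a self-contained check of the one-liner above, the sketch below rebuilds only the Total and Latest Observation rows from the question's table, applies the same expression, and then drops the inf and NaN entries to leave the nine values the question asks for. The names `df`, `ratios`, `weighted_average` and `result` are illustrative rather than from the original post, and dropping the non-finite entries and rounding to three decimals are assumptions about the desired output.

```python
import numpy as np
import pandas as pd

# Only the two rows the formula needs, copied from the question's table.
cols = [f"DP{i}" for i in range(1, 11)] + ["Total"]
df = pd.DataFrame(
    [
        [3671385.0, 11614543.0, 17912342.0, 21930921.0, 21654971.0, 19828268.0,
         17331381.0, 13429640.0, 9172600.0, 3901463.0, 34358090.0],
        [344014.0, 1363294.0, 2864498.0, 3483130.0, 3691712.0, 3873311.0,
         4588268.0, 4909315.0, 5339085.0, 3901463.0, np.nan],
    ],
    index=["Total", "Latest Observation"],
    columns=cols,
)

# Next column's Total divided by (current Total - current Latest Observation).
ratios = df.shift(-1, axis=1).loc["Total"] / (
    df.loc["Total"] - df.loc["Latest Observation"]
)

# Assumption: the inf (DP10) and NaN (Total) entries are not wanted in the result.
weighted_average = ratios.replace([np.inf, -np.inf], np.nan).dropna().round(3)

# Store the values as a single "Weighted Average" row in a separate DataFrame.
result = weighted_average.to_frame(name="Weighted Average").T
print(result)
```

Filtering on non-finite values rather than slicing by position keeps the sketch independent of how many DP columns the triangle has.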
    

    Here is a breakdown of what the three components do:

    A: .shift(-1, axis=1).loc['Total'] -- We are shifting the whole Total row to the left, so every column now has the next Total value.
    B: .loc['Total'] -- This is the normal Total row.
    C: .loc['Latest Observation'] -- This is the normal Latest Observation.
    A / (B-C) -- This is what the code above does. It takes the shifted Total row (A) and divides it by the difference of the current Total row (B) and current Latest observation row (C).

                        DP1           DP2           DP3           DP4           DP5           DP6           DP7           DP8           DP9          DP10         Total
    A          1.161454e+07  1.791234e+07  2.193092e+07  2.165497e+07  1.982827e+07  1.733138e+07  1.342964e+07  9.172600e+06  3.901463e+06    34358090.0           NaN
    B          3.671385e+06  1.161454e+07  1.791234e+07  2.193092e+07  2.165497e+07  1.982827e+07  1.733138e+07  1.342964e+07  9.172600e+06     3901463.0    34358090.0
    C          3.440140e+05  1.363294e+06  2.864498e+06  3.483130e+06  3.691712e+06  3.873311e+06  4.588268e+06  4.909315e+06  5.339085e+06     3901463.0           NaN
    A / (B-C)      3.490607      1.747333      1.457413      1.173852      1.103824      1.086269      1.053874      1.076555      1.017725           inf           NaN
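
The same breakdown can be reproduced in code, next to an explicit column-by-column loop, which is the alternative raised in the comments. This is a sketch that again assumes only the Total and Latest Observation rows are needed; the names `a`, `b`, `c`, `vectorized` and `looped` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

cols = [f"DP{i}" for i in range(1, 11)] + ["Total"]
df = pd.DataFrame(
    [
        [3671385.0, 11614543.0, 17912342.0, 21930921.0, 21654971.0, 19828268.0,
         17331381.0, 13429640.0, 9172600.0, 3901463.0, 34358090.0],
        [344014.0, 1363294.0, 2864498.0, 3483130.0, 3691712.0, 3873311.0,
         4588268.0, 4909315.0, 5339085.0, 3901463.0, np.nan],
    ],
    index=["Total", "Latest Observation"],
    columns=cols,
)

a = df.shift(-1, axis=1).loc["Total"]   # A: each column holds the next column's Total
b = df.loc["Total"]                     # B: the Total row as-is
c = df.loc["Latest Observation"]        # C: the Latest Observation row as-is
vectorized = a / (b - c)                # all columns in one operation

# Loop equivalent: pair each column with the one to its right.
looped = {}
for current, nxt in zip(df.columns[:-1], df.columns[1:]):
    denominator = b[current] - c[current]
    # DP10 has a zero denominator; mirror the inf that pandas produces.
    looped[current] = b[nxt] / denominator if denominator != 0 else np.inf
looped = pd.Series(looped)

# Both approaches agree on every column the loop visits (DP1..DP10).
print(np.allclose(looped.to_numpy(), vectorized.loc[looped.index].to_numpy()))
```

The loop needs its own zero-denominator guard and a separate pairing of neighbouring columns, both of which the shifted-row version handles implicitly.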

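    【Comments】:

    • How can I process all the columns using a for loop?
    • No loop is needed. The code I posted computes all columns in one operation. This is the recommended approach in pandas, rather than looping.
    • @Devil I updated the answer with more explanation/visualization of how it works.
    • https://stackoverflow.com/q/66948793/15519479 please help me here @tdy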