【Question title】: Iterate over DataFrame Date groups in order, with reference to the previous group
【Posted】: 2021-05-16 12:33:33
【Question】:

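I have a DataFrame df with a MultiIndex (Name, Date), and I need to process it iteratively by Date so that I can assign a value based on both the current and the previous Date group.

AFAIK the best way to process DataFrame groups is via .apply, for example df.groupby('Date').apply(ifunc).

But ifunc needs to refer to the values of the previous Date group after ifunc has already processed that previous group. What is the best way to do this?

Below is an example of such an ifunc, for use on a df with columns ['Dollars', 'Weight', 'Return', 'HaveMax']: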

# (This might not be great python; coding improvements welcome!)
# Lambda to add "AddDollars" to Names that don't already "HaveMax" "MaxDollars"
def ifunc(group, previous):  # Arguments are df groups by Date
    group['HaveMax'] = previous['HaveMax']
    # Each Name's Dollars changed from the previous Date
    avgWeights = group['Weight'].mean()
    group['Dollars'] = group['Weight'] * previous['Dollars'] * group['Return'] / avgWeights
    # Now add "AddDollars" to Names that were under
    group.loc[group['HaveMax'] == False, 'Dollars'] = group[group['HaveMax'] == False]['Dollars'] + AddDollars
    # Update HaveMax for any Names that reached MaxDollars on this Date
    group.loc[group['HaveMax'] == False, 'HaveMax'] = group[group['HaveMax'] == False]['Dollars'] >= MaxDollars
    return group

Sample data:

import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])

Expected output:

               Dollars  Weight  Return  HaveMax
Name Date                                      
A    20210101      9.0     1.0    0.0    False
B    20210101      5.0     1.0    0.0    False
C    20210101      5.0     1.0    0.0     True
A    20210102     10.0     1.0    1.0     True
B    20210102      6.0     1.0    1.0    False
C    20210102      5.0     1.0    1.0     True
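
As a sanity check on the table above, the update rule from ifunc can be applied by hand to the three Names. This is only a sketch in plain Python: the dictionary `previous` and the variable names are illustrative and not part of the original post, and it relies on the fact that Weight and Return are 1.0 for every Name on 20210102 in the sample data.

```python
# Hand-check of the expected 20210102 rows, using the update rule from ifunc.
AddDollars = 1.0
MaxDollars = 10.0

# (Dollars, HaveMax) for each Name on 20210101, copied from the sample data.
previous = {'A': (9.0, False), 'B': (5.0, False), 'C': (5.0, True)}

# On 20210102 every Name has Weight 1.0 and Return 1.0, so the mean Weight is 1.0.
weight, ret, avg_weight = 1.0, 1.0, 1.0

expected = {}
for name, (prev_dollars, have_max) in previous.items():
    dollars = weight * prev_dollars * ret / avg_weight
    if not have_max:
        # Names still under the cap receive AddDollars, then are re-tested.
        dollars += AddDollars
        have_max = dollars >= MaxDollars
    expected[name] = (dollars, have_max)

print(expected)
# {'A': (10.0, True), 'B': (6.0, False), 'C': (5.0, True)}
```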

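【Comments】:

  • Most likely you can play with a lazy groupby. You should definitely add some sample data and expected output.
  • @QuangHoang I just added sample data and expected output. What does "lazy groupby" refer to? Does DataFrame.GroupBy guarantee that .apply runs in index order? Or should I not even use .apply, since it might be parallelized with no guarantee about the order of evaluation?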
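
On the ordering question raised in the comments: groupby sorts the group keys by default (sort=True), and iterating over the resulting GroupBy object yields (key, group) pairs in that sorted order. The sketch below illustrates this with a cut-down frame whose rows are deliberately out of date order. The frame and the name `seen` are illustrative only and are not taken from the original post.

```python
import pandas as pd

# Rows deliberately listed out of date order.
df = pd.DataFrame(
    data=[('A', '20210102', 0.0), ('A', '20210101', 9.0),
          ('B', '20210102', 0.0), ('B', '20210101', 5.0)],
    columns=('Name', 'Date', 'Dollars'),
).set_index(['Name', 'Date'])

# sort=True is the default; it is spelled out here for emphasis.
seen = [date for date, group in df.groupby(level='Date', sort=True)]
print(seen)
# ['20210101', '20210102']
```

Because the keys come back sorted, a plain for loop over the GroupBy object is enough to visit the Date groups in chronological order, provided the Date strings sort chronologically (as the YYYYMMDD strings here do).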

Tags: python pandas dataframe lambda iteration


【Solution 1】:

Use groupby to iterate over the groups.

import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])

dft = df.groupby(df.index.get_level_values('Date'))
# Sort the keys explicitly so the Dates are visited in order.
groupings = sorted(dft.groups)
previous = dft.get_group(groupings[0])
for gkey in groupings[1:]:
    # Work on a copy so the assignments below do not raise SettingWithCopyWarning.
    group = dft.get_group(gkey).copy()
    # .values pairs rows by position, so every Date must list the Names in the same order.
    group['HaveMax'] = previous['HaveMax'].values
    avgWeights = group['Weight'].mean()
    group['Dollars'] = group['Weight'].values * previous['Dollars'].values * group['Return'].values / avgWeights
    # Add AddDollars to the Names that have not yet reached MaxDollars.
    under = ~group['HaveMax']
    group.loc[under, 'Dollars'] += AddDollars
    # Update HaveMax for any of those Names that reached MaxDollars on this Date.
    group.loc[under, 'HaveMax'] = group.loc[under, 'Dollars'] >= MaxDollars
    # Assign the calculated values back to the DataFrame:
    df.loc[group.index] = group
    # Prepare for next iteration:
    previous = group
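
The loop above pairs each row with the previous Date's row by position (via .values), which only works when every Date lists the Names in the same order. A possible variant, sketched below, looks up the previous values by Name instead and collects the processed groups with pd.concat. This is not the answerer's code: the names `results`, `prev`, `under`, and `out` are illustrative, and the sketch assumes every Name appears on every Date (a Name missing from the previous Date would produce NaN and break the boolean mask).

```python
import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])

results = []
previous = None
# groupby sorts the Date keys by default, so the loop visits the Dates in order.
for date, group in df.groupby(level='Date'):
    group = group.copy()
    if previous is not None:
        names = group.index.get_level_values('Name')
        # Look up each Name's previous values by label rather than by position.
        prev = previous.droplevel('Date').reindex(names)
        group['HaveMax'] = prev['HaveMax'].to_numpy()
        group['Dollars'] = (group['Weight'].to_numpy() * prev['Dollars'].to_numpy()
                            * group['Return'].to_numpy() / group['Weight'].mean())
        # Names still under the cap receive AddDollars, then are re-tested.
        under = ~group['HaveMax']
        group.loc[under, 'Dollars'] += AddDollars
        group.loc[under, 'HaveMax'] = group.loc[under, 'Dollars'] >= MaxDollars
    results.append(group)
    previous = group

out = pd.concat(results)
print(out)
```

Building a new frame with pd.concat leaves the input df untouched, which can make it easier to rerun the loop while experimenting. Writing back with df.loc, as in the answer, is equally valid when in-place updates are preferred.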

【Comments】:

  • Excellent. I was able to rework your draft into the loop-and-assign form shown in the current edit.