Apply func between 2 dataframes with different indexes in pandas

【Question title】: Apply func between 2 dataframes with different indexes in pandas
【Posted】: 2018-03-26 17:56:40
【Question description】:

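I have 2 dataframes, each with a column "count" of dtype int64 and an index "product_id". I want to apply a hand-written formula to the "count" columns of the two dataframes, matched on the index. I know I can do something like dataframe subtraction, but I can't find how to use a custom function between the columns of two dataframes. By the way, the rows and indexes of the two dataframes don't match exactly. I only need to apply the function where the index is the same in both.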

Here are samples of the two dataframes:

df2_count[['count']].head()

            count
product_id
9014           41
8458           11
55522           9
6969            8
8840            7


df1_count[['count']].head()

            count
product_id
7545           12
8866           10
8867           10
47196           6
9014            5

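This is what I tried. Since I couldn't find the method I needed, I created a NaN-filled dataframe whose rows and columns are the indexes of the two dataframes. Then I iterated over every row of every column and filled that dataframe with the results of the function. But it looks messy, with lots of NaN, and I don't even know how to handle it and make it readable for people.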

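import pandas as pd

# df1_count and df2_count are the two dataframes shown above.
# All-NaN frame: rows are df2_count's index, columns are df1_count's index.
data_ibs = pd.DataFrame(index=df2_count.index, columns=df1_count.index)

def formula(a, b):
    # Percentage difference between a and b, divided by the smaller of the two.
    if a > b:
        return (a - b) / b * 100
    else:
        return (a - b) / a * 100

for i in range(len(df2_count.index)):
    for j in range(len(df1_count.index)):
        if df2_count.index[i] == df1_count.index[j]:
            # .at replaces DataFrame.get_value, which was removed in pandas 1.0
            a = df2_count.at[df2_count.index[i], 'count']
            b = df1_count.at[df1_count.index[j], 'count']
            # .iloc replaces .ix (also removed in pandas 1.0) for positional assignment
            data_ibs.iloc[i, j] = formula(a, b)

# to_csv returns None when given a path, so there is nothing to assign
data_ibs.to_csv('output.csv')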

Could someone help me do this in a simpler, more "pandas" way? Thanks for your help.

【Discussion】:

Tags: python-3.x pandas dataframe

【Solution 1】:

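I just did it in a more elegant way (the pandas way). The idea is not to try to apply a func between different dataframes, but to merge them into one and then use pandas' built-in apply function to compute what you need.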

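import pandas as pd

# change_percents is not defined in this answer; it is presumably the question's formula(a, b).
# how='outer' keeps product_ids found in either frame; fillna(1) fills the missing counts with 1.
dff = pd.merge(df2_count, df1_count, how='outer',
               right_index=True, left_index=True, suffixes=('_x', '_y')).fillna(1)
dff['mean'] = dff[['count_x', 'count_y']].mean(axis=1)
dff['sum'] = dff[['count_x', 'count_y']].sum(axis=1)
dff['count_percents'] = dff.apply(lambda row: change_percents(row['count_x'], row['count_y']), axis=1)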
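
For anyone who wants to run the idea end to end, here is a minimal self-contained sketch. It rebuilds the two frames from the head() rows shown in the question and reuses the question's `formula`. Note one assumption that differs from the snippet above: it merges with how="inner" rather than how="outer" plus fillna(1), since the question only asked for product_ids present in both frames. The column name `count_percents_vec` and the `np.where` variant are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's two head() outputs.
df2_count = pd.DataFrame(
    {"count": [41, 11, 9, 8, 7]},
    index=pd.Index([9014, 8458, 55522, 6969, 8840], name="product_id"),
)
df1_count = pd.DataFrame(
    {"count": [12, 10, 10, 6, 5]},
    index=pd.Index([7545, 8866, 8867, 47196, 9014], name="product_id"),
)


def formula(a, b):
    """Percentage difference between a and b, as defined in the question."""
    if a > b:
        return (a - b) / b * 100
    return (a - b) / a * 100


# Assumption: how="inner" keeps only product_ids present in both frames,
# unlike the answer's how="outer" followed by fillna(1).
common = pd.merge(
    df2_count, df1_count,
    how="inner", left_index=True, right_index=True,
    suffixes=("_x", "_y"),
)

# Row-wise application of the scalar function, as in the answer.
common["count_percents"] = common.apply(
    lambda row: formula(row["count_x"], row["count_y"]), axis=1
)

# Vectorized variant of the same formula: divide by b where a > b, else by a.
a, b = common["count_x"], common["count_y"]
common["count_percents_vec"] = (a - b) / np.where(a > b, b, a) * 100

print(common)
```

With these sample rows only product_id 9014 appears in both frames, so the result has a single row: (41 - 5) / 5 * 100 = 720. On large frames the vectorized line is likely to be noticeably faster than apply with axis=1, since it avoids calling a Python function per row.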

By the way, you can build a single dataframe from a list of dataframes. I've attached the code I used, which may also be helpful.

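import logging
import os

import pandas as pd

# path is the directory holding the CSV files (defined elsewhere).
frames = []
for filename in os.listdir(path):
    # Skip anything that is not a CSV file.
    if not filename.endswith('csv'):
        continue
    logging.debug(filename)
    df = pd.read_csv(os.path.join(path, filename), index_col=None, names=['wallets'])
    frames.append(df)
    logging.debug(frames)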
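
The loop only collects the per-file frames in a list; the step that turns the list into one dataframe is not shown. A minimal sketch of that step, assuming `pd.concat` is what was intended, with two small in-memory frames standing in for the CSV files (the values are hypothetical; the column name 'wallets' comes from the snippet above):

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from each CSV file.
frames = [
    pd.DataFrame({"wallets": ["a", "b"]}),
    pd.DataFrame({"wallets": ["c"]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of
# repeating each file's own 0-based index.
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Without ignore_index=True the combined frame would keep the original row labels of each input, so labels such as 0 would appear more than once.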

Hope this helps someone :)

【Discussion】: