Is it possible to use rows only a certain number of times when merging dataframes? (Answers)

【Question title】: Is it possible to use rows only a certain number of times when merging dataframes?
【Posted】: 2021-10-26 12:02:56
【Question】:

My data looks like this:

dfA:    dfB:

type    type | name | n
------  ----------------
   A       A | 123  | 1
   B       B | 123  | 1
   A       A | 456  | 1
   B       B | 789  | 1
   A

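The n column gives the number of times an element of dfB may be added to dfA.

Is it possible to "merge" dfB onto dfA on type (or use some other pandas function) so that my result does not include a named row from dfB more than n times? The order of dfA should determine which row comes first. So in this case: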

desired result:

type | name
----------------
   A | 123 
   B | 789 ------> the second row "123" is not added since 123 already appears n=1 times
   A | 456         in the resulting data. The row with name="789" is added instead.
   B | NO MATCH -> There are no more rows fitting the criterion "type = B"
   A | NO MATCH -> There are no more rows fitting the criterion "type = A"

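Edit: The type column in dfA is not the same as the one in dfB, so rows cannot simply be dropped from dfB in advance. Consider this variation of dfA (dfB stays the same):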

dfA:     dfB:                result:            

type     type | name | n     type | name        
-----    ----------------    -----------        
   B        A | 123  | 1        B | 123            
   A        B | 123  | 1        A | 456            
   B        A | 456  | 1        B | 789           
   A        B | 789  | 1        A | NO MATCH      
   B                            B | NO MATCH

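【Comments】:

The logic is not entirely clear. Is the second B ignored because 123 appears twice?
I edited the question to include more of the logic; I hope that helps.
Then is my answer what you want? Is dfA really needed?
Sorry for being unclear. dfA is different from the "type" column in dfB, see the new edit.

Tags: python pandas merge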

【Solution 1】:

It is not entirely clear what you want, but assuming you want each name to appear no more than n times, you could do:

dfB.assign(name=dfB['name'].where(dfB.groupby('name').cumcount().lt(dfB['n'])))[['type', 'name']]

Output:

  type   name
0    A  123.0
1    B    NaN
2    A  456.0
3    B  789.0
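
For reference, here is a self-contained version of the same idea, using the sample dfB from the question. The intermediate variable `seen` is only introduced for illustration; it is not part of the original one-liner.

```python
import pandas as pd

# Sample dfB from the question
dfB = pd.DataFrame({
    "type": ["A", "B", "A", "B"],
    "name": [123, 123, 456, 789],
    "n": [1, 1, 1, 1],
})

# 0-based running count per name: the first 123 gets 0, the second gets 1
seen = dfB.groupby("name").cumcount()

# Keep a name only while it has been seen fewer than n times, else NaN
out = dfB.assign(name=dfB["name"].where(seen.lt(dfB["n"])))[["type", "name"]]
print(out)
```

Because the masked rows become NaN, the name column is upcast to float, which is why the output shows 123.0 rather than 123.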

The merge you expect is also unclear, but once you have the dataframe above, you can join or merge it as required.
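
If the limit has to be applied in the order of dfA, as in the edited question, one possible approach is a plain greedy loop rather than a merge. This is only a minimal sketch, under the assumption that usage is counted per name across all dfB rows and that a row is eligible while that count is below its own n. `match_with_limit` and its `fill` parameter are hypothetical names, not pandas API.

```python
from collections import Counter

import pandas as pd


def match_with_limit(dfA, dfB, fill="NO MATCH"):
    """Hypothetical helper: walk dfA in order and give each row the first
    dfB name of the same type that has been used fewer than n times."""
    # Candidate (name, n) pairs per type, kept in dfB order
    candidates = {}
    for t, name, n in zip(dfB["type"], dfB["name"], dfB["n"]):
        candidates.setdefault(t, []).append((name, n))

    used = Counter()  # how often each name has been handed out so far
    names = []
    for t in dfA["type"]:
        for name, n in candidates.get(t, []):
            if used[name] < n:
                used[name] += 1
                names.append(name)
                break
        else:
            # No candidate of this type is left
            names.append(fill)
    return dfA.assign(name=names)


dfB = pd.DataFrame({
    "type": ["A", "B", "A", "B"],
    "name": [123, 123, 456, 789],
    "n": [1, 1, 1, 1],
})
dfA = pd.DataFrame({"type": ["A", "B", "A", "B", "A"]})

print(match_with_limit(dfA, dfB))
```

On the two dfA variants from the question, this gives the names 123, 789, 456 and 123, 456, 789 respectively, followed by NO MATCH for the remaining rows. It is an ordinary Python loop, so it may be slow on large frames.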

【Discussion】: