【Question Title】: How to fill missing data from a dataframe with another dataframe when having a common key
【Posted】: 2021-07-15 06:27:06
【Question】:

I have two DataFrames; see the example below. How can I fill the rows where df['GrossRate'] == 0 with the GrossRate from dfB that has the same ProductID?

Basically, GrossRate in df should end up as 150, 40, 238, 32.

import pandas as pd

dataA = {'date': ['20210101','20210102','20210103','20210104'],
        'quanitity': [22000,25000,27000,35000],
        'NetRate': ['nan','nan','nan','nan'],
        'GrossRate': [150,0,238,0],
        'ProductID': [9613,7974,1714,5302],
        }

df = pd.DataFrame(dataA, columns = ['date', 'quanitity', 'NetRate', 'GrossRate','ProductID' ])

       date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan          0       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan          0       5302

dataB = {
        'ProductID': ['9613.T','7974.T','1714.T','5302.T'],
         'GrossRate': [10,40,28,32],
        }

dfB = pd.DataFrame(dataB, columns = ['ProductID', 'GrossRate' ])
# regex=False treats '.T' as literal text; as a regex, '.' would match any character
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False)

print (dfB)

  ProductID  GrossRate
0      9613         10
1      7974         40
2      1714         28
3      5302         32

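【Comments】:

Do both DataFrames have the same number of rows, with the ProductID values in the same order?

Tags: python pandas conditional-statements missing-data

【Solution 1】:

Try this list comprehension (it pairs rows by position, so it assumes both DataFrames have the same number of rows in the same order):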

df['GrossRate'] = [x if x != 0 else y for x, y in zip(df['GrossRate'], dfB['GrossRate'])]

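【Comments】:

A comma is needed between y and df[..., because I got an error. Thanks.
@Ksaman I edited my code, sorry about that; it should work now.

【Solution 2】:

If both DataFrames have the same number of rows and the ProductID values are in the same order, no matching on ProductID is needed; use numpy.where: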

import numpy as np

df['GrossRate'] = np.where(df['GrossRate'] == 0, dfB['GrossRate'], df['GrossRate'])

print (df)
       date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan         40       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan         32       5302
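
The positional fill above is only correct if row i of df and row i of dfB refer to the same product. A minimal sketch of how that assumption could be checked first, using the question's data (the names keys_b and same_order are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({'ProductID': [9613, 7974, 1714, 5302],
                   'GrossRate': [150, 0, 238, 0]})
dfB = pd.DataFrame({'ProductID': ['9613.T', '7974.T', '1714.T', '5302.T'],
                    'GrossRate': [10, 40, 28, 32]})

# Normalize dfB's keys: drop the literal '.T' suffix and convert to int
keys_b = dfB['ProductID'].str.replace('.T', '', regex=False).astype(int)

# True only if both frames have equal length and identical key order
same_order = bool(len(df) == len(dfB)
                  and (df['ProductID'].to_numpy() == keys_b.to_numpy()).all())
print(same_order)
```

If same_order is False, the positional np.where would pair rates with the wrong products, and a key-based lookup is the safer choice.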

If matching on ProductID is needed, use:

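# regex=False strips the literal '.T'; astype(int) makes the key dtype match df['ProductID']
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False).astype(int)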

df['GrossRate'] = (np.where(df['GrossRate'] == 0, 
                            df['ProductID'].map(dfB.set_index('ProductID')['GrossRate']),
                            df['GrossRate']))
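
For reference, the ProductID-based fill can be put together as a self-contained sketch. Here the rows of dfB are deliberately shuffled (an assumption added for illustration) to show that the result does not depend on row order. Series.mask is used as an alternative to np.where, and lookup is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'GrossRate': [150, 0, 238, 0],
                   'ProductID': [9613, 7974, 1714, 5302]})

# Rows intentionally in a different order than df
dfB = pd.DataFrame({'ProductID': ['5302.T', '9613.T', '1714.T', '7974.T'],
                    'GrossRate': [32, 10, 28, 40]})

# Strip the literal '.T' suffix and match df's integer key dtype
dfB['ProductID'] = dfB['ProductID'].str.replace('.T', '', regex=False).astype(int)

# Series mapping ProductID -> GrossRate
lookup = dfB.set_index('ProductID')['GrossRate']

# Where GrossRate is 0, take the rate looked up by ProductID; otherwise keep it
df['GrossRate'] = df['GrossRate'].mask(df['GrossRate'] == 0,
                                       df['ProductID'].map(lookup))
print(df)
```

If a ProductID in df has no match in dfB, map returns NaN for that row, so a zero would be replaced with NaN and the column would become float; that case is worth checking if it can occur in the real data.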

【Comments】: