Replacing strings in a Pandas DataFrame column with an array of entries based on a dict (answers)

【Question title】: Replacing strings in Pandas DataFrame column with array of entries based on dict
【Posted】: 2017-11-01 18:22:14
【Question description】:

I have a DataFrame, for example:

     tag1   other
0    a,c      foo
1    b,c      foo
2    d        foo
3    a,a      foo

where the entries are comma-separated strings,

and a dictionary of definitions for each tag, for example:

dict = {'a' : 'Apple',
'b' : 'Banana',
'c' : 'Carrot'}

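I want to replace a, b and c with their definitions, but drop any row containing a tag that has no entry in the dictionary (i.e. d). In addition, I want to make sure there are no duplicates, as in row index 3 of the example dataset.

What I have so far: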

df.tags = df.tags.str.split(',')
for index, row in df.iterrows():
    names = []
    for tag in row.tag1:
            if tag == dict[tag]:
                names.append(dict[tag])
            else:
                 df.drop(df.index[index])

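From there I would replace the original column with the values in names. To remove duplicates, I was thinking of iterating over the array, checking whether the next value matches the current one, and deleting it if so. However, this doesn't work and I'm a bit stumped. The desired output would look like this (with unicode strings):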

     tag1                     other
0    ['Apple', 'Carrot']      foo
1    ['Banana', 'Carrot']     foo
3    ['Apple']                foo

【Question discussion】:

What does the desired output look like?
I've edited it in, thanks.

Tags: python arrays pandas dictionary

【Solution 1】:

My entry for the longest one-liner contest

m = {
    'a' : 'Apple',
    'b' : 'Banana',
    'c' : 'Carrot'
}

df.tag1.str.split(',', expand=True) \
  .stack().map(m).groupby(level=0) \
  .filter(lambda x: x.notnull().all()) \
  .groupby(level=0).apply(lambda x: x.drop_duplicates().str.cat(sep=',')) \
  .to_frame('tag1').join(df.other)

            tag1 other
0   Apple,Carrot   foo
1  Banana,Carrot   foo
3          Apple   foo
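
The chain above is dense, so here is a step-by-step sketch of the same idea. It is not the answer's exact code: it assumes the sample frame from the question (rebuilt by hand here), swaps `split(expand=True).stack()` for `Series.explode`, and the intermediate names (`exploded`, `mapped`, `complete`, `joined`, `result`) are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption for this sketch).
df = pd.DataFrame({'tag1': ['a,c', 'b,c', 'd', 'a,a'], 'other': ['foo'] * 4})
m = {'a': 'Apple', 'b': 'Banana', 'c': 'Carrot'}

# 1. One tag per row; the original row label is repeated for each tag.
exploded = df.tag1.str.split(',').explode()

# 2. Look up each tag; tags missing from m become NaN.
mapped = exploded.map(m)

# 3. Keep only the original rows whose tags were all found in m.
complete = mapped.groupby(level=0).filter(lambda x: x.notnull().all())

# 4. Per original row: drop repeated definitions, then join into one string.
joined = complete.groupby(level=0).apply(lambda x: ','.join(x.drop_duplicates()))

# 5. Re-attach the untouched column; rows dropped in step 3 stay dropped.
result = joined.to_frame('tag1').join(df.other)
print(result)
```

Splitting the chain this way makes it easier to inspect `mapped` and see which tags failed the lookup before any rows are discarded.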

But seriously, this is probably a better solution

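import numpy as np
import pandas as pd

# np.char.split is the public spelling of np.core.defchararray.split;
# the np.core namespace is deprecated as of NumPy 2.0.
a = np.char.split(df.tag1.values.astype(str), ',')
lens = [len(s) for s in a]            # number of tags in each row
b = np.concatenate(a)                 # flat array of every tag
c = [m.get(k, np.nan) for k in b]     # definitions, NaN where undefined
i = df.index.values.repeat(lens)      # row label repeated once per tag
s = pd.Series(c, i)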

def proc(x):
    if x.notnull().all():
        return x.drop_duplicates().str.cat(sep=',')

s.groupby(level=0).apply(proc).dropna().to_frame('tag1').join(df.other)

            tag1 other
0   Apple,Carrot   foo
1  Banana,Carrot   foo
3          Apple   foo
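
Both variants above return comma-joined strings, whereas the question asked for lists. As a rough alternative, the per-row rule can be written as a small plain-Python helper. The name `translate` and its `mapping` parameter are hypothetical, not part of the original answer, and the entries are the sample values from the question.

```python
m = {'a': 'Apple', 'b': 'Banana', 'c': 'Carrot'}

def translate(entry, mapping=m):
    """Return the de-duplicated definitions for one comma-separated entry,
    or None when any of its tags has no definition in mapping."""
    tags = entry.split(',')
    if any(tag not in mapping for tag in tags):
        return None
    # dict.fromkeys keeps first-seen order while discarding repeats.
    return list(dict.fromkeys(mapping[tag] for tag in tags))

entries = ['a,c', 'b,c', 'd', 'a,a']   # the tag1 column from the question
translated = [translate(e) for e in entries]
print(translated)
# [['Apple', 'Carrot'], ['Banana', 'Carrot'], None, ['Apple']]
```

On the DataFrame itself, something like `df.assign(tag1=df.tag1.map(translate)).dropna(subset=['tag1'])` should then keep rows 0, 1 and 3 with list-valued entries, assuming every entry in `tag1` is a string.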

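【Discussion】:

Now that's some exciting fruit!
@DmitryPolonskiy If high score wins... then sure :-)
I printed it out and put it on my desk to cherish it. Thanks!
It seems I only get one definition in each entry of tags1, even when it contains many tags.
@Kam Are the definitions as you've represented them? Or do they contain integers?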