Create a table displaying whether a list is contained within the grouped columns of a dataframe

【Question title】: Create a table displaying whether a list is contained within the grouped columns of a dataframe
【Posted】: 2020-08-15 11:35:22
【Question description】:

Given the following data:

import pandas as pd

data = pd.DataFrame(
    dict(
        source=["file1", "file1", "file2", "file2"],
        variable=["shipp", "carrr", "shipp", "bikee"],
    )
)
vals = pd.Series(["ship", "bike"])

which looks like:

  source variable
0  file1    shipp
1  file1    carrr
2  file2    shipp
3  file2    bikee

I want to create the following:

          ship     bike
file1     True     False
file2     True     True

I'm not sure of the best way to do this, but I've tried the following:

data.groupby("source").apply(
    lambda grp: pd.Series([any(grp["variable"].str.contains(v)) for v in vals])
)

This took me a few attempts, and I'm now wondering whether there is a better way.

(Any help with writing a better title is welcome.)

【Question comments】:

Tags: python pandas pandas-groupby data-manipulation

【Solution 1】:

We can first use str.extract to pull out the matching term, then pd.crosstab:

data['new'] = data.variable.str.extract('(' + '|'.join(vals) + ')')[0]
s = pd.crosstab(data.source, data.new).astype(bool)

Output (s):

new      bike  ship
source             
file1   False  True
file2    True  True
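
For comparison, here is a minimal sketch of a variant that skips the intermediate column and crosstab altogether. It assumes the terms in vals should be matched as literal substrings (hence regex=False), and the name result is only illustrative. Each term is checked with str.contains, and the per-row flags are then reduced per source with any:

```python
import pandas as pd

data = pd.DataFrame(
    dict(
        source=["file1", "file1", "file2", "file2"],
        variable=["shipp", "carrr", "shipp", "bikee"],
    )
)
vals = pd.Series(["ship", "bike"])

# For each search term, flag the rows whose `variable` contains it
# (literal substring match, an assumption), then ask per source
# whether any row was flagged.
result = pd.DataFrame(
    {
        v: data["variable"].str.contains(v, regex=False).groupby(data["source"]).any()
        for v in vals
    }
)
print(result)
```

The columns come out in the order of vals, and a source with no matching rows would still appear (as a row of False) rather than being dropped.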

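【Comments】:

Thanks - one thing about this solution is that it seems to rely on mutating the original dataframe (by adding the new column)?
@baxx pd.crosstab(data.source,data.variable.str.extract('('+'|'.join(vals)+')')[0]).astype(bool)
Out of interest - the [0] doesn't seem to be required, is there a reason for adding it?
@baxx It is there to get a Series; I always use a Series when assigning a new column.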
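
A self-contained sketch of the non-mutating version from the comment above. Two things here are additions rather than part of the original answer: re.escape is used on the assumption that the terms are meant literally, and reindex is used to put the columns in the order of vals. The names pattern, matched and table are illustrative.

```python
import re

import pandas as pd

data = pd.DataFrame(
    dict(
        source=["file1", "file1", "file2", "file2"],
        variable=["shipp", "carrr", "shipp", "bikee"],
    )
)
vals = pd.Series(["ship", "bike"])

# One alternation pattern with a single capture group; re.escape guards
# against regex metacharacters in the search terms (an assumption).
pattern = "(" + "|".join(re.escape(v) for v in vals) + ")"

# expand=False returns the single capture group as a Series,
# so nothing is written back to `data`.
matched = data["variable"].str.extract(pattern, expand=False)

table = (
    pd.crosstab(data["source"], matched)  # counts per (source, term); NaN rows are ignored
    .astype(bool)                          # count > 0 becomes True
    .reindex(columns=list(vals), fill_value=False)  # column order follows vals
)
print(table)
```

Two caveats with this approach: str.extract keeps only the first match in each row, and crosstab leaves out any source whose rows matched nothing at all.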
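
To illustrate the point about [0]: with its default expand=True, str.extract returns a DataFrame even when the pattern has a single capture group, so [0] selects that one column as a Series. A small sketch (the variable names are illustrative) showing that, plus expand=False as another way to get a Series:

```python
import pandas as pd

variable = pd.Series(["shipp", "carrr", "shipp", "bikee"])
pattern = "(ship|bike)"

# Default expand=True: a DataFrame with one column labelled 0.
as_frame = variable.str.extract(pattern)

# Selecting column 0 turns it into a Series.
as_series = variable.str.extract(pattern)[0]

# expand=False with a single capture group returns a Series directly.
also_series = variable.str.extract(pattern, expand=False)

print(type(as_frame).__name__, type(as_series).__name__, type(also_series).__name__)
```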