Split Multiple Values into New Rows: Answers

【Question title】: Split Multiple Values into New Rows
【Posted】: 2018-10-12 19:24:22
【Question】:

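I have a dataframe in which several columns may hold multiple values in a single observation. Every observation in these rows ends with a '/', whether or not it holds more than one value. That means some values look like this: 'OneThing/' while others look like this: 'OneThing/AnotherThing/'

I need to take the observations that contain more than one value and split those values into separate rows.

Here is a general example of what the dataframe looks like before: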

ID  Date   Name ColA   ColB   Col_of_Int                        ColC   ColD
1   09/12  Ann  String String OneThing/                         String String
2   09/13  Pete String String OneThing/AnotherThing             String String
3   09/13  Ann  String String OneThing/AnotherThing/ThirdThing/ String String
4   09/12  Pete String String OneThing/                         String String

And the output I want:

ID  Date   Name ColA   ColB   Col_of_Int                        ColC   ColD
1   09/12  Ann  String String OneThing                         String String
2   09/13  Pete String String OneThing                         String String
2   09/13  Pete String String AnotherThing                     String String
3   09/13  Ann  String String OneThing                         String String
3   09/13  Ann  String String AnotherThing                     String String
3   09/13  Ann  String String ThirdThing                       String String
4   09/12  Pete String String OneThing                         String String

I have tried the following:

df = df[df['Column1'].str.contains('/')]
df_split = df[df['Column1'].str.contains('/')]
df1 = df_split.copy()
df2 = df_split.copy()

split_cols = ['Column1']

for c in split_cols:
    df1[c] = df1[c].apply(lambda x: x.split('/')[0])
    df2[c] = df2[c].apply(lambda x: x.split('/')[1])

new_rows = df1.append(df2)
df.drop(df_split.index, inplace=True)
df = df.append(new_rows, ignore_index=True)

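This works, but I think it creates a new row after every '/', which means one new row is created for every observation with a single value (where I want zero new rows), two new rows for every observation with two values (where I only need one), and so on.

This is especially frustrating when an observation has three or more values, because I end up with several unnecessary rows.

Is there a way to fix this so that only observations with more than one value are added as new rows?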

【Comments】:

If your df = pd.DataFrame({'Column1': ['OneThing/', 'TwoThing/AnotherThing/']}), can you give the expected output?
@Ben.T Added above!

Tags: python python-3.x pandas split append

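【Solution 1】:

If you use df['column_of_interest'] = df['column_of_interest'].str.rstrip('/'), your approach would work (I think), because it gets rid of the annoying / at the end of your observations. However, the loop is inefficient, and the way you have it requires you to know the maximum number of values an observation in your column can hold. Here is another way that I think does what you need: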

Take this df as an example:

import pandas as pd

df = pd.DataFrame({'column_of_interest':['onething/',
                                         'onething/twothings/',
                                         'onething/twothings/threethings/'],
                   'values1': [1,2,3],
                   'values2': [5,6,7]})

>>> df
                column_of_interest  values1  values2
0                        onething/        1        5
1              onething/twothings/        2        6
2  onething/twothings/threethings/        3        7

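This is a bit messy, because you presumably want to keep the data in the columns other than column_of_interest. So you can temporarily collect those columns and set them aside like this: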

value_columns = [i for i in df.columns if i != 'column_of_interest']

and put them into the index for the following operations (restoring them at the end):

new_df = (df.set_index(value_columns)
          .column_of_interest.str.rstrip('/')
          .str.split('/')
          .apply(pd.Series)
          .stack()
          .rename('new_column_of_interest')
          .reset_index(value_columns))

Then your new_df looks like:

>>> new_df
   values1  values2 new_column_of_interest
0        1        5               onething
0        2        6               onething
1        2        6              twothings
0        3        7               onething
1        3        7              twothings
2        3        7            threethings
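
To make the chain above easier to follow, here is a minimal sketch that runs its steps one at a time on a smaller, two-row frame. The intermediate names (lists, wide, stacked) are made up for illustration, and the explicit dropna() is an addition so the result does not depend on how a given pandas version treats missing values in stack().

```python
import pandas as pd

# Smaller stand-in for the example frame (illustrative data only).
df = pd.DataFrame({'column_of_interest': ['onething/',
                                          'onething/twothings/'],
                   'values1': [1, 2]})

# Step 1: strip the trailing '/' and split, so each cell becomes a list.
lists = (df.set_index('values1')
           .column_of_interest.str.rstrip('/')
           .str.split('/'))

# Step 2: expand each list into columns; shorter lists are padded with NaN.
wide = lists.apply(pd.Series)

# Step 3: stack the columns into rows. stack() has traditionally dropped
# the NaN padding by default; dropna() is added here as a safeguard.
stacked = wide.stack().dropna()

print(wide.shape)        # (2, 2)
print(stacked.tolist())  # ['onething', 'onething', 'twothings']
```

The padded cell in wide is the only place a missing value appears; once it is gone, each remaining entry corresponds to exactly one value from the original column.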

Or, using merge:

new_df = (df[value_columns].merge(df.column_of_interest
                        .str.rstrip('/')
                        .str.split('/')
                        .apply(pd.Series)
                        .stack()
                        .reset_index(1, drop=True)
                        .to_frame('new_column_of_interest'),
                        left_index=True, right_index=True))

Edit: on the dataframe you posted, this results in:

   ID   Date  Name    ColA    ColB    ColC    ColD new_column_of_interest
0   1  09/12   Ann  String  String  String  String               OneThing
0   2  09/13  Pete  String  String  String  String               OneThing
1   2  09/13  Pete  String  String  String  String           AnotherThing
0   3  09/13   Ann  String  String  String  String               OneThing
1   3  09/13   Ann  String  String  String  String           AnotherThing
2   3  09/13   Ann  String  String  String  String             ThirdThing
0   4  09/12  Pete  String  String  String  String               OneThing
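
For readers on pandas 0.25 or later, DataFrame.explode may be a shorter route to the same shape. This is a different technique from the stack and merge approaches in this answer, sketched here on the answer's example frame; the name exploded is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'column_of_interest': ['onething/',
                                          'onething/twothings/',
                                          'onething/twothings/threethings/'],
                   'values1': [1, 2, 3],
                   'values2': [5, 6, 7]})

# Turn each cell into a list of values, then give every list element
# its own row; the other columns are repeated alongside it.
exploded = (df.assign(column_of_interest=df['column_of_interest']
                      .str.rstrip('/')
                      .str.split('/'))
              .explode('column_of_interest')
              .reset_index(drop=True))

print(exploded)
```

Unlike the stack version, this keeps the original column name and column order, and the other columns never need to be moved into the index.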

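【Comments】:

This works so much better! I'm still getting extra columns, but I'm sure I can drop them.
Oh, my mistake. I meant extra rows. Every row has a duplicate. That is, each row in the dataframe is identical to the row above it, except that the column of interest is blank in the duplicated row. Does that make sense?
I think I see what you're saying, but when I run it I don't get the extra rows (see the output in the edit above)
What you got is what I want, but I'm getting something different. I'm adding it above, but it may take a few minutes because it has to go through peer review.
Sorry! New here. I can't figure out why they differ, but once I drop the rows with null values, I get the same result as you.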
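
One way blank rows like those described in the comments can arise is splitting on a trailing '/' without stripping it first. The thread does not confirm that this is what happened in the asker's case, so treat the following as a plain-Python sketch of the mechanism only; the sample strings are taken from the question.

```python
# Sample values copied from the question.
values = ['OneThing/', 'OneThing/AnotherThing/']

# Without stripping, each trailing '/' yields an empty final piece.
raw = [v.split('/') for v in values]

# Stripping the trailing '/' first leaves only the real values.
clean = [v.rstrip('/').split('/') for v in values]

print(raw)    # [['OneThing', ''], ['OneThing', 'AnotherThing', '']]
print(clean)  # [['OneThing'], ['OneThing', 'AnotherThing']]
```

Each empty string in raw would turn into one extra row with a blank column of interest once the pieces are spread across rows.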