Splitting up substrings by comma and separating rows by each substring: answers

[Question title]: Splitting up substrings by comma and separate rows by each substring
[Posted]: 2020-12-15 08:59:04
[Question description]:

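I would like to ask for your help here. I have a dataframe with a "Tag" column that holds multiple substrings separated by commas. I want to split the column on the commas and duplicate each row once per substring. Below is a sample of the operation.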

Sample df

   A          B          C          D          E                   Tag
A mug      computer    stack      code       phone        labor relation, m&a, h&s
google     virjoy      plant      ivan       wrong          business, environment
gazette   nowhere     conquer    jermo       chris             business ethics
spray      hilda      square     walk      nonsense        m&a, hiring and expansion
Florence    plug     nihilist    font       hello     h&s, wages and hours, product recall

Output df

   A        B         C       D       E                   Tag                              New Tag
A mug    computer   stack    code   phone       labor relation, m&a, h&s               labor relation
A mug    computer   stack    code   phone       labor relation, m&a, h&s                     m&a
A mug    computer   stack    code   phone       labor relation, m&a, h&s                     h&s
google    virjoy    plant    ivan   wrong        business, environment                     business
google    virjoy    plant    ivan   wrong        business, environment                    environment
gazette  nowhere   conquer   jermo  chris          business ethics                     business ethics             
spray     hilda    square    walk  nonsense     m&a, hiring and expansion                    m&a
spray     hilda    square    walk  nonsense     m&a, hiring and expansion            hiring and expansion
Florence  plug    nihilist   font   hello    h&s, wages and hours, product recall            h&s
Florence  plug    nihilist   font   hello    h&s, wages and hours, product recall     wages and hours
Florence  plug    nihilist   font   hello    h&s, wages and hours, product recall    product recall

I am thinking of splitting on ',' and then perhaps doing something like a melt? Any help would be much appreciated. Many thanks in advance!

[Comments on the question]:

Tags: python pandas split

[Solution 1]:

A function that produces the required data format:

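import numpy as np
import pandas as pd

def transform(df, col_list, fill_value='', preserve_index=False):
    # Accept a single column name as well as a list of names
    if (col_list is not None and len(col_list) > 0 and not isinstance(col_list, (list, tuple, np.ndarray, pd.Series))):
        col_list = [col_list]

    # Columns that are repeated rather than flattened
    v2_cols = df.columns.difference(col_list)
    # Number of list elements in each row
    lens = df[col_list[0]].str.len()
    # Original index label, repeated once per list element
    v2 = np.repeat(df.index.values, lens)

    final = (pd.DataFrame({
            col: np.repeat(df[col].values, lens)
            for col in v2_cols},
            index=v2)
         .assign(**{col: np.concatenate(df.loc[lens>0, col].values)
                        for col in col_list}))

    # Rows with an empty list vanish above; add them back with fill_value
    if (lens == 0).any():
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        final = pd.concat([final, df.loc[lens==0, v2_cols]], sort=False).fillna(fill_value)

    # A stable sort keeps each row's values in their original order
    final = final.sort_index(kind='mergesort')

    if not preserve_index:
        final = final.reset_index(drop=True)
    return final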
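
The function is built around pairing np.repeat with np.concatenate. A minimal sketch of just that step, on a made-up two-row frame (the names toy, repeated_a, flat_tags and flat are illustrative and not part of the answer):

```python
import numpy as np
import pandas as pd

# Toy frame whose Tag column already holds lists (i.e. after str.split)
toy = pd.DataFrame({"A": ["x", "y"], "Tag": [["a", "b", "c"], ["d"]]})

# Number of list elements in each row
lens = toy["Tag"].str.len()

# Repeat each scalar value once per list element in its row
repeated_a = np.repeat(toy["A"].values, lens)

# Flatten all the lists into a single one-dimensional array
flat_tags = np.concatenate(toy["Tag"].values)

# The two arrays now line up element by element
flat = pd.DataFrame({"A": repeated_a, "Tag": flat_tags})
print(flat)
```

Rows whose list is empty contribute nothing to either array, which is why the function handles them in a separate step.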

Call the function like this:

# Split on a comma plus any following whitespace so the tags carry no leading spaces
transform(df.assign(Tag=df.Tag.str.split(r',\s*')), 'Tag')
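
On pandas 0.25 or later, DataFrame.explode may be a shorter route to the same result. The following is a sketch, not the method used in this answer. The helper name explode_tags and its parameters are made up for illustration. It assumes the goal is to keep the original Tag column next to a New Tag column, as in the desired output, and uses a trimmed-down version of the sample data:

```python
import pandas as pd

# Reduced version of the sample frame from the question
df = pd.DataFrame({
    "A": ["A mug", "google", "gazette"],
    "B": ["computer", "virjoy", "nowhere"],
    "Tag": ["labor relation, m&a, h&s", "business, environment", "business ethics"],
})

def explode_tags(frame, source_col="Tag", target_col="New Tag"):
    """Hypothetical helper: one output row per comma-separated value."""
    out = frame.copy()
    # A multi-character pattern is treated as a regular expression,
    # so whitespace around each comma is discarded along with the comma.
    out[target_col] = out[source_col].str.strip().str.split(r"\s*,\s*")
    # explode() repeats the other columns once per list element
    return out.explode(target_col).reset_index(drop=True)

result = explode_tags(df)
print(result)
```

Unlike transform, this leaves the column order untouched and keeps the unsplit Tag string on every repeated row.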

[Comments]: