python pandas - Convert a column of tuples to string column (answers)

【Question title】: python pandas - Convert a column of tuples to string column
【Posted】: 2020-12-22 12:54:31
【Question description】:

This should be a fairly simple question.

Here is a sample of my df column:

             title2
1      (, 2 ct, , )
2      (, 1 ct, , )
3      (, 2 ct, , )
4               NaN
5      (, 2 ct, , )
6     (, 5 ct, , )
7  (, 7 ounce, , )
8    (, 1 gal, , )
9              NaN
10             NaN

I want to convert the whole column into a proper string column, i.e. my desired output is:

    title2
1      2ct
2      1ct
3      2ct
4      NaN
5      2ct
6      5ct
7  7 ounce
8     1gal
9      NaN
10     NaN

I tried the following commands, but none of them seems to work:

title['title3'] = title['title2'].agg(' '.join)
title['title3'] = title['title2'].apply(lambda x: ''.join(x))
title['title3'] = title['title2'].astype(str)
title['title3'] = title['title2'].values.astype(str)

The answer given in this post, Convert a pandas column containing tuples to string, unfortunately did not help me either.

Can anyone shed some light on this? Thanks, everyone.
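For readers following along, here is a minimal sketch of why two of the attempts above may fail. It assumes the cells hold real Python tuples and that the missing values are float NaN, which the question does not confirm; the Series `s` and the variable `error_message` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample: one real tuple and one float NaN.
s = pd.Series([("", "2 ct", "", ""), np.nan])

# astype(str) stores the repr of each tuple, so the parentheses,
# quotes and commas all end up inside the resulting string.
as_text = s.astype(str)
print(as_text.iloc[0])

# ''.join works on the tuple but raises TypeError on the float NaN,
# because a float is not iterable.
error_message = None
try:
    s.apply(lambda x: "".join(x))
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)
```

Under these assumptions, any approach that calls `join` directly on every cell has to skip the missing values first.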

【Question discussion】:

df['title2'].str.join(' ').str.strip() ?
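Are these "tuples" stored as strings in the column cells?
@shubhamSharma Yours works! I had a feeling this would be much simpler than I expected.
Anyway, thanks for your help, everyone.
What's wrong with a simple regex? df['title2'].replace('[(,\s+,)]','',regex=True)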
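
The `str.join` one-liner from the comments can be tried on a small frame like the one below. This is only a sketch: the sample data is invented to mirror the question and assumes the cells are real tuples with float NaN for the missing rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: tuples plus missing values.
df = pd.DataFrame({
    "title2": [
        ("", "2 ct", "", ""),
        np.nan,
        ("", "7 ounce", "", ""),
        ("", "1 gal", "", ""),
    ]
})

# .str.join concatenates the elements of each tuple with the separator
# and leaves missing cells missing; .str.strip then removes the padding
# contributed by the empty tuple elements.
df["title3"] = df["title2"].str.join(" ").str.strip()
print(df["title3"].tolist())
```

Note that this keeps the space inside each value ("2 ct" rather than "2ct"); chaining `.str.replace(" ", "", regex=False)` would remove it if that is wanted.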

Tags: python pandas string dataframe tuples

【Solution 1】:

This will do it:

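# Stringify each cell, then strip tuple/list punctuation from both ends.
demo_data['title2'] = demo_data['title2'].astype(str).map(lambda x: x.lstrip(",'[ (").rstrip(" ,'])"))
# Collapse the "', '" separators left between neighbouring elements.
demo_data['title2'] = demo_data['title2'].str.replace("', '", ",", regex=False)
demo_data['title2'] = demo_data['title2'].map(lambda x: x.lstrip(",'[ (").rstrip(" ,'])"))
# Remove the remaining spaces, e.g. "2 ct" becomes "2ct".
demo_data['title2'] = demo_data['title2'].str.replace(" ", "", regex=False)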

This gives:

   ID  title2
0   1     2ct
1   2     1ct
2   3     2ct
3   4     nan
4   5     2ct
5   6     5ct
6   7  7ounce
7   8    1gal
8   9     nan
9  10     nan

【Discussion】:

【Solution 2】:

Using a regular expression:

import re

df['title3'] = df['title2'].apply(lambda x: re.sub('[^A-Za-z0-9]', '', str(x)))
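
One caveat with the line above is that `str(x)` also turns a missing value into the text `nan`. The variant below is a sketch that leaves missing values alone; the helper name `keep_alnum` and the sample frame are made up for illustration, and the cells are assumed to be real tuples with float NaN.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample: two tuples and one missing value.
df = pd.DataFrame({
    "title2": [("", "2 ct", "", ""), np.nan, ("", "7 ounce", "", "")]
})

def keep_alnum(x):
    # Leave float NaN untouched so it does not become the text 'nan'.
    if isinstance(x, float) and np.isnan(x):
        return x
    # Drop every character that is not an ASCII letter or digit.
    return re.sub(r"[^A-Za-z0-9]", "", str(x))

df["title3"] = df["title2"].apply(keep_alnum)
print(df["title3"].tolist())
```

Like the original one-liner, this also removes the space inside each value, so "7 ounce" becomes "7ounce".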

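【Discussion】:

【Solution 3】:

Try the following. I am assuming the tuples and NaNs are stored as strings in your column; if not, let me know so I can adjust the solution: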

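def clear(x):
    # Pass missing values through, whether stored as float NaN or as the text 'NaN'.
    if str(x).strip().lower() == 'nan':
        return x
    # Split the stringified tuple on commas and trim each piece.
    parts = [i.strip() for i in str(x).split(',')]
    # Keep only the pieces that contain at least one digit.
    hits = [i for i in parts if any(k in '0123456789' for k in i)]
    # Return the first hit; fall back to the original value when there is none,
    # instead of indexing into an empty list.
    return hits[0] if hits else x

df['title2'] = df['title2'].apply(clear)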

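【Discussion】:

Megale, unfortunately your code above gave me something like ("", ("",(".....
Now it gives me the error list index out of range. I don't think that's your fault, though; I think it's because I only posted a 10-row sample of a column with about 7k rows, so there may be other things to consider. In any case, the simple code in the comments above gave me the answer I was looking for, so don't worry about this. Thanks again.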
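
A `list index out of range` error is what indexing `[0]` into an empty list produces, which would happen for any row where no piece contains a digit. As an alternative sketch, and not the answerer's method, `Series.str.extract` returns NaN for rows without a match. The sample frame and the regular expression are assumptions made for illustration, and the cells are assumed to be stored as strings, as in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with string-stored "tuples", a missing value,
# and one row that contains no quantity at all.
df = pd.DataFrame({
    "title2": ["(, 2 ct, , )", np.nan, "(, 7 ounce, , )", "(, , , )"]
})

# Capture the first run of digits, optional whitespace, then letters.
# Rows with no match (or with a missing value) come back as NaN.
df["title3"] = df["title2"].str.extract(r"(\d+\s*[A-Za-z]+)", expand=False)
print(df["title3"].tolist())
```

The pattern would need adjusting for values such as decimals or units containing other characters, which the 10-row sample does not show.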