How to remove square brackets from a pandas DataFrame (answers)

【Question title】: How to remove square bracket from pandas dataframe
【Posted】: 2016-11-03 23:39:48
【Question】:

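After applying str.findall() to a column of a pandas DataFrame, I get the values wrapped in square brackets (they look like lists). How can I remove the square brackets?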

print(df)

id     value                 
1      [63]        
2      [65]       
3      [64]        
4      [53]       
5      [13]      
6      [34]

【Comments】:

What does the column actually contain: the string '[63]' or the list [63]?

Tags: python string pandas dataframe
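The distinction raised in this comment decides which answer applies, and it cannot be read off the printed frame. A minimal sketch of one way to check it, assuming hypothetical column names `as_list` and `as_str` that are not part of the question:

```python
import pandas as pd

# Hypothetical sample data: one column of real lists and one column of
# strings that merely look like lists when printed.
df = pd.DataFrame({
    'as_list': [[63], [65], [64]],
    'as_str': ['[63]', '[65]', '[64]'],
})

# Both columns print identically, so inspect the Python type of the
# individual elements instead of relying on the printed output.
list_types = df['as_list'].apply(type).unique().tolist()
str_types = df['as_str'].apply(type).unique().tolist()

print(list_types)
print(str_types)
```

If the elements are `list` objects, indexing into them is the natural fix; if they are `str` objects, a string operation such as stripping or replacing is needed.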

【Solution 1】:

A general solution for removing the [ and ] characters from a string column of a DataFrame is

df['value'] = df['value'].str.replace(r'[][]', '', regex=True)  # one by one
df['value'] = df['value'].str.replace(r'[][]+', '', regex=True) # by chunks of one or more [ or ] chars

[][] is a regex character class that matches either a ] or a [ character. The + makes the regex engine match one or more of these characters in a row.

See the regex demo.
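As a small illustration of the character class described above, the following sketch applies it to a hypothetical Series of strings (the sample values are made up). It also compares the result with the explicitly escaped form `[\[\]]`, which should be equivalent and may be easier to read:

```python
import pandas as pd

# Hypothetical string column; the values are text, not lists.
s = pd.Series(['[63]', '[65, 66]', 'no brackets'])

# '[][]' is a character class containing ']' and '['.
cleaned = s.str.replace(r'[][]', '', regex=True)

# The same class written with explicit escapes.
cleaned_escaped = s.str.replace(r'[\[\]]', '', regex=True)

print(cleaned.tolist())
```

Rows without brackets pass through unchanged, and everything between the brackets is kept as text.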

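In this case, however, the square brackets mark a list of strings produced by Series.str.findall. Evidently, you want to extract only the first match from each column value.

Use Series.str.extract when you need the first match
Use Series.str.findall when you need all matches

So, to avoid this problem in the first place, you can use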

df['value'] = df['source_column'].str.extract(r'my regex with one set of (parentheses)')

Note that str.extract requires at least one capturing group in the pattern to work and return a value (str.findall works even without a capturing group).
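To make the difference concrete, here is a minimal sketch comparing the two methods. The sample data and the pattern `(\d+)` are assumptions for illustration; `expand=False` is passed so that a pattern with a single capturing group returns a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Hypothetical source column and pattern, for illustration only.
df = pd.DataFrame({'source_column': ['id 63 and 7', 'id 65', 'none']})
pattern = r'(\d+)'

# findall returns a list of every match per row (an empty list if none).
all_matches = df['source_column'].str.findall(pattern)

# extract returns only the first match; expand=False yields a Series
# because the pattern has exactly one capturing group.
first_match = df['source_column'].str.extract(pattern, expand=False)

print(all_matches.tolist())
print(first_match.tolist())
```

Rows with no match become an empty list with findall and a missing value with extract, so the extracted column may need a fill or a drop step before a numeric conversion.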

Also note that if you use findall to get multiple matches and you want a single string as output, you can str.join the matches:

df['value'] = df['source_column'].str.findall(pattern).str.join(', ')
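A short worked version of the join approach, assuming a made-up `source_column` and the pattern `\d+` (neither comes from the question):

```python
import pandas as pd

# Hypothetical source column and pattern, for illustration only.
df = pd.DataFrame({'source_column': ['63 and 7', '65', 'none']})
pattern = r'\d+'

# findall gives a list per row; str.join collapses each list into one string.
df['value'] = df['source_column'].str.findall(pattern).str.join(', ')

print(df['value'].tolist())
```

A row without any match ends up as an empty string, since joining an empty list produces `''`.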

【Comments】:

【Solution 2】:

If the values are strings, we can also use the string replace method

import pandas as pd

df = pd.DataFrame({'value': ['[63]', '[65]', '[64]']})

print(df)
  value
0  [63]
1  [65]
2  [64]

df['value'] = df['value'].apply(lambda x: x.replace('[', '').replace(']', ''))

# convert the string column to int
df['value'] = df['value'].astype(int)

# output
print(df)

   value
0     63
1     65
2     64

print(df.dtypes)
value    int32
dtype: object

【Comments】:

【Solution 3】:

If the values in the value column are of type list, use:

df['value'] = df['value'].str[0]

Or:

df['value'] = df['value'].str.get(0)

Docs.

Example:

df = pd.DataFrame({'value':[[63],[65],[64]]})
print (df)
  value
0  [63]
1  [65]
2  [64]

# check the type of the first value, if the index label 0 exists
print (type(df.loc[0, 'value']))
<class 'list'>

# check the type in general; works for any index type, e.g. `DatetimeIndex`
print (type(df.loc[df.index[0], 'value']))
<class 'list'>

df['value'] = df['value'].str.get(0)
print (df)
   value
0     63
1     65
2     64

If the values are strings, use str.strip and then convert to numbers with astype:

df['value'] = df['value'].str.strip('[]').astype(int)

Example:

df = pd.DataFrame({'value':['[63]','[65]','[64]']})
print (df)
  value
0  [63]
1  [65]
2  [64]

# check the type of the first value, if the index label 0 exists
print (type(df.loc[0, 'value']))
<class 'str'>

# check the type in general; works for any index type, e.g. `DatetimeIndex`
print (type(df.loc[df.index[0], 'value']))
<class 'str'>


df['value'] = df['value'].str.strip('[]').astype(int)
print (df)
   value
0     63
1     65
2     64

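【Comments】:

df['value'].dtype gives dtype('O')
And what does type(df.ix[0, 'value']) give?
Is it possible to get a result with dtype: float64?
@separ1 - Yes. df['value'].str.get(0) or df['value'].str[0] returns the first value of each list. If you need all values, use df1 = pd.DataFrame(df['value'].values.tolist())
What should I do when I have [63, 23] (2 values in the list) instead of [63]?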
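The last few comments can be sketched as follows. This is only an illustration with made-up two-element lists: it builds one column per list position (the `tolist()` approach suggested in the reply above), shows `Series.explode` as an alternative that gives one row per element (available from pandas 0.25), and converts the first element to float64 as asked earlier in the thread:

```python
import pandas as pd

# Hypothetical column where every cell holds a two-element list.
df = pd.DataFrame({'value': [[63, 23], [65, 11], [64, 5]]})

# One column per list position, keeping the original index.
wide = pd.DataFrame(df['value'].tolist(), index=df.index)

# Alternative: one row per list element; the original index labels repeat.
long = df['value'].explode()

# First element of each list, converted to float64.
first_as_float = df['value'].str[0].astype('float64')

print(wide)
print(long.tolist())
print(first_as_float.dtype)
```

Which shape is preferable depends on what comes next: the wide form suits fixed-length lists, while the exploded form also handles lists of varying length.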