Extract int from string in Pandas: answers

[Question title]: Extract int from string in Pandas
[Posted]: 2016-05-24 09:59:42
[Question]:

Suppose I have a DataFrame df:

A B
1 V2
3 W42
1 S03
2 T02
3 U71

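I want a new column that extracts only the int from column B (it can go at the end of df or replace column B; that doesn't matter). That is, I want column C to look like this:

C
2
42
3
2
71

That is, if a number has a leading 0, such as 03, I want to get 3 rather than 03.

How can I do this?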

[Comments]:

Tags: python pandas dataframe

[Answer 1]:

You can use the .str accessor to extract the digits with a regular expression and then convert them to int:

df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)
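A self-contained sketch of this answer on the question's sample data. Two details are choices made for the sketch, not part of the original answer: `expand=False` makes `str.extract` return a Series (there is exactly one capture group) instead of a one-column DataFrame, and the raw string `r'(\d+)'` avoids Python's invalid-escape warning for `\d`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# The parentheses form the capture group that str.extract returns;
# expand=False yields a Series because there is exactly one group.
digits = df['B'].str.extract(r'(\d+)', expand=False)

# Converting to int drops leading zeros: '03' -> 3, '02' -> 2.
df['C'] = digits.astype(int)
print(df)
```

The int conversion, not the regex, is what removes the leading zeros the question asks about.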

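[Comments]:

Nice one! For future readers: this also works with a compiled regular expression, which helps with more complex patterns. A simple example: exp = re.compile(r'(\d+)'). Then pass exp in the str.extract(exp) call. Note that the pattern needs a capture group (the parentheses); with a bare \d+, str.extract raises ValueError.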
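A sketch of the compiled-pattern variant from this comment. The pandas documentation describes `pat` as a string; passing a compiled pattern works in the classic object-dtype path because pandas runs the pattern through `re.compile`, which returns an already compiled pattern unchanged when no flags are given. I have not checked every string storage backend, so the sketch pins `dtype=object` as an assumption.

```python
import re

import pandas as pd

# Assumption: classic object-backed strings, where pandas uses Python's re module.
s = pd.Series(['V2', 'W42', 'S03', 'T02', 'U71'], dtype=object)

# str.extract needs at least one capture group, so the digits are wrapped
# in parentheses.
exp = re.compile(r'(\d+)')
result = s.str.extract(exp, expand=False).astype(int)
print(result.tolist())  # [2, 42, 3, 2, 71]

# Without a capture group, str.extract refuses the pattern.
try:
    s.str.extract(re.compile(r'\d+'))
except ValueError as err:
    print('no capture group:', err)
```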

[Answer 2]:

Assuming there is always exactly one leading letter:

df['B'] = df['B'].str[1:].astype(int)
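The slice silently produces wrong input for `int()` if the one-letter assumption is ever violated. A minimal sketch that checks the assumption first with `str.fullmatch` (available since pandas 1.1) and only then slices; the pattern `[A-Za-z]\d+` is my reading of "one leading letter followed by digits".

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# The slice only works if every value is one letter followed by digits,
# so verify that before slicing.
ok = df['B'].str.fullmatch(r'[A-Za-z]\d+')
if not ok.all():
    raise ValueError(f"unexpected values: {df.loc[~ok, 'B'].tolist()}")

# Drop the first character, then convert; int() ignores leading zeros.
df['B'] = df['B'].str[1:].astype(int)
print(df['B'].tolist())  # [2, 42, 3, 2, 71]
```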

[Comments]:

[Answer 3]:

First, set up the data:

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Then extract the digits and convert them back to integers:

df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)

df.head()
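One caveat with the str.extract approach: a value with no digits becomes NaN, and `.astype(int)` then fails. A sketch of one way around it, using a hypothetical extra row `'XY'` for illustration: convert with `pd.to_numeric`, which keeps the missing value, then cast to pandas' nullable `Int64` dtype so the column stays integer.

```python
import pandas as pd

# Hypothetical data: 'XY' contains no digits, to show the failure case.
df = pd.DataFrame({'A': [1, 3, 4], 'B': ['S03', 'W42', 'XY']})

extracted = df['B'].str.extract(r'(\d+)', expand=False)  # 'XY' -> NaN

# extracted.astype(int) would raise because of the missing value, so
# convert to a numeric dtype first (the missing value stays missing),
# then to the nullable Int64 dtype.
df['C'] = pd.to_numeric(extracted, errors='coerce').astype('Int64')
print(df)
```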

[Comments]:

[Answer 4]:

I wrote a small loop to do this, because my strings were not in a DataFrame but in a list. This way, you can also add a small if statement to handle floats:

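output = ''
text = 'whatever.007'  # renamed from `input` to avoid shadowing the built-in

for letter in text:
    try:
        int(letter)        # raises ValueError for non-digit characters
        output += letter
    except ValueError:
        pass

    if letter == '.':      # keep the decimal point so floats survive
        output += letter

output = float(output)   # '.007' -> 0.007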

Or use int(output) if you prefer, as long as the string contains no decimal point (int('.007') raises ValueError).
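The loop above can be condensed into a small function and applied to a plain list. `extract_number` is a hypothetical name; it uses `str.isdecimal()` as the digit test instead of the try/except around `int()`, which behaves the same for ordinary ASCII digits.

```python
def extract_number(text):
    """Keep only digits and '.' from text and parse the result as a float.

    Hypothetical helper mirroring the loop above. Raises ValueError when
    nothing numeric is left or when more than one '.' survives.
    """
    kept = ''.join(ch for ch in text if ch.isdecimal() or ch == '.')
    return float(kept)


values = ['V2', 'W42', 'S03', 'whatever.007']  # a plain list, not a DataFrame
numbers = [extract_number(v) for v in values]
print(numbers)  # [2.0, 42.0, 3.0, 0.007]
```

Like the original loop, it glues together every digit in the string, so 'a1b2' becomes 12.0 rather than 1.0.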

[Comments]:

[Answer 5]:

Prepare a DataFrame like yours:

import re
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Now manipulate it to get the result you want:

df['C'] = df['B'].apply(lambda x: int(re.search(r'\d+', x).group()))

df.head()

   A    B   C
0  1   V2   2
1  3  W42  42
2  1  S03   3
3  2  T02   2
4  3  U71  71
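`re.search` returns `None` for a value without digits, and calling `.group()` on `None` raises AttributeError. A sketch with a guarded helper (`first_int` and `DIGITS` are hypothetical names) that returns `None` instead; the pattern is compiled once and reused for every row.

```python
import re

import pandas as pd

DIGITS = re.compile(r'\d+')


def first_int(s):
    """Return the first run of digits in s as an int, or None if there is none."""
    m = DIGITS.search(s)
    return int(m.group()) if m else None


df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})
df['C'] = df['B'].apply(first_int)
print(df['C'].tolist())  # [2, 42, 3, 2, 71]
print(first_int('no digits'))  # None
```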

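[Comments]:

[Answer 6]:

If you don't want to use regular expressions, here is another approach: I use the map() function to apply the conversion to each element of the column, like this: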

letters = "abcdefghijklmnopqrstuvwxyz"
df['C'] = list(map(lambda x: int(x.lower().strip(letters)), df['B']))
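The same idea can be written with pandas' vectorized `Series.str.strip`, which accepts a set of characters to remove from both ends. Using `string.ascii_letters` (upper and lower case) in place of the `letters` string removes the need for `.lower()`; this is a sketch of that variant, not the original answer's code.

```python
import string

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# str.strip(chars) removes any of the given characters from both ends.
df['C'] = df['B'].str.strip(string.ascii_letters).astype(int)
print(df['C'].tolist())  # [2, 42, 3, 2, 71]

# Caveat: only the ends are stripped, so letters in the middle survive
# and the int conversion would fail.
print('A1B2'.strip(string.ascii_letters))  # 1B2
```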

The output looks like this:

[Comments]:

[Answer 7]:

I used apply, and it works well too:

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})
df['C'] = df['B'].apply(lambda x: int(x[1:]))
df['C']

Output:

0     2
1    42
2     3
3     2
4    71
Name: C, dtype: int64

[Comments]: