pandas: splitting a mixed letter-and-number column based on a pattern

[Question title]: pandas splitting letter and number mix column based on pattern
[Posted]: 2018-05-27 21:45:15
[Question description]:

I have a sample DataFrame with a single column:

import pandas as pd

d = {
    'question#': ['a1.2', 'a10', 'a10.1', 'b11.1a', 'k20.3d', 'b20c']
}
df = pd.DataFrame(d)

It looks like this:

Out[8]: 
question#
0       a1.2
1       a10
2       a10.1
3       b11.1a
4       k20.3d
5       b20c

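I could not find a way to correctly sort a column that mixes letters and numbers, so I think the only option is to first split the column into 3 columns: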

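First column: a single letter: (a-z). The string always starts with a letter.

Second column: two possible forms:

one or more digits: (1-9)+

or
digits + '.' + digits: (1-9)+(/.)(1-9)+

Third column: a single letter or nothing: (a-z)?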

So for the sample DataFrame, I would like it split into the following columns. DESIRED OUTPUT:

Out[8]: 
question#  firstcol   secondcol    thirdcol
0             a         1.2
1             a         10
2             a         10.1
3             b         11.1           a
4             k         20.3           d
5             b         20             c

Is the syntax similar to what is shown on this page? I'm not sure exactly how to write the regex:

https://chrisalbon.com/python/pandas_regex_to_create_columns.html

  df['firstcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
  df['secondcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
  df['thirdcol'] = df['question#'].str.extract(not sure the syntax, expand=True)

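[Comments on the question]:

What is your desired sorted output?
It's there, but I added the words "DESIRED OUTPUT" to that part, so hopefully it is more obvious now.
Start with df['firstcol'] = df['question#'].str[0]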

Tags: python regex pandas split

[Solution 1]:

Try

df[['firstcol', 'secondcol', 'thirdcol']] = df['question#'].str.extract(r'([A-Za-z]+)(\d+\.?\d*)([A-Za-z]*)', expand=True)


    question#   firstcol    secondcol   thirdcol
0   a1.2        a           1.2 
1   a10         a           10  
2   a10.1       a           10.1    
3   b11.1a      b           11.1        a
4   k20.3d      k           20.3        d
5   b20c        b           20          c
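
Since the original goal was sorting, here is a minimal sketch of how the split columns might be used for that. It is not part of the answer above. The stricter anchored pattern, the named groups, and the names `pattern`, `parts`, and `df_sorted` are assumptions made for illustration. It also assumes the numeric part should be compared as a decimal number, so a label such as a1.10 would sort together with a1.1; if the part after the dot is a sub-question counter instead, the two numbers would need to be extracted and converted separately.

```python
import pandas as pd

df = pd.DataFrame({'question#': ['a1.2', 'a10', 'a10.1', 'b11.1a', 'k20.3d', 'b20c']})

# Named groups become the column names of the extracted DataFrame.
# The pattern is anchored (^...$), so labels that do not fit the
# letter / number / optional-letter shape produce NaN in every column.
pattern = r'^(?P<firstcol>[A-Za-z])(?P<secondcol>\d+(?:\.\d+)?)(?P<thirdcol>[A-Za-z]?)$'
parts = df['question#'].str.extract(pattern)

# Assumption: treat the middle part as a decimal number so that
# 10 sorts after 1.2 rather than before it (as it would as a string).
parts['secondcol'] = parts['secondcol'].astype(float)

# Attach the parts to the original frame and sort by all three keys.
df_sorted = (
    df.join(parts)
      .sort_values(['firstcol', 'secondcol', 'thirdcol'])
      .reset_index(drop=True)
)
print(df_sorted)
```

With the sample data, this orders the labels by letter first, then numerically, then by the trailing letter.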
