[Question title]: String split on digit and space
[Posted]: 2021-03-04 08:46:11
[Question description]:

How can I split a long string on the first space, unless the second word is lowercase?

df                             col
0     Apple The fruit. 20 Banana tree A fruit. 30  Carrot A Vegetable. 40

Expected output:

df
  fruit          definition      page
0 Apple          The fruit.       20
1 Banana tree    A fruit.         30
2 Carrot         A Vegetable.     40
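
For anyone who wants to reproduce this, the input frame can presumably be built as below. This is only a sketch: the exact whitespace inside the string is an assumption read off the printed frame above.

```python
import pandas as pd

# Single-row frame holding the whole text in one column named "col".
# The double space before "Carrot" mirrors the printed frame and is an assumption.
df = pd.DataFrame(
    {"col": ["Apple The fruit. 20 Banana tree A fruit. 30  Carrot A Vegetable. 40"]}
)
```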

df.col.str.split(r'(\d+)').explode()

0 Apple The fruit.
0  20
0 Banana tree A fruit.
0  30
0 Carrot A Vegetable.
0  40
df.col.str.split(".", expand = True)

[Question discussion]:

    Tags: python python-3.x regex pandas


    [Solution 1]:

    You can do it like this:

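    import pandas as pd

    new_df = pd.DataFrame()

    # Split on the page numbers, drop the trailing empty piece, then put
    # each record on its own row and separate the name from the definition.
    new_df[["fruit", "definition"]] = df.col.str.split(r"\d+")\
        .str[:-1].explode()\
        .str.strip()\
        .str.extract(r'^([A-Z][^A-Z]*)(.*)')

    # The page numbers come from the same column, in the same order.
    new_df["page"] = df.col.str.findall(r'\d+').explode()
    new_df = new_df.reset_index(drop = True)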
    
    new_df
              fruit    definition page
    0        Apple     The fruit.   20
    1  Banana tree       A fruit.   30
    2       Carrot   A Vegetable.   40
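
To see what each step of the chain above contributes, it can be unrolled into intermediate variables. This is a sketch under the assumption that the input is the single-row frame from the question; the names `parts`, `records`, `extracted` and `pages` are hypothetical and not part of the answer.

```python
import pandas as pd

# Assumed input: the one-row frame from the question.
df = pd.DataFrame(
    {"col": ["Apple The fruit. 20 Banana tree A fruit. 30  Carrot A Vegetable. 40"]}
)

# 1. Split on runs of digits. Because the string ends with a number,
#    the last list element is an empty string.
parts = df["col"].str.split(r"\d+")

# 2. Drop that trailing empty element, give each piece its own row,
#    and trim the surrounding whitespace.
records = parts.str[:-1].explode().str.strip()

# 3. First group: an uppercase letter plus everything up to the next
#    uppercase letter. Second group: the rest of the piece.
extracted = records.str.extract(r"^([A-Z][^A-Z]*)(.*)")

# 4. Collect the page numbers separately, one per row.
pages = df["col"].str.findall(r"\d+").explode()
```

Note that the first group stops right before the next capital letter, so the fruit names keep a trailing space (for example `'Apple '`). Appending `.str.strip()` to that column removes it if needed.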
    

    Documentation

    1. pandas.Series.str.split
    2. pandas.Series.explode
    3. pandas.Series.str.strip
    4. pandas.Series.str.extract
    5. pandas.Series.str.findall
    6. pandas.DataFrame.reset_index
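
As a possible alternative (not part of the answer above), the same table could be produced in one pass with `Series.str.extractall` and named groups. This is a sketch that assumes every record has the shape "Name Definition number" and that each definition starts with a capital letter; the variable names `pattern` and `out` are hypothetical.

```python
import pandas as pd

# Assumed input: the one-row frame from the question.
df = pd.DataFrame(
    {"col": ["Apple The fruit. 20 Banana tree A fruit. 30  Carrot A Vegetable. 40"]}
)

# fruit:      a capital letter followed by non-capital characters
# definition: starts at the next capital letter and runs up to the number
# page:       the run of digits that closes the record
pattern = (
    r"(?P<fruit>[A-Z][^A-Z]*?)\s+"
    r"(?P<definition>[A-Z][^\d]*?)\s+"
    r"(?P<page>\d+)"
)

# extractall returns one row per match; the (row, match) MultiIndex is dropped.
out = df["col"].str.extractall(pattern).reset_index(drop=True)
```

Here the whitespace between fields is consumed by `\s+`, so no extra `strip` is needed. The `page` column still holds strings; `out["page"].astype(int)` converts it if numbers are wanted.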

    [Discussion]:
