【Question title】: Extract numbers from string column from Pandas DF
【Posted】: 2021-05-26 07:39:44
【Question】:

I have the following DataFrame with a string column ("Info"):

import pandas as pd

df = pd.DataFrame({
    'Date': ["2014/02/02", "2014/02/03"],
    'Info': [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

I need to extract the numbers in "Info" into 4 new columns in the same df.

For the first row, the values would be [78, 54, 9, 15].

I have already tried

df[["new1","new2","new3","new4"]]= df.Info.str.extract('(\d+(?:\.\d+)?)', expand=True).astype(int)

but I think it is more complicated than that.

Regards,

【Question comments】:

    Tags: python pandas string extract


    【Solution 1】:

    str.extractall is probably a better fit for this task:

    df[["new1","new2","new3","new4"]] = df['Info'].str.extractall(r'(\d+)')[0].unstack()
    
             Date                                               Info new1 new2 new3 new4
    0  2014/02/02  Out of 78 shares traded during the session tod...   78   54    9   15
    1  2014/02/03  Out of 76 shares traded during the session tod...   76   60    4   12
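
Note that the new columns above hold strings, while the attempt in the question called astype(int). A minimal sketch of the same extractall approach with an integer conversion added, assuming every row contains exactly four numbers (the names new1..new4 are taken from the question; the join step is one possible way to attach the columns, not necessarily what the answer's author intended):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# extractall returns one row per match, indexed by (original row, match number)
matches = df["Info"].str.extractall(r"(\d+)")[0]

# unstack pivots the match number into columns 0..3; values are still strings,
# so convert them (this assumes no row is missing a number, i.e. no NaN)
wide = matches.unstack().astype(int)

# Column names from the question
new_cols = ["new1", "new2", "new3", "new4"]
wide.columns = new_cols

# Attach the new columns by aligning on the original row index
result = df.join(wide)
print(result[new_cols])
```

If some rows could contain fewer numbers, unstack would leave NaN in those cells and astype(int) would fail, so a nullable or float dtype would be needed instead.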
    

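    【Comments】:

      【Solution 2】:

      As far as I can tell, you are trying to avoid capturing the decimal part of a number as a separate group, right? (The (?:\.\d+)? part.)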
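
To illustrate what the non-capturing group does, here is a small sketch using Python's standard re module. The sample string is made up for illustration and is not from the question:

```python
import re

text = "3.14 and 42"  # hypothetical sample, not from the question's data

# Non-capturing (?:...) : the optional decimal part belongs to the one outer group
non_capturing = re.findall(r"(\d+(?:\.\d+)?)", text)

# Capturing (...) : the decimal part becomes a second group of its own
capturing = re.findall(r"(\d+(\.\d+)?)", text)

print(non_capturing)  # ['3.14', '42']
print(capturing)      # [('3.14', '.14'), ('42', '')]
```

With pandas' str.extract or str.extractall, each capturing group becomes a column, so the non-capturing form keeps the result to a single column per match.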

      First, if you want all of the matches, you need to use pd.Series.str.extractall. extract stops after the first match.
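
A short sketch of that difference on a single shortened sentence (the sample text is abbreviated from the question's data for illustration):

```python
import pandas as pd

# Shortened, illustrative version of one "Info" value
s = pd.Series(["Out of 78 shares, 54 increases, 9 without change and 15 decreases."])

# extract: one row per input string, only the first match of the pattern
first_only = s.str.extract(r"(\d+)", expand=True)

# extractall: one row per match, with a (row, match) MultiIndex
all_matches = s.str.extractall(r"(\d+)")

print(first_only.shape)   # (1, 1)
print(all_matches.shape)  # (4, 1)
```

This is why the attempt in the question cannot fill four columns: extract with a single group produces a single column.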

      Using your df, try this code:

      # Get a multiindexed dataframe using extractall
      expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")
      
      # Pivot the index labels
      df_2 = expanded.unstack()
      
      # Drop the multiindex
      df_2.columns = df_2.columns.droplevel()
      
      
      # Add the columns to the original dataframe (inplace or make a new df)
      df_combined = pd.concat([df, df_2], axis=1)
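
The code above leaves the new columns labelled 0..3 and holding strings. A possible continuation, sketched here under the assumption that the names new1..new4 from the question are wanted and that numeric dtypes are preferred (the renaming and the pd.to_numeric step are additions, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014/02/02", "2014/02/03"],
    "Info": [
        "Out of 78 shares traded during the session today, there were 54 increases, 9 without change and 15 decreases.",
        "Out of 76 shares traded during the session today, there were 60 increases, 4 without change and 12 decreases.",
    ],
})

# Same steps as in the answer
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")
df_2 = expanded.unstack()
df_2.columns = df_2.columns.droplevel()  # column labels are now the match numbers 0..3

# Assumed naming scheme: match number i becomes "new{i+1}"
df_2.columns = [f"new{i + 1}" for i in df_2.columns]

# Convert each column from strings to numbers; since the pattern also accepts
# decimals, to_numeric picks an integer or float dtype per column as needed
df_2 = df_2.apply(pd.to_numeric)

df_combined = pd.concat([df, df_2], axis=1)
print(df_combined.drop(columns="Info"))
```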
      

      【Comments】:
