【Question title】: Splitting parts of a long name into different columns of data frame in R
【Posted】: 2021-10-26 10:16:23
【Question】:

I have a data frame:

ColA                             ColB
210812_150um_23deg_5sec          3.8 kPa
210812_150um_23deg_30sec         2.9 kPa
210812_150um_L1_7deg_120sec      5.3 kPa
210812_2.5mg_150um_7deg_120sec   6.8 kPa
...

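What is the easiest way to extract elements from the names in ColA (when there is no regular pattern) and split them into more columns, so that I end up with a df with the following columns?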

Date     Angle   Time    Pressure
210812   23deg   5sec    3.8 kPa
210812   23deg   30sec   2.9 kPa
...

【Comments】:

    Tags: r sorting multiple-columns


    【Solution 1】:

    You can use strsplit here

    # Split every name on "_"; rows have 4 or 5 tokens, so index from the end
    parts <- strsplit(df$ColA, "_")
    df$Date <- sapply(parts, `[`, 1)                         # first token
    df$Angle <- sapply(parts, function(p) p[length(p) - 1])  # second-to-last token
    df$Time <- sapply(parts, function(p) p[length(p)])       # last token
    df$Pressure <- df$ColB
    df
    
                                ColA    ColB   Date Angle   Time Pressure
    1        210812_150um_23deg_5sec 3.8 kPa 210812 23deg   5sec  3.8 kPa
    2       210812_150um_23deg_30sec 2.9 kPa 210812 23deg  30sec  2.9 kPa
    3    210812_150um_L1_7deg_120sec 5.3 kPa 210812  7deg 120sec  5.3 kPa
    4 210812_2.5mg_150um_7deg_120sec 6.8 kPa 210812  7deg 120sec  6.8 kPa
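
    Since the number of tokens differs between rows, positions counted from the left are not stable. A minimal sketch, assuming every angle token ends in "deg" and every time token ends in "sec", that picks tokens by their suffix instead of by position. The helper pick_token is hypothetical (it is not part of base R), and the sample data is copied from the question:

```r
# Example data copied from the question
df <- data.frame(ColA = c("210812_150um_23deg_5sec",
                          "210812_150um_23deg_30sec",
                          "210812_150um_L1_7deg_120sec",
                          "210812_2.5mg_150um_7deg_120sec"),
                 ColB = c("3.8 kPa", "2.9 kPa", "5.3 kPa", "6.8 kPa"),
                 stringsAsFactors = FALSE)

# pick_token() is a hypothetical helper: it returns the first token that
# matches `pattern`, or NA when a name contains no such token
pick_token <- function(tokens, pattern) {
  hit <- grep(pattern, tokens, value = TRUE)
  if (length(hit) == 0) NA_character_ else hit[1]
}

# One character vector of tokens per row
parts <- strsplit(df$ColA, "_", fixed = TRUE)

result <- data.frame(
  Date     = vapply(parts, `[`, character(1), 1),  # date is assumed to be the first token
  Angle    = vapply(parts, pick_token, character(1), pattern = "^[0-9]+deg$"),
  Time     = vapply(parts, pick_token, character(1), pattern = "^[0-9]+sec$"),
  Pressure = df$ColB,
  stringsAsFactors = FALSE
)
result
```

    Names that lack a "deg" or "sec" token end up as NA in this sketch rather than silently taking a neighbouring token.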
    

    Data:

    df <- data.frame(ColA=c("210812_150um_23deg_5sec",
                            "210812_150um_23deg_30sec",
                            "210812_150um_L1_7deg_120sec",
                            "210812_2.5mg_150um_7deg_120sec"),
                     ColB=c("3.8 kPa", "2.9 kPa", "5.3 kPa", "6.8 kPa"),
                     stringsAsFactors=FALSE)
    

    【Comments】:

      【Solution 2】:

      You can use tidyr::extract and pass a regular expression to pull out the data of interest.

      # The literal underscores stop the greedy .* from swallowing the leading digits of the angle
      tidyr::extract(df, ColA, c('Date', 'Angle', 'Time'), '(\\d+)_.*_(\\d+deg)_(\\d+sec)')
      
      #    Date Angle   Time   ColB
      #1 210812 23deg   5sec 3.8kPa
      #2 210812 23deg  30sec 2.9kPa
      #3 210812  7deg 120sec 5.3kPa
      #4 210812  7deg 120sec 6.8kPa
      

      The same regular expression can be used with base R's strcapture

      cbind(strcapture('(\\d+)_.*_(\\d+deg)_(\\d+sec)', df$ColA, 
                 proto = list(Date = character(), Angle = character(), 
                             Time = character())), df[2])
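
      A greedy `.*` placed directly in front of `(\\d+deg)` consumes as many characters as it can, which leaves only the last digit of the angle for the capture group. A small sketch comparing the two patterns with base R's sub; perl = TRUE is set here only so that the backtracking behaviour is the well-known PCRE one, and the variable names are illustrative:

```r
# Names copied from the question
x <- c("210812_150um_23deg_5sec",
       "210812_150um_23deg_30sec",
       "210812_150um_L1_7deg_120sec",
       "210812_2.5mg_150um_7deg_120sec")

# Pattern with a bare greedy .* before the angle group
greedy_pattern    <- "(\\d+).*(\\d+deg)_(\\d+sec)"
# Pattern that forces an underscore immediately before the angle group
delimited_pattern <- "(\\d+)_.*_(\\d+deg)_(\\d+sec)"

# Keep only the second capture group (the angle) from each name
greedy_angle    <- sub(greedy_pattern, "\\2", x, perl = TRUE)
delimited_angle <- sub(delimited_pattern, "\\2", x, perl = TRUE)

greedy_angle     # "3deg"  "3deg"  "7deg" "7deg"
delimited_angle  # "23deg" "23deg" "7deg" "7deg"
```

      Requiring the underscore before the digits means the regex engine cannot give part of the number to `.*`, so multi-digit angles survive intact.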
      

      Data

      It would be easier to help if you provided the data in a reproducible format -

      df <- structure(list(ColA = c("210812_150um_23deg_5sec", "210812_150um_23deg_30sec", 
      "210812_150um_L1_7deg_120sec", "210812_2.5mg_150um_7deg_120sec"
      ), ColB = c("3.8kPa", "2.9kPa", "5.3kPa", "6.8kPa")), 
      class = "data.frame", row.names = c(NA, -4L))
      

      【Comments】:

        【Solution 3】:

        We can use base R's read.csv together with sub

        cbind(read.csv(text = sub("^(\\d+)_.*_(\\d+deg)_(\\d+sec)", "\\1,\\2,\\3", 
            df$ColA), header = FALSE, col.names = c("Date", "Angle", "Time")), df['ColB'])
            Date Angle   Time   ColB
        1 210812 23deg   5sec 3.8kPa
        2 210812 23deg  30sec 2.9kPa
        3 210812  7deg 120sec 5.3kPa
        4 210812  7deg 120sec 6.8kPa
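
        One detail of this route: read.csv guesses column types, so a Date made only of digits is read as an integer. A minimal sketch, assuming you would rather keep it as text (for example to preserve leading zeros in other date codes), that passes colClasses = "character"; the intermediate names csv_lines and parsed are illustrative:

```r
# Example data copied from the answer
df <- data.frame(ColA = c("210812_150um_23deg_5sec",
                          "210812_150um_23deg_30sec",
                          "210812_150um_L1_7deg_120sec",
                          "210812_2.5mg_150um_7deg_120sec"),
                 ColB = c("3.8kPa", "2.9kPa", "5.3kPa", "6.8kPa"),
                 stringsAsFactors = FALSE)

# Rewrite each name as a three-field CSV line: date,angle,time
csv_lines <- sub("^(\\d+)_.*_(\\d+deg)_(\\d+sec)$", "\\1,\\2,\\3", df$ColA)

# colClasses = "character" stops read.csv from converting the date to integer
parsed <- read.csv(text = csv_lines, header = FALSE,
                   col.names = c("Date", "Angle", "Time"),
                   colClasses = "character")

result <- cbind(parsed, df["ColB"])
str(result)
```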
        

        Data

        df <- structure(list(ColA = c("210812_150um_23deg_5sec", "210812_150um_23deg_30sec", 
        "210812_150um_L1_7deg_120sec", "210812_2.5mg_150um_7deg_120sec"
        ), ColB = c("3.8kPa", "2.9kPa", "5.3kPa", "6.8kPa")), 
        class = "data.frame", row.names = c(NA, 
        -4L))
        

        【Comments】:
