【Question title】: How to search a string for values from a list of strings and return the matched one in R
【Posted】: 2016-01-15 03:47:06
【Question description】:

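The data frame below has a "Campaign" column whose values carry season, name, and position information, but the order of these pieces differs from row to row. Fortunately, each piece comes from a fixed list, so we can create vectors to match against the strings in the "Campaign" column.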

   Date           Campaign
1 Jan-15   Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15   Up|Peter|Spring

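Here is what I want to do: create three columns, Name, Season, and Position. Each column should search the string in the Campaign column and return the matching value from the lists below.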

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

So the result I want is as follows:

Temp
    Date          Campaign  Name Season Position
1 Jan-15   Summer|Peter|Up Peter Summer       Up
2 Feb-15 David|Winter|Down David Winter     Down
3 Mar-15   Up|Peter|Spring Peter Spring       Up

【Question discussion】:

    Tags: r


    【Solution 1】:

    Another way:

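    # Split each Campaign value on the literal "|" (as.character guards against a factor column)
    L <- strsplit(as.character(df$Campaign), split = '\\|')
    
    # For each row, keep the piece that also appears in the lookup vector
    df$Name <- sapply(L, intersect, Name)
    df$Season <- sapply(L, intersect, Season)
    df$Position <- sapply(L, intersect, Position)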
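
The snippet above relies on every row containing exactly one value from each list. If a row had no match, or more than one, `sapply` would return a list instead of a character vector. The following is a minimal, self-contained sketch of one way to guard against that; the helper name `first_match` is hypothetical and not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  Date = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"),
  stringsAsFactors = FALSE
)
Name <- c("Peter", "David")
Season <- c("Summer", "Spring", "Autumn", "Winter")
Position <- c("Up", "Down")

# Hypothetical helper: the first token that appears in `choices`, or NA if none does
first_match <- function(tokens, choices) {
  hit <- intersect(tokens, choices)
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Split on the literal "|" and pick one value per row for each lookup vector
L <- strsplit(df$Campaign, split = "|", fixed = TRUE)
df$Name <- vapply(L, first_match, character(1), choices = Name)
df$Season <- vapply(L, first_match, character(1), choices = Season)
df$Position <- vapply(L, first_match, character(1), choices = Position)

print(df)
```

`vapply` with `character(1)` fails loudly if the helper ever returns something other than a single string, which makes unexpected input easier to spot than a silently created list column.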
    

    【Discussion】:

      【Solution 2】:

      Do the following:

      Date = c("Jan-15","Feb-15","Mar-15")
      Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
      df = data.frame(Date,Campaign)
      
      Name <- c("Peter", "David")
      Season <- c("Summer","Spring","Autumn", "Winter")
      Position <- c("Up","Down")
      
      for(k in Name){
          df$Name[grepl(pattern = k, x = df$Campaign)] <- k
      }
      
      for(k in Season){
          df$Season[grepl(pattern = k, x = df$Campaign)] <- k
      }
      
      for(k in Position){
          df$Position[grepl(pattern = k, x = df$Campaign)] <- k
      }
      

      This gives:

      > df
          Date          Campaign  Name Season Position
      1 Jan-15   Summer|Peter|Up Peter Summer       Up
      2 Feb-15 David|Winter|Down David Winter     Down
      3 Mar-15   Up|Peter|Spring Peter Spring       Up
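
One caveat with the `grepl` loops: the pattern is matched anywhere in the string, so a short value can match inside a longer one. The sketch below uses a made-up row (the name "Upton" is hypothetical, not from the question) to show the effect, and one possible fix that only accepts whole tokens between `|` delimiters. It assumes the lookup values contain no regex metacharacters.

```r
# Hypothetical data: the name "Upton" contains the position "Up"
campaign <- c("Summer|Peter|Up", "Winter|Upton|Down")
position <- c("Down", "Up")

# Plain substring matching also hits "Upton" in the second row
grepl("Up", campaign)
# [1] TRUE TRUE

# Build a pattern that requires the value to be a whole "|"-delimited token
token_pattern <- function(k) paste0("(^|\\|)", k, "(\\||$)")

# Same loop as above, but with whole-token matching
pos <- rep(NA_character_, length(campaign))
for (k in position) {
  pos[grepl(token_pattern(k), campaign)] <- k
}
print(pos)
```

With plain substring matching and this ordering of `position`, the second row would first be set to "Down" and then overwritten with "Up".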
      

      【Discussion】:

        【Solution 3】:

        I had the same idea as Marat Talipov; here is a data.table option:

        library(data.table)
        
        Name     <- c("Peter", "David")
        Season   <- c("Summer","Spring","Autumn", "Winter")
        Position <- c("Up","Down")
        
        dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
                          Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))
        

        which gives

        > dat
             Date          Campaign
        1: Jan-15   Summer|Peter|Up
        2: Feb-15 David|Winter|Down
        3: Mar-15   Up|Peter|Spring
        

        Then process it:

        dat[ , `:=`(Name     = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
                    Season   = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
                    Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
            ]
        

        Result:

        > dat
             Date          Campaign  Name Season Position
        1: Jan-15   Summer|Peter|Up Peter Summer       Up
        2: Feb-15 David|Winter|Down David Winter     Down
        3: Mar-15   Up|Peter|Spring Peter Spring       Up
        

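        This may offer some benefit if you are doing this for many columns or need to modify the table in place (by reference).

        If anyone can tell me how to update all three columns at once, I'd be interested.

        Edit: never mind, figured it out: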

        for (icol in c("Name", "Season", "Position")) 
            dat[, (icol):=sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, get(icol))]
        

        【Discussion】:

        • Take a look at ?tstrsplit in the data.table package
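
As a rough illustration of the `tstrsplit` suggestion: it splits the column once and returns one vector per token position, so the string does not have to be re-split for every target column. The sketch below is one possible way to use it, not the commenter's code; the names `lookups`, `tokens`, and `pick` are assumptions introduced here.

```r
library(data.table)

dat <- data.table(Date = c("Jan-15", "Feb-15", "Mar-15"),
                  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))

# Lookup vectors kept in a named list so they do not clash with the new column names
lookups <- list(Name = c("Peter", "David"),
                Season = c("Summer", "Spring", "Autumn", "Winter"),
                Position = c("Up", "Down"))

# Split once; tstrsplit() returns a list with one vector per token position
parts <- tstrsplit(dat$Campaign, "|", fixed = TRUE)
tokens <- do.call(cbind, parts)  # character matrix, one row per campaign

# For each row, take the first token found in `choices` (NA if none)
pick <- function(choices) {
  apply(tokens, 1, function(r) r[r %in% choices][1])
}

# Add the three columns by reference
for (col in names(lookups)) {
  dat[, (col) := pick(lookups[[col]])]
}

print(dat)
```

Because the token order differs from row to row, the positional columns from `tstrsplit` cannot be used directly as Name, Season, and Position; the row-wise lookup is still needed.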