【Question Title】: R: Exact match strings in two columns
【Posted】: 2019-04-13 11:50:02
【Question Description】:

I have a data frame of the following form:

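Column1 <- c('Elephant,Starship Enterprise,Cat', 'Random word', 'Word', 'Some more words, Even more words')
Column2 <- c('Rat,Starship Enterprise,Elephant', 'Ocean', 'No', 'more')
# stringsAsFactors = FALSE keeps both columns as character, so strsplit() works on R < 4.0 too
d1 <- data.frame(Column1, Column2, stringsAsFactors = FALSE)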

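What I want to do is find and count the exact matches between the strings in Column1 and Column2. Each column can hold several strings, separated by commas.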

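For example, row 1 has two strings in common: a) Starship Enterprise and b) Elephant. In row 4, however, although the word "more" appears in both columns, the exact strings (Some more words and Even more words) do not appear in the other column. The expected output would look like this.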

Any help would be greatly appreciated.

【Comments】:

    Tags: r string-matching data-manipulation


    【Solution 1】:

    Split each column on commas and count the elements in the intersection of the two resulting sets:

    mapply(function(x, y) length(intersect(x, y)), 
            strsplit(d1$Column1, ","), strsplit(d1$Column2, ","))
    #[1] 2 0 0 0
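
The split above is on a bare comma, so an entry written after ", " keeps its leading space (row 4 of Column1 contains " Even more words"). That does not change the result for this data, but it could hide a real match elsewhere. A minimal sketch that trims whitespace before comparing, assuming surrounding spaces are not meaningful; `count_common` is a hypothetical helper name, not part of the original answer:

```r
# Same data as in the question, kept as character columns
Column1 <- c('Elephant,Starship Enterprise,Cat', 'Random word', 'Word', 'Some more words, Even more words')
Column2 <- c('Rat,Starship Enterprise,Elephant', 'Ocean', 'No', 'more')
d1 <- data.frame(Column1, Column2, stringsAsFactors = FALSE)

# Hypothetical helper: split two comma-separated strings, trim whitespace,
# and count the entries that occur in both
count_common <- function(a, b) {
  split_trim <- function(s) trimws(strsplit(as.character(s), ",", fixed = TRUE)[[1]])
  length(intersect(split_trim(a), split_trim(b)))
}

# unname() drops the names that mapply() derives from the character input
common <- unname(mapply(count_common, d1$Column1, d1$Column2))
common
```

With trimming, `count_common("a, b", "b,a")` counts both entries as matches, whereas a bare-comma split would compare " b" against "b" and miss one.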
    

    Or, the tidyverse way:

    library(tidyverse)
    d1 %>%
      mutate(Common = map2_dbl(Column1, Column2, ~ 
          length(intersect(str_split(.x, ",")[[1]], str_split(.y, ",")[[1]]))))
    
    
    #                           Column1                          Column2 Common
    #1 Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant      2
    #2                      Random word                            Ocean      0
    #3                             Word                               No      0
    #4 Some more words, Even more words                             more      0
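
The question asks to find the matches as well as count them. A rough base R sketch that also keeps the matched strings, assuming whitespace around commas can be trimmed; the helper `split_items` and the column name `Matched` are illustrative choices, not from the original answers:

```r
# Same data as in the question, kept as character columns
Column1 <- c('Elephant,Starship Enterprise,Cat', 'Random word', 'Word', 'Some more words, Even more words')
Column2 <- c('Rat,Starship Enterprise,Elephant', 'Ocean', 'No', 'more')
d1 <- data.frame(Column1, Column2, stringsAsFactors = FALSE)

# Hypothetical helper: split every element on commas and trim each piece
split_items <- function(x) lapply(strsplit(x, ",", fixed = TRUE), trimws)

# Row-wise intersection of the two lists of character vectors
matches <- Map(intersect, split_items(d1$Column1), split_items(d1$Column2))

# Store the matched strings (comma-joined, empty when none) and their count
d1$Matched <- vapply(matches, paste, character(1), collapse = ",")
d1$Common <- lengths(matches)
d1
```

Keeping `matches` as a list makes it easy to inspect a single row, for example `matches[[1]]` for the strings shared in row 1.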
    

      【Comments】:

      【Solution 2】:

      We can do this with cSplit from the splitstackshape package:

      library(splitstackshape)
      library(data.table)
      v1 <- cSplit(setDT(d1, keep.rownames = TRUE), 2:3, ",", "long")[, 
          length(intersect(na.omit(Column1), na.omit(Column2))), rn]$V1
      d1[, Common := v1][, rn := NULL][]
      #                             Column1                          Column2 Common
      #1: Elephant,Starship Enterprise,Cat Rat,Starship Enterprise,Elephant      2
      #2:                      Random word                            Ocean      0
      #3:                             Word                               No      0
      #4: Some more words, Even more words                             more      0
      

      【Comments】:
