[Posted]: 2018-10-13 12:25:44
[Question]:
I have a data frame (DF) with 2 columns, and I have a list of words.
list_of_words <- c("tiger","elephant","rabbit", "hen", "dog", "Lion", "camel", "horse")
df <- tibble::tibble(page=c(12,6,9,18,2,15,81,65),
text=c("I have two pets: a dog and a hen",
"lion and Tiger are dangerous animals",
"I have tried to ride a horse",
"Why elephants are so big in size",
"dogs are very loyal pets",
"I saw a tiger in the zoo",
"the lion was eating a buffalo",
"parrot and crow are very clever birds"))
animals <- c("dog,hen", "lion,tiger", "horse", FALSE, "dog", "tiger", "lion", FALSE)
cbind(df, animals)
#>   page                                  text    animals
#> 1   12      I have two pets: a dog and a hen    dog,hen
#> 2    6  lion and Tiger are dangerous animals lion,tiger
#> 3    9          I have tried to ride a horse      horse
#> 4   18      Why elephants are so big in size      FALSE
#> 5    2              dogs are very loyal pets        dog
#> 6   15              I saw a tiger in the zoo      tiger
#> 7   81         the lion was eating a buffalo       lion
#> 8   65 parrot and crow are very clever birds      FALSE
I need to find out whether any word from the list appears in the text column of the DF. If it does, the matching word or words should be returned in a new column of the DF. The word list is: (tiger, elephant, rabbit, hen, dog, Lion, camel, horse).
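[Comments]:
-
Please add your sample data as code, not as images.
-
Yes, that is partly right. But I want to find out which words from the list appear in the DF and return those matched words in a new column of the same DF.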
-
These 4 steps will work: first use strsplit on your column df$text with " " as the split argument, like this: test <- strsplit(df$text, " "). Then use grepl and tolower to get the words that match your vector: test2 <- lapply(test, function(x) x[grepl(tolower(paste(words, collapse = "|")), tolower(x))]). Now collapse them into one string per row and unlist the result with df$animals <- unlist(lapply(test2, paste, collapse = ", ")), then set all empty strings to FALSE with df$animals[nchar(df$animals) == 0] <- FALSE.
-
@LAP It doesn't work.
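The four steps from the comment above can be put together as a minimal sketch. Two things here are assumptions rather than part of the thread: the word vector is referred to as list_of_words (the name used in the question) instead of words as written in the comment, and a base data.frame stands in for the tibble so the snippet has no package dependencies. The thread does not say why the original attempt failed, so this is only one plausible reading of the suggestion, not a confirmed fix.

```r
# Sample data from the question (base data.frame used here as an assumption,
# in place of tibble::tibble, to avoid a package dependency)
list_of_words <- c("tiger", "elephant", "rabbit", "hen", "dog", "Lion", "camel", "horse")
df <- data.frame(
  page = c(12, 6, 9, 18, 2, 15, 81, 65),
  text = c("I have two pets: a dog and a hen",
           "lion and Tiger are dangerous animals",
           "I have tried to ride a horse",
           "Why elephants are so big in size",
           "dogs are very loyal pets",
           "I saw a tiger in the zoo",
           "the lion was eating a buffalo",
           "parrot and crow are very clever birds"),
  stringsAsFactors = FALSE
)

# Step 1: split every sentence into tokens on single spaces
tokens <- strsplit(df$text, " ")

# Step 2: keep the tokens that contain any list word
# (case-insensitive, substring match via a single alternation pattern)
pattern <- tolower(paste(list_of_words, collapse = "|"))
matched <- lapply(tokens, function(x) x[grepl(pattern, tolower(x))])

# Step 3: collapse the matched tokens into one string per row
df$animals <- vapply(matched, paste, character(1), collapse = ", ")

# Step 4: rows without a match become FALSE
# (coerced to the string "FALSE" because the column is character)
df$animals[nchar(df$animals) == 0] <- FALSE

print(df)
```

Note that this approach returns the tokens as they appear in the text, not the entries of the word list, and it matches substrings. Rows 2, 4 and 5 therefore come back as "lion, Tiger", "elephants" and "dogs", which differs from the desired output shown in the question. Wrapping each list word in \\b word boundaries would be one way to require whole-word matches, at the cost of no longer matching "dogs".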
Tags: r text-mining