How to extract individual words from sentence and match them with words from pos and neg dictionaries in R

【Question title】: How to extract individual words from sentence and match them with words from pos and neg dictionaries in R [closed]
【Posted】: 2015-01-22 22:58:00
【Question】:

I need to create a function in R that splits sentences into words and then matches those words against the words in the pos and neg dictionaries. The result should be a sentiment score: each positive word in a sentence scores 1, and each negative word scores -1.

Product_ID        Sentence        Attribute        SentimentScore
1111111              1            graphics                1
1111111              1            windows                 1
1111111              2            loads                  -1
2222222              1            laptops                -1
2222222              2            design                  1

The first sentence for product 1111111 might look like: ...this product...great graphics...runs well on my windows.

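For example, the dictionary of positive words (pos.txt) looks like: a+ abound abounds abundance abundant accessable accessible acclaim acclaimed ... and so on

and the dictionary of negative words (neg.txt) looks like: 2-faced 2-faces abnormal abolish abominable abominably abominate abomination abort aborted aborts ... and so on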
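
A minimal sketch of loading such word lists into character vectors. The file contents below are hypothetical stand-ins written to temporary files so that the snippet runs on its own, and the `;` header lines are an assumption about the file layout, not something stated in the question. `scan()` splits on any whitespace, so it should cope with one word per line as well as several words on a line.

```r
# Hypothetical stand-ins for pos.txt and neg.txt (the real files are not shown)
pos_file <- tempfile(fileext = ".txt")
neg_file <- tempfile(fileext = ".txt")
writeLines(c("; positive words", "a+", "abound", "abounds", "abundant"), pos_file)
writeLines(c("; negative words", "2-faced", "abnormal", "abolish"), neg_file)

# scan() reads whitespace-separated tokens; comment.char = ";" drops
# everything from a ";" to the end of the line (assumed header format)
pos <- scan(pos_file, what = "character", comment.char = ";", quiet = TRUE)
neg <- scan(neg_file, what = "character", comment.char = ";", quiet = TRUE)

# Lower-case both lists so that later matching can ignore case
pos <- tolower(pos)
neg <- tolower(neg)

print(pos)
print(neg)
```

With the real files, `pos_file` and `neg_file` would simply be the paths to pos.txt and neg.txt.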

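I saw a function called score.sentiment on GitHub, but it scores each sentence as a whole, using the difference between the number of pos and neg words in it. I need something very similar, but for individual words.

I would greatly appreciate any help. Thanks in advance.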

【Comments】:

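Can you provide the sentences? This looks like a tokenization and matching task.
First user: Great printer for the money. Wireless setup was very easy.
Second: Very good laptop! Worth the price, too! Amazing and user friendly. Third: This is a pretty good laptop/tablet. The picture resolution is amazing! Good price for what you get. As good as an iPad at a better price.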

Tags: r

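【Solution 1】:

A brute-force approach. It is not optimal because it uses too many for loops, but it seems to do what you need. Hopefully this fits your application. You can rearrange things or store the results in another variable so that the output does not carry the [1] [1] prefixes.

Code: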

library(stringr)  # provides str_split()

sent = data.frame(Sentences=c("abundant bad abnormal activity was due to 2-face people","strange exciting activity was due to 2-face people"), user = c(1,2), stringsAsFactors = FALSE)
pos = c("abound" , "abounds", "abundant", "exciting")
neg = c("2-face","abnormal", "strange", "bad", "weird")

# one character vector of words per sentence
words = str_split(sent$Sentences, " ")

tmp <- data.frame()  # collects one row per matched word

for (i in 1:nrow(sent)) {
  # note: length(words) is the number of sentences (2), not the number of
  # words in sentence i, so only the first two words of each sentence are checked
  for (j in 1:length(words)) {
    # positive dictionary: record sentence index, word and score 1
    for (k in 1:length(pos)){
      if (words[[i]][j] == pos[k]) {
        print(paste(i,words[[i]][j],1))
        tmn <- cbind(i,words[[i]][j],1)
        tmp <- rbind(tmp,tmn)
      }
    }
    # negative dictionary: record sentence index, word and score -1
    for (m in 1:length(neg)){
      if (words[[i]][j] == neg[m]) {
        print(paste(i,words[[i]][j],-1))
        tmn <- cbind(i,words[[i]][j],-1)
        tmp <- rbind(tmp,tmn)
      }
    }
  }
}

print(tmp)

Result:

    i   V2         V3
1   1   abundant    1
2   1   bad        -1
3   2   strange    -1
4   2   exciting    1

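【Discussion】:

That's great. This is exactly what I was looking for. How can I store the result in a data frame or an n x 3 matrix?
See above. This is a solution I quickly "invented" over a lunch break, and it is not optimal yet, but it works. Be careful with large data sets, because it uses 3 nested for loops. Small ones should be fine.
That's great, thank you very much, you have helped me a lot. I will try to figure out how it works and then rewrite it for large data sets, since I need it for big data.
Something is wrong: when I run your code, it skips some of the words in the dictionary. With sent1 = data.frame(Sentences=c("abundant bad abnormal activity was due to 2-face people","strange exciting activity was due to great 2-face people"), user = c(1,2)) pos1 = c("abound" , "abounds", "abundant", "exciting", "great") neg1 = c("2-face","abnormal", "strange", "bad", "weird") it only produces: [1] "1 abundant 1" [1] "1 bad -1" [1] "2 strange -1" [1] "2 exciting 1"
Please, could you write a better solution for this?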
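
One possible explanation for the skipped words: the loop `for (j in 1:length(words))` runs over the number of sentences (2), not over the number of words in sentence `i`, so only the first two words of each sentence are ever compared. A loop-free sketch that avoids the index bookkeeping altogether is shown below. `score_words` is a hypothetical helper name, not part of any package, and the lower-casing is an added assumption about how matching should behave.

```r
# Sample data taken from the comment above
sent1 <- data.frame(
  Sentences = c("abundant bad abnormal activity was due to 2-face people",
                "strange exciting activity was due to great 2-face people"),
  user = c(1, 2),
  stringsAsFactors = FALSE
)
pos1 <- c("abound", "abounds", "abundant", "exciting", "great")
neg1 <- c("2-face", "abnormal", "strange", "bad", "weird")

# Hypothetical helper: one output row per dictionary word found in a sentence
score_words <- function(sentences, ids, pos, neg) {
  # one character vector of lower-cased words per sentence
  tokens <- strsplit(tolower(sentences), " ", fixed = TRUE)
  out <- data.frame(
    id   = rep(ids, lengths(tokens)),  # repeat each id once per word
    word = unlist(tokens),
    stringsAsFactors = FALSE
  )
  # +1 for a positive word, -1 for a negative word, 0 for anything else
  out$score <- (out$word %in% pos) - (out$word %in% neg)
  out <- out[out$score != 0, ]         # keep only dictionary words
  rownames(out) <- NULL
  out
}

result <- score_words(sent1$Sentences, sent1$user, pos1, neg1)
print(result)
```

The result is an n x 3 data frame (id, word, score), and because `%in%` is vectorized it should scale better than three nested loops on larger inputs.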

【Solution 2】:

Does this do what you need?

pos = c("abound" , "abounds", "abundant")
neg = c("2-face","abnormal")

sent = "abundant abnormal activity was due to 2-face people"

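# count the positive dictionary entries that occur anywhere in the sentence
p = 0
for (i in seq_along(pos)) {
  if (grepl(pos[i], sent, ignore.case = TRUE)) p = p + 1
}

# count the negative dictionary entries the same way
n = 0
for (i in seq_along(neg)) {
  if (grepl(neg[i], sent, ignore.case = TRUE)) n = n + 1
}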

print(p)
print(n)
print(paste("Overall sentence sentiment score = ", p - n))

Result: positive 1, negative 2, overall -1
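
One thing to keep in mind with `grepl()` here: it looks for the pattern anywhere in the string, so a short dictionary entry also matches inside a longer word. The sketch below uses a made-up sentence (not from the question) to contrast substring matching with matching on whole tokens.

```r
pos <- c("abound", "abounds", "abundant")
sent <- "the garden abounds with abundant flowers"  # made-up example sentence

# Substring matching: "abound" is reported although only "abounds" occurs
substring_hits <- pos[vapply(pos, grepl, logical(1), x = sent, fixed = TRUE)]

# Whole-word matching: split on spaces first, then compare tokens exactly
tokens <- strsplit(tolower(sent), " ", fixed = TRUE)[[1]]
word_hits <- pos[pos %in% tokens]

print(substring_hits)
print(word_hits)
```

For per-word scores, splitting into tokens first is probably the safer choice, since every hit then corresponds to exactly one word of the sentence.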

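【Discussion】:

I need the same output as in the table above.
Split the sentence into individual words, match them against the words in the dictionaries, and print them with 1 for pos words and -1 for neg words. The values go into another column.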
words = unlist(str_split(sent," ")) for (i in 1:length(words)) { for (j in 1:length(pos)){ if (words[i] == pos[j]) print(paste(words[i],1)) } for (k in 1:length(neg)){ if (words[i] == neg[k]) print(paste(words[i],-1)) } }
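The result is: [1] "abundant 1" [1] "abnormal -1" [1] "2-face -1"
If I have, for example, sent = data.frame(Sentences=c("abundant abnormal activity was due to 2-face people","abundant abnormal activity was due to 2-face people"), user = c(1,2)), could you split out these words, score them, and tie them to the specific user, please? I mean that the grouping variable would be user.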

【Solution 3】:

sent1 = data.frame(Sentences=c("abundant bad abnormal activity was due to 2-face people","strange exciting activity was due to great 2-face people"), user = c(1,2)) 
pos1 = c("abound" , "abounds", "abundant", "exciting", "great")
neg1 = c("2-face","abnormal", "strange", "bad", "weird")

Then I used:

library(stringr)  # provides str_split()

words = (str_split(unlist(sent1$Sentences)," "))

tmp <- data.frame()
tmn <- data.frame()

for (i in 1:nrow(sent1)) {
  # note: length(words) is the number of sentences (2), so j only reaches
  # the first two words of each sentence
  for (j in 1:length(words)) {
    for (k in 1:length(pos1)){
      if (words[[i]][j] == pos1[k]) {
        print(paste(i,words[[i]][j],1))
        tmn <- cbind(i,words[[i]][j],1)
        tmp <- rbind(tmp,tmn)
      }
    }
    for (m in 1:length(neg1)){
      if (words[[i]][j] == neg1[m]) {
        print(paste(i,words[[i]][j],-1))
        tmn <- cbind(i,words[[i]][j],-1)
        tmp <- rbind(tmp,tmn)
      }
    }
  }
}

The result is:

print(tmp)
  i       V2 V3
1 1 abundant  1
2 1      bad -1
3 2  strange -1
4 2 exciting  1

If I do this instead:

sent1$Sentences <- as.character(sent1$Sentences)
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id=rep(sent1$user, sapply(List, length)),    Words=unlist(List))
a$Words <- as.character(a$Words)
a[a$Words %in% pos1,]

The result for the positive words:

Id    Words
1 abundant
2 exciting
2    great

And for the negative words: a[a$Words %in% neg1,]

Id    Words
1      bad
1 abnormal
1   2-face
2  strange
2   2-face

But I need to add the value 1 for the positive words and -1 for the negative words.
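
One way this last step might be done is with a named lookup vector that maps every dictionary word to its score, so the score column can be filled in a single indexing step. This is only a sketch built on the data above. `lexicon` and `scored` are names introduced here for illustration, and the column name SentimentScore is borrowed from the table in the question.

```r
sent1 <- data.frame(
  Sentences = c("abundant bad abnormal activity was due to 2-face people",
                "strange exciting activity was due to great 2-face people"),
  user = c(1, 2),
  stringsAsFactors = FALSE
)
pos1 <- c("abound", "abounds", "abundant", "exciting", "great")
neg1 <- c("2-face", "abnormal", "strange", "bad", "weird")

# Same word table as above: one row per word, labelled with the user id
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id = rep(sent1$user, sapply(List, length)),
                Words = unlist(List),
                stringsAsFactors = FALSE)

# Named lookup vector: the names are the words, the values are the scores
lexicon <- c(setNames(rep(1, length(pos1)), pos1),
             setNames(rep(-1, length(neg1)), neg1))

# Indexing by name yields NA for words that are in neither dictionary
a$SentimentScore <- unname(lexicon[a$Words])
scored <- a[!is.na(a$SentimentScore), ]
rownames(scored) <- NULL
print(scored)
```

Words that appear in neither list are dropped here. Keeping them with a score of 0 would only require replacing the NA values instead of filtering them out.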

【Discussion】: