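【Posted】: 2014-11-17 04:48:19
【Question】:
I have a simple program that takes some text from a form as input and matches every word in the text against two lexicons. One lexicon contains a list of positive words, the other a list of negative words. For each positive word match, $posMatchCount is incremented. For each negative word match, $negMatchCount is incremented. A simple comparison follows: if the positive count is greater, the program returns "POSITIVE"; if the negative count is greater, it returns "NEGATIVE". If the positive count equals the negative count, or there are no positive or negative matches at all, it returns "NEUTRAL". The full code is below: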
<?php
include("positive_lexicon.php");
include("negative_lexicon.php");
?>
<html>
<head>
<title>Output</title>
</head>
<body>
<h1>Output</h1>
<hr>
<?php
$preprocessedDoc2 = "i love this phone but hate the battery i adore the screen size";
/////////////////////////////////////////////////////////////////////////////////match doc text with POSITIVE sentiment lexicon
$matchedPosWords = NULL;//contains matched words
$posMatchCount = 0;//count of POS matches
$array1 = explode(' ', $preprocessedDoc2);
foreach($array1 as $word){
if(preg_match("/\s{$word}\s/", $positiveLexicon)){
$matchedPosWords = $matchedPosWords . $word . " - ";
$posMatchCount++;
$posMatch = true; //for subjectivity check
}
else{
$posMatch= false; //for subjectivity check
}
}
echo "Matched POSITIVE words: <br><br>";
echo "<div style=\"background-color:#66FF66\">";
echo $matchedPosWords . " (Total: {$posMatchCount})";
echo "</div>";
echo "<br><br>";
/////////////////////////////////////////////////////////////////////////////////match doc text with NEGATIVE sentiment lexicon
$matchedNegWords = NULL;//contains matched words
$negMatchCount = 0;//count of NEG matches
$array2 = explode(' ', $preprocessedDoc2);
foreach($array2 as $word2){
if(preg_match("/\s{$word2}\s/", $negativeLexicon)){
$matchedNegWords = $matchedNegWords . $word2 . " - ";
$negMatchCount++;
$negMatch = true; //for subjectivity check
}
else{
$negMatch = false; //for subjectivity check
}
}
echo "Matched NEGATIVE words: <br><br>";
echo "<div style=\"background-color:#FF5050\">";
echo $matchedNegWords . " (Total: {$negMatchCount})";
echo "</div>";
echo "<br><br>";
/////////////////////////////////////////////////////////////////////////////////comparison between POSITIVE and NEGATIVE words
echo "analyzing document's sentiment ...<br><br>";
function checkPolarity($posWords, $negWords, $posMatch1, $negMatch1){//function to check polarity of doc
if((($posMatch1==false) && ($negMatch1==false))||($posWords==$negWords)){
return "<strong>NEUTRAL</strong>"; //if there are no POS or NEG matches, or matches are equal, return NEUTRAL
}
if($posWords > $negWords){
return "<strong>POSITIVE</strong>"; //if count of POS matches is greater than count of NEG matches, return POSITIVE
}
else{
return "<strong>NEGATIVE</strong>"; //if count of NEG matches is greater than count of POS matches, return NEGATIVE
}
}
$polarity = checkPolarity($posMatchCount, $negMatchCount, $posMatch, $negMatch); //call function to check polarity
echo "Polarity of the document is: " . $polarity; //display overall polarity
echo "<br><br>";
$polarity = "";
?>
</body>
</html>
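However, it sometimes returns "NEUTRAL" even when the number of positive words is greater than the number of negative words. Sometimes the counts are also incremented more often than they should be. For example, the input string "i love this phone but hate the battery i adore the screen size" returns the following: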
Matched POSITIVE words:
love - adore - - (Total: 3)
Matched NEGATIVE words:
hate - - (Total: 2)
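Although there are only two positive matches and one negative match, it reports a positive count of 3 and a negative count of 2. I'm sure the problem will be spotted immediately on SO, even though I can't seem to find it myself. I'll try my luck.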
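The symptoms above are consistent with two things in the posted code, sketched here under stated assumptions. First, explode(' ', ...) yields an empty string for every pair of consecutive spaces, and an empty $word turns the pattern into /\s\s/, which matches any lexicon containing two adjacent whitespace characters; that would account for the blank " - " entries and the extra count in both lists. Second, $posMatch and $negMatch are overwritten on every iteration, so they only describe the last word, and checkPolarity() reports NEUTRAL whenever the last word matches neither lexicon. The lexicon strings and the double space in the sample input below are hypothetical stand-ins, since the include files and the raw form input are not shown; countMatches() is an illustrative helper, not part of the original program.

```php
<?php
// Hypothetical stand-ins for the strings defined in positive_lexicon.php and
// negative_lexicon.php (their real contents are not shown in the question).
$positiveLexicon = " good  love adore great ";
$negativeLexicon = " bad  hate awful ";

// Return the tokens of $text that appear as whitespace-delimited words in $lexicon.
function countMatches(string $text, string $lexicon): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty tokens that explode(' ', ...) would
    // produce for consecutive spaces.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $matched = [];
    foreach ($tokens as $token) {
        // preg_quote keeps regex metacharacters in a token from altering the pattern.
        if (preg_match('/\s' . preg_quote($token, '/') . '\s/', $lexicon)) {
            $matched[] = $token;
        }
    }
    return $matched;
}

// Decide polarity from the counts alone; no per-word flags are needed.
function checkPolarity(int $posCount, int $negCount): string
{
    if ($posCount === $negCount) {
        return "NEUTRAL"; // also covers the case of zero matches on both sides
    }
    return $posCount > $negCount ? "POSITIVE" : "NEGATIVE";
}

// Assumed input: note the double space after "battery".
$doc = "i love this phone but hate the battery  i adore the screen size";
$pos = countMatches($doc, $positiveLexicon);
$neg = countMatches($doc, $negativeLexicon);

echo implode(" - ", $pos), " (Total: ", count($pos), ")\n";
echo implode(" - ", $neg), " (Total: ", count($neg), ")\n";
echo checkPolarity(count($pos), count($neg)), "\n";
```

Deriving the result from the two counts makes the $posMatch and $negMatch flags unnecessary, because equal counts already include the case where nothing matched.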
【Discussion】: