[Question title]: Compare two texts and highlight only the differences
[Posted]: 2017-06-06 08:06:50
[Question description]:

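I have a script that compares two texts and highlights the words that differ, but it does not work properly. It marks many words as different even though they have not changed: for example, words such as "that" and "the" are flagged as changed when they sit between two words that did change. I have attached an image.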

<?php


$old = 'The one-page order, which Mr. Trump signed in a hastily arranged Oval Office ceremony shortly before departing for the inaugural balls, gave no specifics about which aspects of the law it was targeting. But its broad language gave federal agencies wide latitude to change, delay or waive provisions of the law that they deemed overly costly for insurers, drug makers, doctors, patients or states, suggesting that it could have wide-ranging impact, and essentially allowing the dismantling of the law to begin even before Congress moves to repeal it.';


$new = 'The one-page order, which Mr. Trump signed in a unexpectedly organized Oval workplace rite quickly before departing for the inaugural balls, gave no specifics approximately which components of the law it became targeting. But its large language gave federal organizations huge range to exchange, put off or waive provisions of the law that they deemed overly luxurious for insurers, drug makers, docs, sufferers or states, suggesting that it could have wide-ranging effect, and basically permitting the dismantling of the regulation to start even before Congress moves to repeal it.';



$oldArr = preg_split('/\s+/', $old); // old (initial) text split into words
$newArr = preg_split('/\s+/', $new); // new text split into words
$resArr = array();

$oldCount = count($oldArr)-1;
$newCount = count($newArr)-1;

$tmpOld = 0; // marker position for old (initial) string
$tmpNew = 0; // marker position for new (modified) string
$end = 0; // loop control variable

// endless while loop until specified otherwise
while($end == 0){
    // if marker position is less than or equal to the max index for the initial text
    // to make sure we don't overshoot the max length
    if($tmpOld <= $oldCount){
        // we check if current words from both strings match, at the current marker positions
        if($oldArr[$tmpOld] === $newArr[$tmpNew]){
            // if they match, nothing has been modified, we push the word into results and increment both markers
            array_push($resArr,$oldArr[$tmpOld]);
            $tmpOld++;
            $tmpNew++;
        }else{
            // if the words don't match, we need to check for recurrence of the searched word in the entire new string
            $foundKey = array_search($oldArr[$tmpOld],$newArr,TRUE);
            // if we find it
            if($foundKey != '' && $foundKey > $tmpNew){
                // we get all the words from the new string between the current marker and the foundKey exclusive
                // and we place them into results, marking them as new words
                for($p=$tmpNew;$p<$foundKey;$p++){
                    array_push($resArr,'<span class="new-word">'.$newArr[$p].'</span>');
                }
                // after that, we insert the found word as unmodified
                array_push($resArr,$oldArr[$tmpOld]);
                // and we increment old marker position by 1
                $tmpOld++;
                // and set the new marker position at the found key position, plus one
                $tmpNew = $foundKey+1;
            }else{
                // if the word wasn't found it means it has been deleted
                // and we need to add it to results, marked as deleted
                array_push($resArr,'<span class="old-word">'.$oldArr[$tmpOld].'</span>');
                // and increment the old marker by one
                $tmpOld++;
            }
        }
    }else{
        $end = 1;
    }
}

$textFinal = '';
foreach($resArr as $val){
    $textFinal .= $val.' ';
}
echo "<p>".$textFinal."</p>";
?>
<style>
body {
    background-color: #2A2A2A;
}

@font-face {
    font-family: 'Eras Light ITC';
    font-style: normal;
    font-weight: normal;
    src: local('Eras Light ITC'), url('ERASLGHT.woff') format('woff');
}

p {
    font-family: 'Eras Light ITC', Arial;
    color: white;
}

.new-word { background: rgba(1, 255, 133, 0.9); color: black; font-weight: bold; }
.new-word:after { background: rgba(1, 255, 133, 0.9); }
.old-word { text-decoration: none; position: relative; background: rgba(215, 40, 40, 0.9); }
.old-word:after {
}
</style>

Example:

[Image: example result]

Why are these words marked as different when they have not changed? Regards!

[Question comments]:

    Tags: text compare highlight difference words


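    [Solution 1]:

    I checked your code and tried different cases, and I think your algorithm is wrong.

    For example, if you put "one-page" in place of "for" or "the", you will see that it is shown as unmatched. The reason is that, whenever there is a mismatch, you search the entire array for the unmatched word. If that word has already been passed (it sits at a lower index than the current marker), your algorithm fails.

    To see this, you can use the following variables.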

    $old = 'for costly for insurers.';
    $new = 'for luxurious for insurers.';
    

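    With this setup, once the mismatch on "costly" has been found, your code tries to match the "for" that follows it. But the array_search call you are using returns the position of the "for" at the beginning of the string.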

    $foundKey = array_search($oldArr[$tmpOld],$newArr,TRUE);
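
    As a small illustration of that call on the reduced example, the snippet below shows what array_search returns and why the condition in the question then rejects the word. The value of $tmpNew is an assumption taken from tracing the question's loop by hand. It may also be worth noting that the loose `$foundKey != ''` test treats a match at index 0 differently in PHP 7 and PHP 8, so a strict comparison against false is probably the safer check.

```php
<?php
// Word array for the new text of the reduced example.
$newArr = preg_split('/\s+/', 'for luxurious for insurers.');

// Assumed marker position at the moment the second "for" of $old is compared:
// index 0 of $newArr has already been consumed by the first match.
$tmpNew = 1;

// array_search() scans from index 0 and returns the first strict match.
$foundKey = array_search('for', $newArr, true);

var_dump($foundKey);           // int(0): the first "for", which was already consumed
var_dump($foundKey > $tmpNew); // bool(false): the question's code therefore marks "for" as deleted
var_dump($foundKey !== false); // bool(true): strict check that separates index 0 from "not found"
```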
    

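    So you should modify this part to search in a different way. You could write a version of array_search that takes a starting index. (Or you could unset the successfully matched elements from the array.)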
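
    A minimal sketch of the starting-index idea, not a definitive fix: array_search_from and diff_words are hypothetical names introduced here, and the loop is a simplified rewrite of the question's word walk rather than the original code. It is still a greedy heuristic (a word that only reappears much later will cause everything in between to be marked as new), so it will not always produce the smallest possible diff.

```php
<?php
/**
 * Hypothetical helper: like array_search() in strict mode, but only
 * looks at indices >= $start of a list-style (0-indexed) array.
 * Returns the index of the first match, or false if there is none.
 */
function array_search_from($needle, array $haystack, int $start)
{
    $count = count($haystack);
    for ($i = $start; $i < $count; $i++) {
        if ($haystack[$i] === $needle) {
            return $i;
        }
    }
    return false;
}

/**
 * Greedy word walk similar to the one in the question, with the search
 * limited to the part of the new text that has not been consumed yet.
 */
function diff_words(string $old, string $new): string
{
    $oldArr = preg_split('/\s+/', trim($old));
    $newArr = preg_split('/\s+/', trim($new));
    $newCount = count($newArr);
    $resArr = [];
    $tmpNew = 0; // marker position in the new text

    foreach ($oldArr as $word) {
        $foundKey = array_search_from($word, $newArr, $tmpNew);
        if ($foundKey === false) {
            // The word does not occur in the rest of the new text: deleted.
            $resArr[] = '<span class="old-word">' . $word . '</span>';
            continue;
        }
        // Words skipped in the new text before the match count as inserted.
        for ($p = $tmpNew; $p < $foundKey; $p++) {
            $resArr[] = '<span class="new-word">' . $newArr[$p] . '</span>';
        }
        $resArr[] = $word;       // unchanged word
        $tmpNew = $foundKey + 1; // continue after the matched word
    }

    // Words left over at the end of the new text are insertions too.
    for ($p = $tmpNew; $p < $newCount; $p++) {
        $resArr[] = '<span class="new-word">' . $newArr[$p] . '</span>';
    }

    return implode(' ', $resArr);
}

echo diff_words('for costly for insurers.', 'for luxurious for insurers.');
```

    With the two short strings above, only "costly" is marked as deleted and only "luxurious" as new; both occurrences of "for" stay unmarked.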

    [Comments]:
