【Posted】: 2011-04-05 10:53:00
【Question】:
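I'm currently looking for a way to filter website content dynamically. By "dynamically" I mean that I want to calculate the percentage of bad words (shit, f**k and so on) among all the words on a site's first page. If that percentage is not over 30%, the site is allowed. How can I make it go through every word on the first page, match each one against a list of bad words, and then divide the number of matches by the total number of words to get the percentage? The point is not to build a content filter that blocks a site as soon as a single word on the page matches the bad-word list. I do have the code below, but it is static.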
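$filename = "filters.txt";
$fp = @fopen($filename, 'r');
if ($fp) {
    // Each line of filters.txt has the form pattern~replacement.
    $array = explode("\n", fread($fp, filesize($filename)));
    fclose($fp);
    foreach ($array as $key => $val) {
        if (strpos($val, "~") === false) {
            continue; // skip blank or malformed lines
        }
        // split() was removed in PHP 7; explode() works for a literal delimiter.
        list($before, $after) = explode("~", trim($val));
        $input = preg_replace($before, $after, $input);
    }
}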
*filters.txt contains the list of bad words
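The percentage described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: bad_word_percentage is a hypothetical helper name, the bad-word list is assumed to be available as an array of plain words (not the pattern~replacement lines used above), and lowercasing is assumed to be needed for ASCII text only. Words are matched whole, so a listed word is not counted when it merely appears inside a longer word.

```php
<?php
// Hypothetical helper: percentage (0 to 100) of the words in $text
// that appear in $badWords, compared case-insensitively as whole words.
function bad_word_percentage(string $text, array $badWords): float
{
    // Split on every run of characters that are not letters or apostrophes.
    $words = preg_split('/[^\p{L}\']+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return 0.0; // no words at all (or invalid UTF-8): nothing to measure
    }

    // Flip the list so each lookup is a hash access instead of a scan.
    $lookup = array_flip(array_map('strtolower', $badWords));

    $bad = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $bad++;
        }
    }

    return 100.0 * $bad / count($words);
}
```

With this helper, the 30% rule from the question would read something like: $allowed = bad_word_percentage($textOfWebpage, $words) <= 30.0;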
Thanks to erisco!
I tried it, but it doesn't seem to work.
function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    ob_start();
    curl_exec($ch);
    curl_close($ch);
    $string = ob_get_contents();
    ob_end_clean();
    return $string;
}
/* $toLoad is from Browse.php */
$sourceOfWebpage = get_content($toLoad);
$textOfWebpage = strip_tags($sourceOfWebpage);
/* array: Obtained by your filter.txt file */
// Open the filters file and filter all of the results.
$filename = "filters.txt";
$badWords = @fopen($filename, 'r');
if ($badWords) {
    $array = explode("\n", fread($fp, filesize($filename)));
    foreach ($array as $key => $val) {
        list($before, $after) = split("~", $val);
        $input = preg_replace($before, $after, $input);
    }
}
/* float: Some decimal value */
$allowedBadWordsPercent = 0.30;
$numberOfWords = str_word_count($textOfWebpage);
$numberOfBadWords = 0;
str_ireplace($badWords, '', $sourceOfWebpage, $numberOfBadWords);
if ($numberOfBadWords != 0) {
    $badWordsPercent = $numberOfWords / $numberOfBadWords;
} else {
    $badWordsPercent = 0;
}
if ($badWordsPercent > $allowedBadWordsPercent) {
    echo 'This is a naughty webpage';
}
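A sketch of how the attempt above might be adjusted, under a few assumptions. In the attempt, $badWords holds the file handle returned by fopen rather than an array of words, fread is called on $fp, which is not defined in that snippet, and the final division is total words over bad words instead of the reverse. The helper names load_bad_words and bad_words_fraction are hypothetical, and the file is assumed here to hold one plain word per line. The sketch keeps the str_ireplace counting approach, which counts substring matches, so a listed word is also counted when it occurs inside a longer word.

```php
<?php
// Hypothetical helper: read one bad word per line (an assumed file format).
function load_bad_words(string $filename): array
{
    $lines = @file($filename, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return []; // file missing or unreadable
    }
    // Trim whitespace (including stray "\r") and drop empty entries.
    $words = array_map('trim', $lines);
    return array_values(array_filter($words, function ($w) {
        return $w !== '';
    }));
}

// Hypothetical helper: bad-word occurrences divided by total words (0.0 to 1.0+).
function bad_words_fraction(string $html, array $badWords): float
{
    $text = strip_tags($html);
    $numberOfWords = str_word_count($text);
    if ($numberOfWords === 0 || $badWords === []) {
        return 0.0;
    }

    $numberOfBadWords = 0;
    // The fourth argument receives the number of replacements performed.
    str_ireplace($badWords, '', $text, $numberOfBadWords);

    // Bad words divided by total words, not the other way around.
    return $numberOfBadWords / $numberOfWords;
}
```

Usage would then look something like: if (bad_words_fraction($sourceOfWebpage, load_bad_words("filters.txt")) > 0.30) { echo 'This is a naughty webpage'; }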
【Comments】:
Tags: php