【Question title】: How to dynamically filter website content using PHP
【Posted】: 2011-04-05 10:53:00
【Question】:

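I'm looking for a way to filter website content dynamically. By "dynamically" I mean that I would calculate the percentage of bad words (e.g. shit, f**k) out of all the words on the first page. If the percentage does not exceed 30%, the website is allowed. How do I make it check every word on the first page against the bad-word list, then divide by the total number of words to get the percentage? The point is not to build a content filter that rewrites words, but to block the whole website, and to decide that from the percentage rather than from a single word matching the bad-word list. I do have the following, but it is static: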

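$filename = "filters.txt";

// Each line of filters.txt has the form "pattern~replacement"
$lines = @file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($lines !== false) {
    foreach ($lines as $line) {
        // split() was removed in PHP 7; explode() handles a literal "~"
        list($before, $after) = explode("~", $line, 2);

        // $before must be a complete regex such as "/badword/i"; $input is the text being filtered
        $input = preg_replace($before, $after, $input);
    }
}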

*filters.txt contains the list of bad words.
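The percentage calculation asked about above can be sketched as two small functions: one loads the word list, the other divides the number of matching words by the total word count. Both names (`loadBadWords`, `badWordRatio`) are hypothetical, and the loader assumes filters.txt holds one plain word per line, which differs from the `pattern~replacement` lines the static snippet reads.

```php
<?php
// Sketch only. Assumes (hypothetically) that filters.txt holds one plain
// bad word per line, not the "pattern~replacement" lines used above.
function loadBadWords(string $filename): array
{
    $lines = @file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // missing or unreadable file: no bad words
    }
    // Normalise case and stray whitespace, then drop blanks and duplicates
    $words = array_filter(array_map(function ($line) {
        return strtolower(trim($line));
    }, $lines), 'strlen');
    return array_values(array_unique($words));
}

// Share of the words in $text that appear in $badWords (0.0 to 1.0)
function badWordRatio(string $text, array $badWords): float
{
    // Format 1 makes str_word_count() return the words themselves
    $words = str_word_count(strtolower($text), 1);
    if (count($words) === 0) {
        return 0.0; // empty page: avoid dividing by zero
    }
    $lookup = array_flip($badWords); // words become keys for fast isset() checks
    $bad = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $bad++;
        }
    }
    return $bad / count($words);
}
```

For example, `badWordRatio('This damn page is damn rude', ['damn', 'rude'])` returns 0.5 (3 of 6 words), which is above a 0.30 threshold. One caveat: `str_word_count()` only treats letters, apostrophes and hyphens as word characters, so a masked word like "f**k" is split into "f" and "k" and will not match.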


Thanks, Erisco!

I tried it, but it doesn't seem to work.

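function get_content($url)
{
   $ch = curl_init();

   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_HEADER, 0);
   // Return the body as a string instead of printing it, so no output buffering is needed
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

   $string = curl_exec($ch);
   curl_close($ch);

   // curl_exec() returns false on failure
   return $string === false ? '' : $string;
}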


/* $toLoad is from Browse.php */

$sourceOfWebpage = get_content($toLoad);
$textOfWebpage = strip_tags($sourceOfWebpage);

/* array: the bad words from filters.txt, assuming one plain word per line */
$filename = "filters.txt";
$badWords = @file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($badWords === false) {
    $badWords = array(); // file missing or unreadable: nothing to count
}

/* float: maximum allowed share of bad words (0.30 = 30%) */

$allowedBadWordsPercent = 0.30;
$numberOfWords = str_word_count($textOfWebpage);
$numberOfBadWords = 0;
// Count matches in the tag-free text, not in the raw HTML
str_ireplace($badWords, '', $textOfWebpage, $numberOfBadWords);

// Share = bad words / all words; guard against a page with no words
if ($numberOfWords != 0) {
    $badWordsPercent = $numberOfBadWords / $numberOfWords;
} else {
    $badWordsPercent = 0;
}

if ($badWordsPercent > $allowedBadWordsPercent) {
    echo 'This is a naughty webpage';
}
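In the code above, `strip_tags()` removes the tags but keeps the contents of `<script>` and `<style>` elements, so JavaScript and CSS tokens end up in the word count, and text from adjacent elements can run together (`<p>a</p><p>b</p>` becomes "ab"). A rough sketch of cleaner text extraction follows. `visibleText` is a hypothetical name, and regex-based stripping is an approximation rather than real HTML parsing (`DOMDocument` would be more robust).

```php
<?php
// Sketch: extract roughly the visible text of an HTML page.
function visibleText(string $html): string
{
    // Drop <script>...</script> and <style>...</style> blocks, case-insensitively
    // and across newlines (s modifier); keep the input if the regex fails.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Put a space before every tag so words in adjacent elements stay separate
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; back into characters
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

Passing `visibleText($sourceOfWebpage)` instead of `strip_tags($sourceOfWebpage)` to the counting code keeps script and style tokens out of both the word total and the bad-word count.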

【Discussion】:

    Tags: php


    【Answer 1】:

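    Here is a rough idea of what I would do. You could argue that using str_ireplace() purely for counting is an abuse of it; I'm not sure there is a more direct function short of resorting to regular expressions.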

    /* string: Obtained by CURL or similar */
    $sourceOfWebpage;
    
    $textOfWebpage = strip_tags($sourceOfWebpage);
    
    /* array: Obtained from your filters.txt file */
    $badWords;
    
    /* float: maximum allowed share of bad words (0.30 = 30%) */
    $allowedBadWordsPercent = 0.30;
    
    $numberOfWords = str_word_count($textOfWebpage);
    $numberOfBadWords = 0;
    
    // Count in the tag-free text so that markup is not matched
    str_ireplace($badWords, '', $textOfWebpage, $numberOfBadWords);
    
    // Share = bad words / all words; guard against a page with no words
    if ($numberOfWords != 0) {
        $badWordsPercent = $numberOfBadWords / $numberOfWords;
    } else {
        $badWordsPercent = 0;
    }
    
    if ($badWordsPercent > $allowedBadWordsPercent) {
        echo 'This is a naughty webpage';
    }
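    The str_ireplace() trick counts substrings, so "ass" would also be counted inside "class". The regular-expression route the answer mentions could look like the sketch below; it is not the answerer's code, and `countBadWords` is a hypothetical helper name.

```php
<?php
// Sketch: count whole-word, case-insensitive matches of any bad word.
function countBadWords(string $text, array $badWords): int
{
    if ($badWords === []) {
        return 0; // an empty alternation would match everywhere
    }
    // preg_quote() escapes regex metacharacters such as "*" in "f**k"
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $badWords);
    // \b = word boundary, i = case-insensitive, u = treat text as UTF-8
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/iu';
    $count = preg_match_all($pattern, $text);
    return $count === false ? 0 : $count;
}
```

    Dividing the result by `str_word_count($textOfWebpage)` gives the share to compare with 0.30. Note that `str_word_count()` splits a masked word like "f**k" into two words, so the two counts use slightly different notions of a word.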
    
