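Great discussion. I like it :D
@andy: Great, for me this is the first time someone has actually explained the difference accurately. Got it :D
@Marcos Dimitrio I agree
I run a pretty nasty pool of Twitter API daemon threads myself and, if I understand correctly, they do exactly this, apart from the $_POST pushes from Facebook. It monitors the Streaming API firehose in real time for tweets matching arrays of hundreds or thousands of keyword clusters. This is the way to go, or you will fail horribly; IMHO, of course. Here is one half of the pair of daemons, getTweets and parseTweets.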
<?php
ob_start();
require_once('config/phirehose-config.php');
require_once('lib.php');
$oDB = new db;
// run as a daemon aka background process
while (true) {
    // Process all statuses
    $query = 'SELECT cache_id, raw_tweet ' .
        'FROM json_cache';
    $result = $oDB->select($query);
    while ($row = mysqli_fetch_assoc($result)) {
        $cache_id = $row['cache_id'];
        // Alternative for serialized base64 raw data:
        // $status = unserialize(base64_decode($row['raw_tweet']));
        // JSON payload for statuses stored in the database
        $tweet_object = json_decode($row['raw_tweet'], false);
        // Delete cached copy of tweet (disabled here, so rows stay in json_cache)
        // $oDB->select("DELETE FROM json_cache WHERE cache_id = $cache_id");
        // Limit tweets to a single language,
        // such as 'en' for English
        // if ($tweet_object->lang <> 'nl') {continue;}
        // Skip statuses that are already in the tweets table
        $tweet_id = $tweet_object->id_str;
        if ($oDB->in_table('tweets', 'tweet_id=' . $tweet_id)) {
            continue;
        }
        $tweet_text = $oDB->escape($tweet_object->text);
        $created_at = $oDB->date($tweet_object->created_at);
        // geo is null unless the tweet carries coordinates
        if (isset($tweet_object->geo)) {
            $geo_lat = $tweet_object->geo->coordinates[0];
            $geo_long = $tweet_object->geo->coordinates[1];
        } else {
            $geo_lat = $geo_long = 0;
        }
        $user_object = $tweet_object->user;
        $user_id = $user_object->id_str;
        $screen_name = $oDB->escape($user_object->screen_name);
        $name = $oDB->escape($user_object->name);
        // Escape the URL too, since it ends up inside a quoted SQL string
        $profile_image_url = $oDB->escape($user_object->profile_image_url);
        // Add a new user row or update an existing one
        // (url and time_zone may be null, hence the string casts before escaping)
        $field_values = 'screen_name = "' . $screen_name . '", ' .
            'profile_image_url = "' . $profile_image_url . '", ' .
            'user_id = ' . $user_id . ', ' .
            'name = "' . $name . '", ' .
            'location = "' . $oDB->escape($user_object->location) . '", ' .
            'url = "' . $oDB->escape((string) $user_object->url) . '", ' .
            'description = "' . $oDB->escape($user_object->description) . '", ' .
            'created_at = "' . $oDB->date($user_object->created_at) . '", ' .
            'followers_count = ' . $user_object->followers_count . ', ' .
            'friends_count = ' . $user_object->friends_count . ', ' .
            'statuses_count = ' . $user_object->statuses_count . ', ' .
            'time_zone = "' . $oDB->escape((string) $user_object->time_zone) . '", ' .
            'last_update = "' . $oDB->date($tweet_object->created_at) . '"';
        if ($oDB->in_table('users', 'user_id="' . $user_id . '"')) {
            $oDB->update('users', $field_values, 'user_id = "' . $user_id . '"');
        } else {
            $oDB->insert('users', $field_values);
        }
        // Persist status to database
        $field_values = 'tweet_id = ' . $tweet_id . ', ' ....
        // ... the rest is omitted here; some things are for the cook alone, it's hard work
        // $entities is not set in the excerpt above; presumably it comes from the payload
        $entities = $tweet_object->entities;
        foreach ($entities->hashtags as $hashtag) {
            $tag = $oDB->escape($hashtag->text);
            $where = 'tweet_id=' . $tweet_id . ' ' .
                'AND tag="' . $tag . '"';
            if (!$oDB->in_table('tweet_tags', $where)) {
                $field_values = 'tweet_id=' . $tweet_id . ', ' .
                    'tag="' . $tag . '"';
                $oDB->insert('tweet_tags', $field_values);
            }
        }
        foreach ($entities->urls as $url) {
            // Prefer the expanded URL, fall back to the shortened one
            if (empty($url->expanded_url)) {
                $url_text = $oDB->escape($url->url);
            } else {
                $url_text = $oDB->escape($url->expanded_url);
            }
            $where = 'tweet_id=' . $tweet_id . ' ' .
                'AND url="' . $url_text . '"';
            if (!$oDB->in_table('tweet_urls', $where)) {
                $field_values = 'tweet_id=' . $tweet_id . ', ' .
                    'url="' . $url_text . '"';
                $oDB->insert('tweet_urls', $field_values);
            }
        }
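    }
    // While buffering is active, echo only writes back into the buffer,
    // so send the buffer out with ob_flush() when debugging; otherwise discard it
    if (DEBUG) {
        ob_flush();
    } else {
        ob_clean();
    }
    // Longer sleep equals lower server load
    sleep(1);
}
?>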
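The field extraction in the loop above can also be pulled out of the database code, which makes it easy to check without a MySQL server. This is only a minimal sketch, assuming the classic status JSON layout the daemon already relies on (`id_str`, `geo`, `user`, `entities`); `extract_tweet_fields` is a made-up helper name, not part of Phirehose or of the `db` class in lib.php.

```php
<?php
// Hypothetical helper (not part of Phirehose or lib.php): turn one raw
// status JSON string into a flat array, or null if it is unusable.
function extract_tweet_fields(string $raw_tweet): ?array
{
    $tweet = json_decode($raw_tweet, false);
    if (!is_object($tweet) || !isset($tweet->id_str, $tweet->user)) {
        return null; // malformed JSON, or a payload that is not a status
    }

    // geo is null for most tweets; isset() is false for null properties
    $has_geo = isset($tweet->geo->coordinates[1]);

    // Prefer the expanded URL, fall back to the shortened one
    $urls = [];
    foreach ($tweet->entities->urls ?? [] as $u) {
        $urls[] = empty($u->expanded_url) ? $u->url : $u->expanded_url;
    }

    $tags = [];
    foreach ($tweet->entities->hashtags ?? [] as $h) {
        $tags[] = $h->text;
    }

    return [
        'tweet_id'    => $tweet->id_str,
        'text'        => $tweet->text ?? '',
        'geo_lat'     => $has_geo ? $tweet->geo->coordinates[0] : 0,
        'geo_long'    => $has_geo ? $tweet->geo->coordinates[1] : 0,
        'user_id'     => $tweet->user->id_str ?? '',
        'screen_name' => $tweet->user->screen_name ?? '',
        'tags'        => $tags,
        'urls'        => $urls,
    ];
}
```

The returned values are still raw, so they would need to go through `$oDB->escape()` (or bound parameters) before being placed in a query.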
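It also works well for spiders and crawlers, for which I run my own crew of workers. Show me a better way to do this, all things considered (resources and scalability). Holding a persistent connection open in a website widget just for FB status updates is really like using Echelon as a TV remote, IMHO.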
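To make the resource argument a bit more concrete: the collecting daemon only appends raw payloads to a queue, and the parsing daemon drains it in batches and sleeps longer when there is nothing to do. The following is a rough sketch under stated assumptions, not the actual daemons: an in-memory `SplQueue` stands in for the `json_cache` table, `next_sleep` and `drain_batch` are hypothetical names, and the doubling backoff is an addition to the fixed `sleep(1)` shown above.

```php
<?php
// Hypothetical: back off while idle, reset as soon as work shows up.
function next_sleep(int $current, int $processed, int $max = 30): int
{
    if ($processed > 0) {
        return 1;
    }
    return min($current * 2, $max);
}

// Hypothetical: pull up to $limit raw payloads and hand each to $handler.
function drain_batch(SplQueue $queue, callable $handler, int $limit = 100): int
{
    $processed = 0;
    while ($processed < $limit && !$queue->isEmpty()) {
        $handler($queue->dequeue());
        $processed++;
    }
    return $processed;
}

// Collector side (getTweets role): enqueue only, no parsing.
$queue = new SplQueue();
$queue->enqueue('{"id_str":"1"}');
$queue->enqueue('{"id_str":"2"}');

// Parser side (parseTweets role): three passes for demonstration;
// a real daemon would loop forever and call sleep($sleep) on each pass.
$seen = [];
$sleep = 1;
for ($pass = 0; $pass < 3; $pass++) {
    $n = drain_batch($queue, function (string $raw) use (&$seen) {
        $seen[] = json_decode($raw)->id_str;
    });
    $sleep = next_sleep($sleep, $n);
}
```

Keeping the collector free of parsing work means a slow database only delays the parser, not the stream connection.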