【Question title】: Extracting data from tweets on Twitter using Python
【Posted】: 2015-07-07 21:23:33
【Question description】:

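I want to extract data such as the tweet ID, the Twitter username, the Twitter ID of the user who posted an fb.me link in a tweet, and that user's Facebook ID and Facebook username.

I need to do this for 200 such tweets.

My code: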

from twitter.oauth import OAuth
import json
import urllib2
from twitter import *

ckey = ''
csecret = '' 
atoken = '' 
asecret = ''



auth = OAuth(atoken,asecret,ckey,csecret)

t_api = Twitter(auth=auth)

search = t_api.search.tweets(q='http://on.fb.me',count=1)

print search

print 'specific data'

#print search['statuses'][0]['entities']['urls']

This currently retrieves one result, and I want to extract the data described above from it.

The result I get:

{u'search_metadata': {u'count': 1, u'completed_in': 0.021, u'max_id_str': u'542227367834685440', u'since_id_str': u'0', u'next_results': u'?max_id=542227367834685439&q=http%3A%2F%2Fon.fb.me&count=1&include_entities=1', u'refresh_url': u'?since_id=542227367834685440&q=http%3A%2F%2Fon.fb.me&include_entities=1', u'since_id': 0, u'query': u'http%3A%2F%2Fon.fb.me', u'max_id': 542227367834685440L}, u'statuses': [{u'contributors': None, u'truncated': False, u'text': u'Check out Monday Morning Cooking Club Cooking Tip Day #1 --&gt;http://t.co/j6mbg1OE6Z | http://t.co/c7qjunLQz2', u'in_reply_to_status_id': None, u'id': 542227367834685440L, u'favorite_count': 0, u'source': u'<a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'user_mentions': [], u'hashtags': [], u'urls': [{u'url': u'http://t.co/j6mbg1OE6Z', u'indices': [63, 85], u'expanded_url': u'http://on.fb.me/', u'display_url': u'on.fb.me'}, {u'url': u'http://t.co/c7qjunLQz2', u'indices': [88, 110], u'expanded_url': u'http://bit.ly/12BbG16', u'display_url': u'bit.ly/12BbG16'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 0, u'id_str': u'542227367834685440', u'favorited': False, u'user': {u'follow_request_sent': False, u'profile_use_background_image': True, u'profile_text_color': u'333333', u'default_profile_image': False, u'id': 226140415, u'profile_background_image_url_https': u'https://pbs.twimg.com/profile_background_images/704964581/bc37b358019be05efe1094a0d100ea53.jpeg', u'verified': False, u'profile_location': None, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/469488950050955264/FOoWjIEZ_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'entities': {u'url': {u'urls': [{u'url': u'http://t.co/sida0E6eXy', u'indices': [0, 22], u'expanded_url': u'http://www.mondaymorningcookingclub.com.au', u'display_url': u'mondaymorningcookingclub.com.au'}]}, u'description': 
{u'urls': []}}, u'followers_count': 1574, u'profile_sidebar_border_color': u'000000', u'id_str': u'226140415', u'profile_background_color': u'EDCDC7', u'listed_count': 50, u'is_translation_enabled': False, u'utc_offset': 39600, u'statuses_count': 12594, u'description': u"Monday Morning Cooking Club. A bunch of Sydney gals sharing and preserving the wonderful recipes of Australia's culturally diverse Jewish community.", u'friends_count': 1904, u'location': u'Sydney,  Australia', u'profile_link_color': u'C40A38', u'profile_image_url': u'http://pbs.twimg.com/profile_images/469488950050955264/FOoWjIEZ_normal.jpeg', u'following': False, u'geo_enabled': True, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/226140415/1400769931', u'profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/704964581/bc37b358019be05efe1094a0d100ea53.jpeg', u'name': u'Lisa Goldberg', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 1309, u'screen_name': u'MondayMorningCC', u'notifications': False, u'url': u'http://t.co/sida0E6eXy', u'created_at': u'Mon Dec 13 12:22:13 +0000 2010', u'contributors_enabled': False, u'time_zone': u'Sydney', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Tue Dec 09 08:00:53 +0000 2014', u'in_reply_to_status_id_str': None, u'place': None, u'metadata': {u'iso_language_code': u'en', u'result_type': u'recent'}}]}

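Can you help me figure out how to retrieve this specific data?

【Comments】:

    Tags: python twitter extraction


    【Solution 1】:

    You can do something like this: issue a query, then pull out the data you want by looking up the corresponding keys in the response.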

    import json
    import urlparse  # Python 2 module; in Python 3 this is urllib.parse
    import twitter
    
    # Fill in the credentials of your own Twitter application
    ckey = 'Your consumer key'
    csecret = 'your consumer secret'
    atoken = 'your token'
    asecret = 'your secret token'
    
    auth = twitter.oauth.OAuth(atoken, asecret,
                               ckey, csecret)
    
    twitter_api = twitter.Twitter(auth=auth)
    
    q = 'http://on.fb.me'
    
    count = 100  # the search API returns at most 100 tweets per request
    
    search_results = twitter_api.search.tweets(q=q, count=count)
    
    statuses = search_results['statuses']
    
    # Fetch up to 5 more batches of results by following next_results
    
    for _ in range(5):
        print "Length of statuses", len(statuses)
        try:
            next_results = search_results['search_metadata']['next_results']
        except KeyError:  # No more results when next_results doesn't exist
            break
    
        # Create a dictionary from next_results, which has the following form:
        # ?max_id=313519052523986943&q=NCAA&include_entities=1
        # parse_qsl also URL-decodes the values; q arrives percent-encoded,
        # and passing it on undecoded would encode it a second time.
        kwargs = dict(urlparse.parse_qsl(next_results[1:]))
    
        search_results = twitter_api.search.tweets(**kwargs)
        statuses += search_results['statuses']
    
    # Show one sample search result by slicing the list...
    print json.dumps(statuses[0], indent=1)
    
    # get relevant data into lists
    user_names = [ user_mention['name'] 
                     for status in statuses
                         for user_mention in status['entities']['user_mentions'] ]
    
    screen_names = [ user_mention['screen_name'] 
                     for status in statuses
                         for user_mention in status['entities']['user_mentions'] ]
    
    id_str = [ user_mention['id_str'] 
                     for status in statuses
                         for user_mention in status['entities']['user_mentions'] ]
    
    t_id = [ status['id'] 
             for status in statuses ]
    
    # print out first 5 results
    print json.dumps(screen_names[0:5], indent=1) 
    print json.dumps(user_names[0:5], indent=1)
    print json.dumps(id_str[0:5], indent=1)
    print json.dumps(t_id[0:5], indent=1)
    

    Result:

    [
     "DijalogNet", 
     "Kihot_ex_of", 
     "Kihot_ex_of", 
     "JAsunshine1011", 
     "RobertCornegyJr"
    ]
    [
     "Dijalog Net", 
     "Sa\u0161a Jankovi\u0107", 
     "Sa\u0161a Jankovi\u0107", 
     "Raycent Edwards", 
     "Robert E Cornegy, Jr"
    ]
    [
     "2380692464", 
     "563692937", 
     "563692937", 
     "15920807", 
     "460051837"
    ]
    [
     542309722385580032, 
     542227367834685440, 
     542202885514461185, 
     542201843448045568, 
     542188061598437376
    ]
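
Note that the name and ID lists above come from `user_mentions`, that is, the accounts mentioned in each tweet. In the response shown in the question, the author of a tweet sits under `status['user']`. A minimal sketch of collecting the tweet ID and the author's fields could look like the following. It is written for Python 3, the helper name `tweet_record` is hypothetical, and the sample data is a trimmed copy of the status from the question rather than a live API response.

```python
# Python 3 sketch; `tweet_record` is a hypothetical helper, not part of the twitter package.
def tweet_record(status):
    """Return the tweet ID, the author's details, and the expanded URLs of one status."""
    user = status["user"]
    return {
        "tweet_id": status["id_str"],        # ID of the tweet itself
        "user_id": user["id_str"],           # ID of the account that posted it
        "screen_name": user["screen_name"],
        "name": user["name"],
        "urls": [u["expanded_url"] for u in status["entities"]["urls"]],
    }


# Trimmed copy of the status printed in the question.
sample_status = {
    "id_str": "542227367834685440",
    "user": {
        "id_str": "226140415",
        "screen_name": "MondayMorningCC",
        "name": "Lisa Goldberg",
    },
    "entities": {
        "urls": [
            {"expanded_url": "http://on.fb.me/"},
            {"expanded_url": "http://bit.ly/12BbG16"},
        ]
    },
}

record = tweet_record(sample_status)
print(record["tweet_id"], record["user_id"], record["screen_name"])
```

With a real response, the same helper could be applied to every element of `statuses` in a list comprehension.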
    

    Have a look at this site for more examples of how to use the API.
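
To reach the 200 tweets asked for in the question, the `next_results` loop can be wrapped in a function that stops at a target count. The sketch below is one possible shape for that, in Python 3. The names `collect_statuses` and `fake_search` are hypothetical: `fake_search` only stands in for `twitter_api.search.tweets` so the example runs offline, and it imitates just the parts of the response that the loop reads. The `next_results` string is percent-encoded, so it is decoded with `parse_qsl` before being passed back as keyword arguments.

```python
from urllib.parse import parse_qsl


def collect_statuses(search, query, target=200, per_page=100):
    """Page through search results until `target` statuses are collected.

    `search` is any callable that takes keyword arguments and returns a dict
    shaped like the response in the question (for example
    twitter_api.search.tweets).
    """
    results = search(q=query, count=per_page)
    statuses = list(results["statuses"])
    while len(statuses) < target:
        next_results = results.get("search_metadata", {}).get("next_results")
        if not next_results:
            break  # no further pages
        # Strip the leading "?" and URL-decode each parameter.
        kwargs = dict(parse_qsl(next_results.lstrip("?")))
        results = search(**kwargs)
        if not results["statuses"]:
            break  # an empty page also means we are done
        statuses.extend(results["statuses"])
    return statuses[:target]


# Hypothetical offline stand-in for twitter_api.search.tweets.
def fake_search(**kwargs):
    max_id = int(kwargs.get("max_id", 1000))
    count = int(kwargs["count"])
    ids = list(range(max_id, max_id - count, -1))
    next_results = "?max_id=%d&q=http%%3A%%2F%%2Fon.fb.me&count=%d" % (ids[-1] - 1, count)
    return {
        "statuses": [{"id": i} for i in ids],
        "search_metadata": {"next_results": next_results},
    }


collected = collect_statuses(fake_search, "http://on.fb.me")
print(len(collected))

# Decoding the next_results value from the question's output:
question_next = "?max_id=542227367834685439&q=http%3A%2F%2Fon.fb.me&count=1&include_entities=1"
decoded = dict(parse_qsl(question_next.lstrip("?")))
print(decoded["q"])
```

With the real client, the call would be `collect_statuses(twitter_api.search.tweets, 'http://on.fb.me')`, assuming the credentials are set up as in the code above.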

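    【Discussion】:

    • Thanks for your valuable post. I have already done this part. I have a doubt about the two IDs you are retrieving: as I see it, the second one is the tweet ID, and the Twitter user ID should be ['user']['id']. But can you tell me how to retrieve the Facebook user ID and Facebook username that correspond to the Twitter user? That is my main concern.
    • I think the Facebook account of the user who posted the tweet is hidden. The only thing you can do is check the fb link in the tweet itself and then use the Facebook API to determine which user published the post (that is not necessarily the user who originally tweeted, though). I'm not familiar with the Facebook API, so I don't know what limitations it may have.
    • Could you specify which of the two IDs you retrieved is the user ID and which is the tweet ID? Also, could you check the user array in the result I posted in the question?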
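
As a starting point for the approach suggested in the comments (checking the Facebook link in the tweet itself), the sketch below picks out the expanded URLs whose host looks like a Facebook domain. It is written for Python 3. The function name `facebook_links` and the `FACEBOOK_HOSTS` set are assumptions made for illustration, and the host list may be incomplete. Mapping such a link to a Facebook user would still need the Facebook API, which is not shown here.

```python
from urllib.parse import urlparse

# Assumption: these hostnames cover the Facebook links of interest.
FACEBOOK_HOSTS = {"fb.me", "on.fb.me", "facebook.com", "www.facebook.com", "m.facebook.com"}


def facebook_links(status):
    """Return the expanded URLs of a status whose host is a Facebook domain."""
    links = []
    for url in status["entities"]["urls"]:
        # Fall back to the t.co URL if no expanded form is present.
        expanded = url.get("expanded_url") or url["url"]
        host = (urlparse(expanded).hostname or "").lower()
        if host in FACEBOOK_HOSTS:
            links.append(expanded)
    return links


# URL entities copied from the status in the question.
sample_status = {
    "entities": {
        "urls": [
            {"url": "http://t.co/j6mbg1OE6Z", "expanded_url": "http://on.fb.me/"},
            {"url": "http://t.co/c7qjunLQz2", "expanded_url": "http://bit.ly/12BbG16"},
        ]
    }
}

fb_links = facebook_links(sample_status)
print(fb_links)
```

In the question's sample the only Facebook link expands to the bare `http://on.fb.me/`, which carries no path, so that particular tweet does not point at any specific Facebook post or profile.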