Extract 1000 URIs from Twitter using Tweepy and Python

【Posted】: 2017-02-23 11:06:12
【Question】:

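I'm trying to extract 1000 unique, fully expanded URIs from Twitter using Tweepy and Python. Specifically, I'm interested in links that lead outside of Twitter (so not back to other tweets, retweets or duplicates).

The code I wrote keeps giving me a KeyError for 'entities'.

It gives me a few URLs before it breaks; some are expanded, some are not. I don't know how to fix this.

Please help!

Note: I have left out my credentials.

Here is my code: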

    # Import the necessary methods from different libraries
    import json

    import tweepy
    from tweepy import OAuthHandler
    from tweepy import Stream
    from tweepy.streaming import StreamListener

    # Variables that contain the user credentials to access the Twitter API
    access_token = "enter token here"
    access_token_secret = "enter token here"
    consumer_key = "enter key here"
    consumer_secret = "enter key here"

    # Accessing tweepy API
    # api = tweepy.API(auth)


    # This is a basic listener that prints the expanded URLs of received tweets to stdout.
    class StdOutListener(StreamListener):
        def on_data(self, data):
            # resource: http://code.runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python
            # Twitter returns data in JSON format - we need to decode it first
            decoded = json.loads(data)

            # resource: http://socialmedia-class.org/twittertutorial.html
            # Print each tweet in the stream to the screen
            # Here we set it to stop after getting 1000 tweets.
            # You don't have to set it to stop, but can continue running
            # the Twitter API to collect data for days or even longer.
            count = 1000

            for url in decoded["entities"]["urls"]:
                count -= 1
                print(url["expanded_url"] + "\r\n\n")
                if count <= 0:
                    break

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, l)

        # This line filters the Twitter stream to capture data by the keyword: YouTube
        stream.filter(track=['YouTube'])

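【Question comments】:

First, never share your private keys on the internet. Your authorization credentials have now been compromised, and you should regenerate your keys. As for your question, it's hard to know how to fix your problem because I don't know what the "decoded" object looks like. Print the first decoded message and stop your script: print(decoded). Inspect the object: does it have an entities key?
Oops! I didn't mean to. Thanks! What do you mean by what it looks like?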

Tags: python tweepy

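【Solution 1】:

The stream seems to be hitting a rate limit: when I catch the KeyError and print the keys of the decoded message, I see [u'limit'], i.e. a limit notice rather than a tweet. So one option is to handle the KeyError with a try/except. I also added a count display to verify that it really reaches 1000: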

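    import json
    import sys

    from tweepy import OAuthHandler, Stream
    from tweepy.streaming import StreamListener

    # Credentials as in the question
    access_token = "enter token here"
    access_token_secret = "enter token here"
    consumer_key = "enter key here"
    consumer_secret = "enter key here"

    count = 1000  # moved outside of the class definition to avoid getting reset


    class StdOutListener(StreamListener):
        def on_data(self, data):
            global count  # use the module-level counter

            decoded = json.loads(data)

            if count <= 0:
                sys.exit()
            try:
                for url in decoded["entities"]["urls"]:
                    count -= 1
                    print("%d: %s\r\n\n" % (count, url["expanded_url"]))
                    if count <= 0:
                        break
            except KeyError:
                # Not a tweet (e.g. a limit notice): show which keys it has
                print(list(decoded.keys()))

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, l)

        stream.filter(track=['YouTube'])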
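The KeyError comes from stream messages that are not tweets, such as the limit notice found above; those have no "entities" key. Instead of relying on the exception, you could check for the keys explicitly. A minimal sketch in Python 3, independent of tweepy; the helper name extract_expanded_urls and the sample payloads are hypothetical, and the payload shapes are trimmed to the fields used here (my assumption of their structure, not captured API output):

```python
import json
from typing import List


def extract_expanded_urls(raw: str) -> List[str]:
    """Return the expanded URLs in one raw stream message.

    Messages without an "entities" key (for example limit notices)
    yield an empty list instead of raising KeyError.
    """
    message = json.loads(raw)
    if "limit" in message:
        # Limit notice: reports matching tweets that were not delivered; no URLs.
        return []
    entities = message.get("entities", {})
    # Skip entries whose expanded_url is missing or null.
    return [u["expanded_url"] for u in entities.get("urls", [])
            if u.get("expanded_url")]


# Hypothetical payloads, trimmed to the relevant fields.
tweet = json.dumps({
    "id_str": "1",
    "entities": {"urls": [{"url": "https://t.co/abc",
                           "expanded_url": "https://youtu.be/xyz"}]},
})
limit_notice = json.dumps({"limit": {"track": 42}})

print(extract_expanded_urls(tweet))         # ['https://youtu.be/xyz']
print(extract_expanded_urls(limit_notice))  # []
```

Inside on_data you would call this on the raw data string and decrement the counter once per returned URL.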

【Comments】:

Wow! Thank you so much. I didn't know moving it out of the class def would make such a big difference.
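You're very welcome. Did you test it? I hope it helps. I only moved it out of the class def because I noticed it was being reset to 1000 every time on_data was called (it was a local variable there). After that it seems to count correctly :)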
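Because count was a local variable inside on_data, it started again at 1000 for every incoming message. An alternative to a module-level global is to keep the counter on the listener instance. A sketch under stated assumptions: the class name UrlCollector and the fake message are hypothetical, and the class is plain Python so it runs without tweepy. In tweepy 3.x the same on_data would go into a StreamListener subclass, where (as I understand tweepy 3.x) returning False from on_data stops the stream, which avoids calling sys.exit():

```python
import json


class UrlCollector:
    """Hypothetical listener-like class that collects up to `limit` expanded URLs."""

    def __init__(self, limit=1000):
        self.remaining = limit  # instance attribute: survives between on_data calls
        self.urls = []

    def on_data(self, data):
        message = json.loads(data)
        for url in message.get("entities", {}).get("urls", []):
            if self.remaining <= 0:
                break
            self.urls.append(url["expanded_url"])
            self.remaining -= 1
        # False means "stop"; True means "keep streaming".
        return self.remaining > 0


collector = UrlCollector(limit=3)
fake_message = json.dumps({"entities": {"urls": [
    {"expanded_url": "https://example.com/a"},
    {"expanded_url": "https://example.com/b"},
]}})
print(collector.on_data(fake_message))  # True (one slot left)
print(collector.on_data(fake_message))  # False (limit reached)
```

Keeping the state on the instance also means two listeners never share a counter by accident.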
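There are definitely 1000 now! I'd spent ages fiddling with it trying to get it to work. You're awesome! Now I just need to figure out whether these URLs are fully expanded.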
Great, glad to hear it. They seem to be, and they should be, since you're using url["expanded_url"].
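Using expanded_url gives you the link as it was posted, but the question's other two requirements (unique links, and only links that lead off Twitter) still have to be enforced separately. A minimal sketch; the helper name keep_if_new_external is hypothetical, and TWITTER_HOSTS is an illustrative, not exhaustive, set of hostnames I am assuming point back into Twitter:

```python
from urllib.parse import urlsplit

# Illustrative, not exhaustive: hosts that lead back into Twitter.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "t.co"}


def keep_if_new_external(url, seen):
    """Return True (and remember url) if it leaves Twitter and was not seen before."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if host in TWITTER_HOSTS:
        return False
    if url in seen:
        return False
    seen.add(url)
    return True


seen = set()
candidates = [
    "https://twitter.com/someone/status/1",
    "https://www.youtube.com/watch?v=abc",
    "https://www.youtube.com/watch?v=abc",
    "https://example.org/page",
]
kept = [u for u in candidates if keep_if_new_external(u, seen)]
print(kept)  # ['https://www.youtube.com/watch?v=abc', 'https://example.org/page']
```

The uniqueness check compares URL strings exactly, so two different short links to the same page only collapse into one if you resolve them first.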
Ah. For some reason it's still only giving me shortened URLs.
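A likely reason: expanded_url only undoes Twitter's own t.co wrapping. If someone tweeted a youtu.be or bit.ly link, that short link is what expanded_url contains. To reach the final destination you have to follow the HTTP redirects yourself. A minimal standard-library sketch; resolve_url is a hypothetical helper, and since some servers reject HEAD requests or block scripted clients, it simply returns the original URL on any failure:

```python
import http.client
import urllib.request


def resolve_url(url, timeout=10.0):
    """Follow HTTP redirects from url and return the final URL.

    Returns the input unchanged if the request fails (malformed URL,
    network error, timeout, or an HTTP error status).
    """
    try:
        # HEAD asks for headers only, so page bodies are not downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects automatically; geturl() is the last hop.
            return response.geturl()
    except (ValueError, OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return url
```

Calling it on each expanded_url costs one or more HTTP round trips per link, so for 1000 URLs it is worth running after collection rather than inside on_data, where slow requests could make the stream fall behind.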