urllib2 returning nothing in Python

【Question title】: urllib2 returning nothing in python
【Posted】: 2014-05-16 12:50:46
【Question description】:

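I'm confused! Can anyone tell me where the problem is? This code used to work, but since yesterday it returns nothing, and I haven't changed anything in it. Does anyone have an idea?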

import re
from re import sub
import time
import cookielib
from cookielib import CookieJar
import urllib2
from urllib2 import urlopen
import difflib
import requests


def twitParser():
    try:
        # Fetch the profile page with a cookie-aware opener.
        cj = CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        res = opener.open('https://twitter.com/haberturk')
        html = res.read()

        # Extract the body of every tweet paragraph.
        splitSource = re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>', html)
        print len(splitSource)

        for item in splitSource:
            # Strip any remaining inline tags from the tweet text.
            aTweet = re.sub(r'<.*?>', '', item)
            print aTweet

    except Exception, e:
        print str(e)
        print 'ERROR IN MAIN TRY'


twitParser()

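【Comments on the question】:

Don't parse HTML with regular expressions. See stackoverflow.com/questions/1732348/… (Also, Twitter has an API. Don't screen-scrape.)
Also, you're mixing tabs and spaces in your Python indentation, which is a serious problem and can cause errors.
Would that cause a problem???? Where??????

Tags: python parsing urllib2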

【Solution 1】:

If your code has not changed, then something else probably did:

This tag no longer exists:

<p class="js-tweet-text tweet-text">

In its place, the class attribute is now:

ProfileTweet-text js-tweet-text u-dir
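
To see why the old pattern returns nothing without raising an error, here is a minimal sketch in Python 3 syntax. The HTML fragment is a made-up stand-in for the new markup: only the class string and the `dir="ltr"` attribute appear in this thread, and the real page will contain far more.

```python
import re

# Hypothetical fragment imitating the new markup; the tweet text and the
# overall shape are invented for illustration.
new_html = '<p class="ProfileTweet-text js-tweet-text u-dir" dir="ltr">Hello world</p>'

# The pattern from the question expects the old class attribute verbatim.
old_pattern = r'<p class="js-tweet-text tweet-text">(.*?)</p>'

matches = re.findall(old_pattern, new_html)
print(len(matches))  # the old class string is absent, so nothing matches
```

Because `re.findall` returns an empty list instead of failing, the script prints a count of zero and never reaches the `except` branch, which matches the "returns nothing" symptom.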

While you could get what you want with a regular expression, don't; use an HTML parser such as BeautifulSoup instead:

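from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")
ptags = soup.find_all("p")
# p.get avoids a KeyError on <p> tags that have no class attribute.
texts = [p.text for p in ptags if "js-tweet-text" in p.get("class", [])]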

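Split the functionality up properly: first make sure you actually get the HTML, then check whether you find any p tags, and then whether any of those tags match your criteria.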
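
One way to sketch that staged check is shown below. It is an illustration under stated assumptions, not the answerer's code: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages, it is written in Python 3 syntax, and `ParagraphCollector`, `diagnose`, and the sample markup are hypothetical names and data introduced here.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect a (class list, text) pair for every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished (classes, text) pairs
        self._classes = None   # class list of the <p> currently open, if any
        self._chunks = []      # text pieces seen inside that <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._classes = (dict(attrs).get("class") or "").split()
            self._chunks = []

    def handle_data(self, data):
        if self._classes is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._classes is not None:
            self.paragraphs.append((self._classes, "".join(self._chunks)))
            self._classes = None


def diagnose(html, wanted_class="js-tweet-text"):
    """Report which stage fails: no HTML, no <p> tags, or no matching class."""
    if not html or not html.strip():
        return "no html", []
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    if not collector.paragraphs:
        return "no p tags", []
    texts = [text for classes, text in collector.paragraphs
             if wanted_class in classes]
    if not texts:
        return "no matching p tags", []
    return "ok", texts


# Hypothetical sample; the real page will look different.
sample = ('<p class="ProfileTweet-text js-tweet-text u-dir" dir="ltr">'
          'Hello <b>world</b></p><p>footer</p>')
status, texts = diagnose(sample)
print(status, texts)
```

Returning a status string for each stage tells you whether the download, the tag search, or the class filter is the step that came back empty, which the original single `try` block could not distinguish.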

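As Wooble said, use the Twitter API instead; these companies provide it so that you don't have to waste resources.

【Discussion】:

Thanks. "First make sure you actually get the HTML": I think that's where the problem is. I just posted a tweet and then ran my code. I got lots of HTML tags, but my tweet wasn't among them, so I think I'm making a mistake there, and I'd like to know what changed so that my code stopped working! Also, which Twitter API returns tweets? I searched and found 5 or 6 APIs! Which one should I use???
I suggest python-twitter (pip install python-twitter). You have to set up a Twitter account and then follow these instructions: twitter api oauth and python-twitter lib
Thanks, I'll try it and I hope it works :) I'll let you know here.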

【Solution 2】:

Thanks to everyone who answered me :) I changed this line:

    splitSource=re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>',html)

to

    splitSource=re.findall(r'dir="ltr">(.*?)</p>',html)

and it works well :)
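
The replacement pattern keys on the `dir="ltr"` attribute instead of the class string, so it survives the class rename, but it also matches any other paragraph that carries that attribute. A minimal sketch in Python 3 syntax, using an invented fragment (the `bio` paragraph and the tweet text are assumptions made for illustration):

```python
import re

# Hypothetical page fragment; real markup will differ.
html = (
    '<p class="ProfileTweet-text js-tweet-text u-dir" dir="ltr">'
    'First <a href="#">tweet</a></p>'
    '<p class="bio" dir="ltr">Profile bio</p>'
)

# Same two steps as the question's loop: capture, then strip inline tags.
cleaned = [re.sub(r'<.*?>', '', item)
           for item in re.findall(r'dir="ltr">(.*?)</p>', html)]
print(cleaned)
```

Note also that `.` does not match a newline unless `re.DOTALL` is passed, so a tweet whose markup spans several lines would be skipped by this pattern.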

【Discussion】: