HTML Parsing Gives No Response

【Question title】: HTML Parsing gives no response
【Posted】: 2014-03-18 03:47:35
【Question description】:

I'm trying to parse a web page. Here is my code:

```python
from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
read = BeautifulSoup(openurl.read())
soup = BeautifulSoup(openurl)
x = soup.find('ul', {"class": "i_p0"})
sp = soup.findAll('a href')
for x in sp:
    print x
```

I wish I could be more specific, but as the title says, it gives me no output at all. No errors, nothing.

【Question discussion】:

Tags: python html beautifulsoup html-parsing urllib2

【Solution 1】:

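First, drop the line read = BeautifulSoup(openurl.read()). Calling openurl.read() consumes the entire response body, so the next line, soup = BeautifulSoup(openurl), is handed an already-exhausted stream and parses an empty document. Every search on soup then comes back empty.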

Also, the line x = soup.find('ul', {"class": "i_p0"}) has no effect: its result is never used, and x is immediately reused as the loop variable.

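Also, soup.findAll('a href') finds nothing. Its first argument is a tag name, so it looks for tags literally named "a href". To get links, search for 'a' tags and read their href attribute.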

Also, BeautifulSoup 4 names this method find_all(); findAll() is the old BeautifulSoup 3 name, which bs4 still accepts as an alias.
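The first point can be reproduced without touching the network. A minimal sketch, assuming io.BytesIO is a fair stand-in for the object urllib2.urlopen() returns (both are file-like, and read() runs to end-of-stream); the sample markup is made up:

```python
import io

# Stand-in (assumption) for the response from urllib2.urlopen():
# like it, io.BytesIO is file-like and read() consumes everything.
response = io.BytesIO(b"<ul class='i_p0'><li><a href='/abc123'>paste</a></li></ul>")

first = response.read()   # consumes the entire body
second = response.read()  # already at end-of-stream, nothing is left

print(len(first) > 0)  # True
print(second)          # b''
```

This mirrors the question's code: BeautifulSoup(openurl.read()) takes all the bytes, and BeautifulSoup(openurl) then receives a stream with nothing left in it, so the resulting soup contains no tags. Reading once and keeping the result (or passing the response object directly, once) avoids the problem.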

Here is the code with a few changes:

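```python
from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
# Pass the response object directly; BeautifulSoup reads it exactly once.
soup = BeautifulSoup(openurl, "html.parser")
# href=True keeps only <a> tags that actually have an href attribute,
# so x['href'] below cannot raise KeyError.
sp = soup.find_all('a', href=True)
for x in sp:
    print x['href']
```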

This prints the href attribute of every link on the page.
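For comparison, the same idea (find every <a> tag, read its href) can be sketched in Python 3 with the standard library's html.parser instead of BeautifulSoup. This is a swapped-in technique, not what the answer uses; the class name LinkCollector and the sample markup are hypothetical. Anchors without an href are skipped rather than raising an error:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors that have no href
                self.hrefs.append(href)


sample = """
<ul class="i_p0">
  <li><a href="/abc123">First paste</a></li>
  <li><a name="anchor-only">No link here</a></li>
  <li><a href="/def456">Second paste</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(sample)
for href in collector.hrefs:
    print(href)  # /abc123, then /def456
```

Against a live page in Python 3, where urllib.request replaces urllib2, the markup would come from something like urllib.request.urlopen(url).read().decode("utf-8"); the utf-8 encoding is an assumption and should match the page's declared charset.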

Hope this helps.

【Discussion】:

Thanks. I think I get it now.

【Solution 2】:

I changed a few lines in your code and did get output, though I'm not sure it's exactly what you want.

Here:

```python
from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
soup = BeautifulSoup(openurl.read())  # parse the page once; use this soup for selecting elements
# soup = BeautifulSoup(openurl)  # not needed: openurl was already read above, so this would parse an empty stream
# x = soup.find('ul', {"class": "i_p0"})  # its result was never used
sp = soup.findAll('a')
for x in sp:
    print x.get('href')  # .get() returns None instead of raising KeyError when an <a> has no href
```

Hope this helps.

【Discussion】: