【Posted】: 2020-09-12 05:41:41
【Question】:
I have a file named "recognized.txt" that contains some text like this.
Link to recognized.txt: https://drive.google.com/file/d/1yCQz6cQPDmcCOuXBOCAX4nvNoUqewE0y/view?usp=sharing
My code:
f = open('recognized.txt','r')
message = f.read()
message.replace(" ", "")
print(message)
f.close()
import bs4 as bs
import urllib.request
url = ('https://html.duckduckgo.com/html?q='+message) # no javascript
sauce = urllib.request.urlopen(url).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
a = soup.body.b
print(a)
for i in soup.find_all('a', class_='result__snippet'):
    print(i.get_text(separator=' - ', strip=True))
When I run the code above, it gives me this error:
Traceback (most recent call last):
  File "D:\ocr\webparse.py", line 26, in <module>
    sauce = urllib.request.urlopen(url).read()
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1321, in do_open
    r = h.getresponse()
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1331, in getresponse
    response.begin()
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "C:\Users\Praveen\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 279, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- What does the error mean?
- Why does this error occur?
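For context on both questions: http.client raises BadStatusLine when the first line of the server's reply is not a valid HTTP status line (one starting with "HTTP/"). In the traceback, that first line is an HTML doctype. One plausible cause, assuming recognized.txt contains spaces or newlines, is that the text is concatenated into the URL without percent-encoding; urllib.request.urlopen does not encode the URL for you, so the request line sent to the server may be malformed. Also note that message.replace(" ", "") returns a new string and its result is discarded in the code shown. Below is a minimal sketch of building the URL safely; the helper name build_search_url and the choice to collapse whitespace into single spaces are assumptions for illustration, not part of the original script.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; everything else here is illustrative.
BASE_URL = "https://html.duckduckgo.com/html"


def build_search_url(message):
    """Return a search URL with the query text percent-encoded.

    Hypothetical helper: the name and the whitespace handling are assumptions.
    """
    # Collapse runs of whitespace (including newlines from the text file)
    # into single spaces. str methods return new strings, so keep the result.
    query = " ".join(message.split())
    # urlencode escapes spaces and other unsafe characters in the query string.
    return BASE_URL + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url("hello  world\nsecond line"))
```

The returned string could then be passed to urllib.request.urlopen in place of the hand-built url. If the error persists after encoding, the cause lies elsewhere (for example, how the server handles the request), which this sketch does not address.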
【Discussion】:
Tags: python beautifulsoup