How do I get my Beautiful Soup output data to a text file? Answers

【Question title】: How do I get my Beautiful Soup output data to a text file?
【Posted】: 2016-04-22 09:32:38
【Question】:

How do I save my Beautiful Soup output data to a text file?
Here is the code:

import urllib2
from bs4 import BeautifulSoup

url = urllib2.urlopen("http://link").read()
soup = BeautifulSoup(url)
file = open("parseddata.txt", "wb")
for line in soup.find_all('a', attrs={'class': 'book-title-link'}):
    print (line.get('href'))
    file.write(line.get('href'))
    file.flush()
    file.close()

【Comments】:

file.flush() and file.close() should be outside the for loop.
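
A minimal sketch of why the placement matters, using only the standard library and written for Python 3 (an assumption; the question targets Python 2.7). The `hrefs` list and both helper functions are hypothetical stand-ins for the scraped links and the loop in the question. With `close()` inside the loop, the first link is written and the second write fails on a closed file.

```python
import os
import tempfile

# Hypothetical stand-ins for the href values scraped from the page.
hrefs = ["/book/1", "/book/2", "/book/3"]


def write_closing_inside_loop(path, links):
    """Reproduce the bug: the file is closed during the first iteration."""
    f = open(path, "w")
    for link in links:
        f.write(link + "\n")
        f.close()  # the next f.write() raises ValueError


def write_closing_after_loop(path, links):
    """Close once, after every link has been written."""
    f = open(path, "w")
    for link in links:
        f.write(link + "\n")
    f.close()


path = os.path.join(tempfile.mkdtemp(), "parseddata.txt")

try:
    write_closing_inside_loop(path, hrefs)
except ValueError as exc:
    print("buggy version failed:", exc)

write_closing_after_loop(path, hrefs)
with open(path) as f:
    print(f.read())
```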

Tags: python python-2.7 file-io beautifulsoup

【Solution 1】:

file.close should be called once (after the for loop):

import urllib2
from bs4 import BeautifulSoup

url = urllib2.urlopen("http://link").read()
soup = BeautifulSoup(url)
file = open("parseddata.txt", "wb")
for line in soup.find_all('a', attrs={'class': 'book-title-link'}):
    href = line.get('href')
    print href
    if href:
        file.write(href + '\n')
file.close()

Update: you can pass href=True to avoid the if statement. In addition, use the with statement so that there is no need to close the file object manually:

import urllib2
from bs4 import BeautifulSoup


content = urllib2.urlopen("http://link").read()
soup = BeautifulSoup(content)

with open('parseddata.txt', 'wb') as f:
    for a in soup.find_all('a', attrs={'class': 'book-title-link'}, href=True):
        print a['href']
        f.write(a['href'] + '\n')
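
For readers on Python 3, the same approach might look roughly like the sketch below. It is not part of the original answer: `SAMPLE_HTML`, `save_links`, and the temporary output path are hypothetical, and the inline markup replaces the network download so the example runs offline. It assumes the `bs4` package is installed and uses the built-in "html.parser".

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<a class="book-title-link" href="/book/1">One</a>
<a class="book-title-link">No href</a>
<a class="other" href="/skip">Skip</a>
<a class="book-title-link" href="/book/2">Two</a>
"""


def save_links(html, path):
    """Write the href of every matching <a> tag to path, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only tags that actually carry an href attribute.
    links = soup.find_all("a", attrs={"class": "book-title-link"}, href=True)
    # Text mode with an explicit encoding, so str values can be written directly.
    with open(path, "w", encoding="utf-8") as f:
        for a in links:
            f.write(a["href"] + "\n")
    return [a["href"] for a in links]


out_path = os.path.join(tempfile.mkdtemp(), "parseddata.txt")
print(save_links(SAMPLE_HTML, out_path))
```

In Python 3 the file is opened in text mode ("w") rather than "wb", because the href values are str and a binary file accepts only bytes.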

【Discussion】:

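You can't add None and a string; if .get returns None, it is best to skip it entirely.
@PadraicCunningham, I updated the answer based on your comment. Thank you.
No worries, you could also set href=True.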
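
The point about None can be illustrated with a small sketch that needs no parsing library. The `hrefs` list is a hypothetical example of what repeated calls to `.get('href')` might return when one tag has no href.

```python
# Hypothetical values as returned by tag.get('href'); None marks a tag without href.
hrefs = ["/book/1", None, "/book/2"]

try:
    None + "\n"
except TypeError as exc:
    print("concatenation failed:", exc)

# Skip missing values before building the lines to write.
lines = [href + "\n" for href in hrefs if href]
print(lines)
```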

【Solution 2】:

I just do this:

with open('./output/' + filename + '.html', 'w+') as f:
    f.write(temp.prettify("utf-8"))

temp is the HTML as parsed by BeautifulSoup.
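
One caveat, sketched here for Python 3 (an assumption; the answer was written for Python 2): `prettify("utf-8")` returns bytes, while `prettify()` with no argument returns str, so the file mode has to match. The sample document and the temporary output path below are hypothetical, and the sketch assumes `bs4` is installed.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical document; in the answer, temp is the already-parsed soup.
temp = BeautifulSoup("<html><body><p>Hello</p></body></html>", "html.parser")

out_path = os.path.join(tempfile.mkdtemp(), "page.html")

# prettify() with no argument returns str, which matches a text-mode file.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(temp.prettify())

# prettify("utf-8") returns bytes instead, so the file must be opened in binary mode.
with open(out_path, "wb") as f:
    f.write(temp.prettify("utf-8"))
```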

【Discussion】: