【Posted】: 2017-09-09 22:06:57
【Question】:
I am using the BeautifulSoup package to scrape a website.
I use the following code to extract the content we are looking for into a variable named l_results:
l_results = soup.find_all('div',attrs={"class":"gitb-section-content"})
This returns the following data:
[<div class="gitb-section-content" data-section_name="valuable_features">\n<ul>\n<li>Passcode enforcement on devices containing corporate email or data</li>\n<li>The notification of new devices accessing corporate email and VPN connectivity</li>\n<li>Deploying needed applications to device groups</li>\n</ul>\n</div>,
<div class="gitb-section-content" data-section_name="improvements_to_organization">\n<p>The product has given us complete control of devices allowed to receive company data. It is important that only salaried employees receive corporate email on mobile devices. Checking and responding to corporate email outside of normal scheduled shifts by hourly employees, can and should be time paid.</p>\n</div>,
<div class="gitb-section-content" data-section_name="room_for_improvement">\n<p>I would like to see one-click app distribution to a single device or user. Perhaps I need further instruction in this area if it is supposed to function in this way currently. I would also like the ability to add a nagging message to any user that falls out of compliance.</p>\n</div>,
<div class="gitb-section-content" data-section_name="use_of_solution">\n<p>I've used it for three years.</p>\n</div>,
<div class="gitb-section-content" data-section_name="stability_issues">\n<p>It does seem that the more devices we added, the slower the management console operates.</p>\n</div>,
<div class="gitb-section-content" data-section_name="other_advice">\n<p>We are very pleased with the Maas360 product and plan to continue use as our company grows.</p>\n</div>]
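Now I am trying to extract the text from the "p" and "li" tags, since some reviews may contain both paragraph text and list items (I was not aware of the li tags at first).
I can get the results for the sections that do not contain list items with the following: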
for x in l_results:
    review_text += '\n' + ''.join(x.find('p').text)
When the code reaches a review that contains li, I get the following:
File "<ipython-input-63-d24fd128d779>", line 2, in <module>
review_text += '\n' + ''.join(x.find('p').text)
AttributeError: 'NoneType' object has no attribute 'text'
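【Comments】:
-
What is the question?
-
If all you want is the paragraphs, you can try the CSS selector "div.gitb-section-content p"
-
Thanks. I can pull in the paragraph text; how do I get the list text to come in as well?
-
When the code reaches a review that contains li, the result is: File "", line 2, in review_text += '\n' + ''.join(x.find('p').text) AttributeError: 'NoneType' object has no attribute 'text'
-
Well, x.find('p') found nothing.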
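To illustrate the last comment: find() returns None when a section has no matching tag, so calling .text on the result fails for the section that only contains li items. One possible way around it, sketched here on a trimmed copy of the markup from the question, is to ask find_all() for both tag names at once. The helper name extract_review_text is made up for this example, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup shown in the question.
html = """
<div class="gitb-section-content" data-section_name="valuable_features">
<ul>
<li>Passcode enforcement on devices containing corporate email or data</li>
<li>Deploying needed applications to device groups</li>
</ul>
</div>
<div class="gitb-section-content" data-section_name="use_of_solution">
<p>I've used it for three years.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
l_results = soup.find_all("div", attrs={"class": "gitb-section-content"})

# The first section has no <p>, so find() gives back None here.
# Calling .text on that None is what raised the AttributeError.
print(l_results[0].find("p"))  # None


def extract_review_text(sections):
    """Hypothetical helper: collect the text of every <p> and <li>, one per line."""
    lines = []
    for section in sections:
        # find_all() accepts a list of tag names and returns an empty
        # list (not None) when a section contains neither tag.
        for tag in section.find_all(["p", "li"]):
            lines.append(tag.get_text(strip=True))
    return "\n".join(lines)


review_text = extract_review_text(l_results)
print(review_text)
```

Because find_all() returns an empty list rather than None, sections without either tag are simply skipped. The CSS selector from the comments could probably be extended the same way, for example soup.select("div.gitb-section-content p, div.gitb-section-content li"), assuming the installed bs4 version supports comma-separated selector groups.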
Tags: python beautifulsoup