【Posted】:2021-10-04 20:46:07
【Question】:
I have scraped titles from websites before, but this time it does not work and I don't know why.
You can see my code below:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import pandas as pd
import ssl
from time import sleep
from random import randint
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
html = urlopen("https://officialblackwallstreet.com/directory/")
bsObj = soup(html.read())
bws_titles_bags = []
bws_names = bsObj.findAll(["a","title data-original-title"])
Result:
<img alt="" class="attachment-javo-tiny size-javo-tiny wp-post-image" height="80" sizes="(max-width: 80px) 100vw, 80px" src="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg" srcset="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg 80w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-150x150.jpg 150w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-300x300.jpg 300w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-768x768.jpg 768w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-1024x1024.jpg 1024w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-600x600.jpg 600w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-250x250.jpg 250w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-132x133.jpg 132w" width="80"> </img></div>
</a>, <a href="https://officialblackwallstreet.com/biz/zmena-inc/">
<div class="img-wrap-shadow">
How can I retrieve the titles, for example "McClean Photography" and the others?
Thank you for your help.
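One thing worth noting about the `findAll` call in the question: when Beautiful Soup is given a list, it treats every item as a tag name, so `["a","title data-original-title"]` matches all `<a>` tags (no tag is named `title data-original-title`) and returns their full markup. Below is a minimal sketch of reading an attribute value instead. The sample HTML, the business names, and the assumption that the name sits in a `data-original-title` or `title` attribute on the link are all hypothetical; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the output shown in the question.
# The attribute names and business names here are assumptions.
SAMPLE_HTML = """
<div class="listing">
  <a href="https://example.com/biz/one/" title="First Business" data-original-title="First Business">
    <div class="img-wrap-shadow"><img alt="" src="one.jpg"></div>
  </a>
  <a href="https://example.com/biz/two/" data-original-title="Second Business">
    <div class="img-wrap-shadow"><img alt="" src="two.jpg"></div>
  </a>
  <a href="https://example.com/about/">About</a>
</div>
"""


def extract_titles(html):
    """Return the title attribute of every link that has one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.find_all("a"):
        # Tag.get() returns None when the attribute is missing.
        title = link.get("data-original-title") or link.get("title")
        if title:
            titles.append(title.strip())
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

If this returns an empty list on the real page, the names are probably not present in the HTML that `urlopen` receives, which would point to the page being filled in by JavaScript.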
【Comments】:
-
Could you expand a little on "I can't do this anymore"? What result did you expect, and what did you actually get?
-
If the front end is built with React, you may want to use Selenium instead of BS4.
-
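@MarleneHE Thank you for your reply. I want to retrieve the company names from the page "officialblackwallstreet.com/directory". I have used this code to retrieve data from other websites, but it does not work for this one. My result is too long to post in the comments.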
-
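If you are scraping a website that is updated dynamically (e.g. one built with React or Angular, or even a site without a framework that makes API calls), you will have to either access the data directly by making HTTP requests to its API (inspect the Network tab in Chrome DevTools to find the calls) or use Selenium to simulate a browser and retrieve the data that way.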
-
OK, thank you all. I will try using Selenium :)
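For anyone following the Selenium suggestion from the comments, here is a rough sketch of what that could look like. It assumes Selenium 4 with Chrome available locally, and it assumes the business names are exposed through a `data-original-title` attribute on the links; the selector `a[data-original-title]` and the helper names `clean_titles` and `scrape_titles` are made up for illustration and need to be checked against the real markup in the browser's dev tools.

```python
def clean_titles(raw_values):
    """Strip whitespace, drop empty values, and remove duplicates in order."""
    seen = set()
    titles = []
    for value in raw_values:
        text = (value or "").strip()
        if text and text not in seen:
            seen.add(text)
            titles.append(text)
    return titles


def scrape_titles(url, selector="a[data-original-title]", timeout=15):
    """Render the page in headless Chrome and collect the title attributes.

    The default selector is an assumption, not taken from the real page.
    """
    # Imported here so clean_titles() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return clean_titles(
            element.get_attribute("data-original-title") for element in elements
        )
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `scrape_titles("https://officialblackwallstreet.com/directory/")` would then return the list of names, provided the selector matches. The explicit wait matters here: reading the page immediately after `driver.get()` can return the same incomplete markup that `urlopen` sees.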
Tags: python web-scraping