【Question title】: I am not getting any output for this Python code
【Posted】: 2020-06-11 18:43:36
【Question description】:
from bs4 import BeautifulSoup
import requests
import os

url = requests.get("https://www.pexels.com/search/flower/")
soup = BeautifulSoup(url.text, "html.parser")

links = []

x = soup.select('img[src^="https://images.pexels.com/photos"]')

for img in x:
    links.append(img['src'])

for l in links:
    print(l)

【Question comments】:

  • What are you trying to do?
  • This site is protected by Cloudflare.
  • I'm actually trying to scrape images from the website.
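
One way to narrow down why the script prints nothing is to look at what the server actually returned before parsing it. The sketch below is only an illustration: `diagnose` and `ImgSrcCollector` are hypothetical helper names, and the standard library's `HTMLParser` is swapped in for BeautifulSoup so the snippet runs without extra packages. It reports the status code, the size of the HTML, and how many `<img>` tags match the prefix used in the question.

```python
from html.parser import HTMLParser

PREFIX = "https://images.pexels.com/photos"


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag whose URL starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src.startswith(PREFIX):
            self.links.append(src)


def diagnose(status_code, html):
    """Summarize a response so an empty result is easier to explain (hypothetical helper)."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return {
        "status_code": status_code,
        "html_length": len(html),
        "matching_images": len(collector.links),
    }


# Offline sample standing in for the real response's status code and text.
sample_html = (
    '<html><body>'
    '<img src="https://images.pexels.com/photos/1/a.jpg">'
    '<img src="/logo.png">'
    '</body></html>'
)
report = diagnose(200, sample_html)
print(report)
```

With the question's code this could be called as `diagnose(url.status_code, url.text)`, since `url` there holds the response object. A non-200 status would suggest the request was refused, which would fit the Cloudflare comment, while a 200 status with zero matching images would suggest the images are not in the HTML that `requests` receives.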

Tags: web-scraping beautifulsoup web-crawler


【Solution 1】:

I suggest using the Selenium web driver to load the page in a real browser, take the full page source, and then parse it.

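from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.pexels.com/search/flower/"

# Configure Firefox to run without a visible window.
options = webdriver.FirefoxOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
options.headless = True
# Note: executable_path and options.headless are the Selenium 3 style API;
# later Selenium 4 releases use a Service object and options.add_argument("-headless").
driver = webdriver.Firefox(executable_path="./geckodriver", options=options)

# Let the browser load the page, then copy the resulting HTML.
driver.get(url)
content = driver.page_source
driver.quit()  # the HTML is already stored in `content`

# Parse the HTML and collect every image URL with the Pexels photo prefix.
soup = BeautifulSoup(content, "html.parser")
links = []
x = soup.select('img[src^="https://images.pexels.com/photos"]')
for img in x:
    links.append(img['src'])
for l in links:
    print(l)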

The latest version of geckodriver is available here.

I get the following output:

https://images.pexels.com/photos/36753/flower-purple-lical-blosso.jpg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/3860667/pexels-photo-3860667.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/133472/pexels-photo-133472.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4618416/pexels-photo-4618416.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4234543/pexels-photo-4234543.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
...
https://images.pexels.com/photos/4492525/pexels-photo-4492525.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4210784/pexels-photo-4210784.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4210781/pexels-photo-4210781.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
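
Since the stated goal in the comments is to scrape the images themselves, the printed links could be saved to disk along these lines. This is a minimal sketch rather than part of the original answer: `filename_from_url`, `download_all`, and the `flowers` folder name are assumptions made for illustration, and the plain `urlretrieve` download may be refused by the site just as the original `requests` call may have been.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(link):
    """Return the last path segment of a URL, ignoring the query string."""
    return os.path.basename(urlparse(link).path)


def download_all(links, folder="flowers"):
    """Save every link into `folder` (hypothetical name), creating it if needed."""
    os.makedirs(folder, exist_ok=True)
    for link in links:
        target = os.path.join(folder, filename_from_url(link))
        urlretrieve(link, target)  # plain HTTP download; the site may reject it


# One of the links from the output above, used to show the derived file name.
example = (
    "https://images.pexels.com/photos/3860667/pexels-photo-3860667.jpeg"
    "?auto=compress&cs=tinysrgb&dpr=1&w=500"
)
print(filename_from_url(example))  # prints pexels-photo-3860667.jpeg
```

Calling `download_all(links)` after the scraping code would then write one file per link into the chosen folder.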

【Comments】:

  • Could you please explain why you closed the browser after getting the page source?
  • You're welcome! If it helped you, you can upvote my answer :)