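【Question title】: Trying to get the URL from an image search
【Posted】: 2021-05-23 12:50:22
【Question】:

Can anyone help me fix this code? I want to write a program where you type in a word, it finds the first image in the search results, and it returns the URL from the img tag. It isn't doing that.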

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

word = input()

html = urlopen('https://www.google.com/search?q=', word +'&rlz=1C1GCEU_lvLV926LV926&sxsrf=ALeKk01xl0HutDOTshkCUPM5qDFtKyvuKg:1613851219348&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjC0JiloPnuAhWoAxAIHZKdAGUQ_AUoAXoECA4QAw&biw=958&bih=959')

bs = BeautifulSoup(html, 'html.parser')
images = bs.find_all('img', {'src':re.compile('.jpg')})
for image in images: 
    print(image['src']+'\n')

Can anyone explain how I should do this?

【Comments】:

  • Minor note: you can simplify your address to "https://www.google.com/search?tbm=isch&q=" + word. The extra parameters are redundant and may expose your personal data.
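
A minimal sketch of the simplified address from the comment above, assuming only the two parameters it mentions (tbm=isch and q). The helper name build_image_search_url is hypothetical and not part of the original post; urlencode is used so that search terms containing spaces or special characters stay valid in the URL.

```python
from urllib.parse import urlencode

def build_image_search_url(word):
    """Return a Google Images search URL for `word` (hypothetical helper)."""
    # tbm=isch selects image search; urlencode escapes spaces and other
    # special characters in the search term.
    params = {"tbm": "isch", "q": word}
    return "https://www.google.com/search?" + urlencode(params)

print(build_image_search_url("electric guitar"))
# prints https://www.google.com/search?tbm=isch&q=electric+guitar
```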

Tags: python image url beautifulsoup


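【Solution 1】:

First, your request is not set up correctly. You need to define a User-Agent, otherwise your request will be rejected. Then you need to filter the images. Since Google serves them from "gstatic.com", filter the response for img tags whose src contains that domain.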

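from urllib.request import urlopen, Request
from urllib.parse import quote_plus
from bs4 import BeautifulSoup
import re

word = input()

# URL-encode the search term so spaces and special characters are safe
url = "https://www.google.com/search?tbm=isch&q=" + quote_plus(word)
# Send a browser-like User-Agent so the request is not rejected
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = Request(url, headers=headers)

page = urlopen(req)

bs = BeautifulSoup(page, 'html.parser')
# Keep only <img> tags whose src contains gstatic.com (dots escaped so they are literal)
images = bs.find_all('img', {'src': re.compile(r'gstatic\.com')})

for img in images:
    print(img['src'])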
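
The question asks for only the first image, while the loop above prints every match. Here is a rough sketch of one way to pick the first one, assuming you already have the src values as plain strings. The helper name first_gstatic_url and the sample values are made up for illustration. It checks the host name with urlparse instead of a regex, so a URL that merely mentions gstatic.com somewhere in its path or query is not counted.

```python
from urllib.parse import urlparse

def first_gstatic_url(srcs, domain="gstatic.com"):
    """Return the first URL in `srcs` hosted on `domain` or a subdomain, else None."""
    for src in srcs:
        # hostname is None for relative paths and data: URIs
        host = urlparse(src).hostname or ""
        if host == domain or host.endswith("." + domain):
            return src
    return None

# Made-up src values standing in for the ones scraped from the page
sample = [
    "/images/branding/logo.gif",
    "data:image/gif;base64,R0lGODlhAQABAIAAAP",
    "https://encrypted-tbn0.gstatic.com/images?q=tbn:EXAMPLE1",
    "https://encrypted-tbn0.gstatic.com/images?q=tbn:EXAMPLE2",
]
print(first_gstatic_url(sample))
```

With the images list from the answer, something like first_gstatic_url(img['src'] for img in images) should give the first match, or None if nothing matched.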

【Comments】:

    【Solution 2】:

    It looks like some of the images are encoded, but try this. If an image is encoded, you may not find .jpg anywhere in its src or href.

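    import re
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.google.com/search?q=guitar'
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    # Match any tag whose href contains ".jpg" (dot escaped so it is literal)
    images = soup.find_all(href=re.compile(r'\.jpg'))
    for image in images:
        print(image.get('href'))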
    

    It pulls out some image URLs:

    https://www.google.com/imgres?imgurl=https://cdn.mos.cms.futurecdn.net/Ge25ccbyKQ76Et9bBjFnxk-1200-80.jpg&imgrefurl=https://www.guitarworld.com/gear/types-of-guitar-everything-you-need-to-know&h=675&w=1200&tbnid=1bWm5qMm6P85iM&q=guitar&tbnh=84&tbnw=150&usg=AI4_-kR-ixXbUq1jFtJ-kcukVj6j-7KgTw&vet=1&docid=4ZL7MkOS7tG24M&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJHoECAEQCA
    https://www.google.com/imgres?imgurl=https://online.berklee.edu/takenote/wp-content/uploads/2020/07/learn_acoustic_blues_guitar_article_image.jpg&imgrefurl=https://online.berklee.edu/takenote/acoustic-blues-guitar-tips/&h=1200&w=1920&tbnid=QR9aabuUf_XeFM&q=guitar&tbnh=94&tbnw=150&usg=AI4_-kSKaX2goL8QU_gf6aNPMvEK3WF3tw&vet=1&docid=hdq2fzc2ogCnkM&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJXoECAEQCg
    https://www.google.com/imgres?imgurl=https://images-na.ssl-images-amazon.com/images/I/41jIw1mUV4L._AC_.jpg&imgrefurl=https://www.amazon.com/Yamaha-FG800-Solid-Acoustic-Guitar/dp/B01C92QHLC&h=500&w=204&tbnid=ESB5AJN1MKnK_M&q=guitar&tbnh=130&tbnw=53&usg=AI4_-kQB83ftunCPyX3cXobwJMp0b1UhAg&vet=1&docid=9Ld6uZPysxav6M&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJnoECAEQDA
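
The links above are Google imgres redirect links, and the actual image address sits in their imgurl query parameter. A small sketch of how that value could be pulled out with the standard library follows. The helper name extract_imgurl is hypothetical, and the href is a shortened copy of the third result above. This assumes the image URL contains no unescaped & of its own; if it did, the extracted value would be cut off at that point.

```python
from urllib.parse import urlparse, parse_qs

def extract_imgurl(href):
    """Return the value of the `imgurl` query parameter, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("imgurl")
    return values[0] if values else None

# Shortened copy of the third link printed above (trailing parameters dropped)
href = (
    "https://www.google.com/imgres"
    "?imgurl=https://images-na.ssl-images-amazon.com/images/I/41jIw1mUV4L._AC_.jpg"
    "&imgrefurl=https://www.amazon.com/Yamaha-FG800-Solid-Acoustic-Guitar/dp/B01C92QHLC"
    "&h=500&w=204"
)
print(extract_imgurl(href))
# prints https://images-na.ssl-images-amazon.com/images/I/41jIw1mUV4L._AC_.jpg
```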
    

    【Comments】:
