【Question Title】: How to get all image urls with urllib.request.urlopen from multiple urls
【Posted】: 2020-03-01 12:08:17
【Question Description】:
from bs4 import BeautifulSoup
import urllib.request

urls = [
"https://archillect.com/1",
"https://archillect.com/2",
"https://archillect.com/3",
]

soup = BeautifulSoup(urllib.request.urlopen(urls))

for u in urls:
   for img in soup.find_all("img", src=True):
    print(img["src"])

AttributeError: 'list' object has no attribute 'timeout'

【Question Comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    @krishna has already given you the answer. Here is another solution, just for reference.

    from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain, utils
    class ImageSpider(Spider):
      name = 'archillect'
      start_urls = ["https://archillect.com/1","https://archillect.com/2","https://archillect.com/3"]
      def afterResponse(self, response, url, error=None, extra=None):
        try:
          # Create file name
          end = url.find('?') if url.find('?')>0 else len(url)
          name = 'data'+url[url.rindex('/',0,end):end]
          # save image
          if utils.saveResponseAsFile(response,name,'image'):
            return None 
          else:
            return Spider.afterResponse(self, response, url, error)
        except Exception as err:
          print (err)
      def extract(self,url,html,models,modelNames):
        doc = SimplifiedDoc(html)
        urls = doc.listImg(url=url.url)
        return {'Urls':urls} 
    SimplifiedMain.startThread(ImageSpider()) # Start
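
    A quick illustration of the file-name logic used in afterResponse above, with a hypothetical example URL (not taken from the original post): the last path segment is kept, any query string is stripped, and the result is prefixed with 'data'.

    url = 'https://example.com/images/photo.jpg?size=large'   # hypothetical URL, for illustration only
    end = url.find('?') if url.find('?') > 0 else len(url)    # drop the query string, if any
    name = 'data' + url[url.rindex('/', 0, end):end]          # keep the last path segment
    print(name)  # data/photo.jpg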
    

    More examples here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/spider_examples

    【Comments】:

      【Solution 2】:

      You can't pass a list of URLs: urllib.request.urlopen expects a single URL string (or a Request object), so loop over the list and open each URL separately.

      for url in urls:
          soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
          for img in soup.find_all("img", src=True):
              print(img["src"])
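
      If you also need absolute URLs (an img src can be relative), here is a minimal sketch along the same lines; the urljoin call and the try/except are my additions, not part of the original answer:

      from urllib.parse import urljoin
      from urllib.request import urlopen
      from bs4 import BeautifulSoup

      urls = [
          "https://archillect.com/1",
          "https://archillect.com/2",
          "https://archillect.com/3",
      ]

      for url in urls:
          try:
              soup = BeautifulSoup(urlopen(url), "html.parser")
          except Exception as err:  # skip pages that fail to load
              print(err)
              continue
          for img in soup.find_all("img", src=True):
              # Resolve a possibly relative src against the page URL
              print(urljoin(url, img["src"]))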
      

      【Comments】:
