【Question title】:Python download image with lxml
【Posted】:2012-07-18 23:27:27
【Question】:

I need to find an image in HTML code similar to this:

...
<a href="/example/1"> 
    <img id="img" src="http://example.net/example.jpg" alt="Example" />
</a>
...

I am using lxml and requests.

Here is the code:

import lxml
from lxml import html
import requests

url = 'http://www.example.com'

r = requests.get(url)
tree = lxml.html.fromstring(r.content)

img = tree.get_element_by_id("img")
f = open("image.jpg",'wb')
f.write(requests.get(img['src']).content)

But I get an error:

Traceback (most recent call last):
  File "/Users/Name/Documents/Python/Example/Script.py", line 13, in <module>
    s = requests.get(img['src'])
  File "/Library/Python/2.6/site-packages/lxml/lxml.etree.pyx", line 1052, in lxml.etree._Element.__getitem__ (src/lxml/lxml.etree.c:38272)
TypeError: 'str' object cannot be interpreted as an index

Any suggestions?

【Comments】:

  • Suggestion: read the documentation and fix the HTML.

Tags: python web-scraping python-requests lxml


【Solution 1】:
import lxml.html
import requests

url = 'http://www.example.com/'
tree = lxml.html.parse(url)
img = tree.get_element_by_id('img')
img_url = img.attrib['src']

with open('image.jpg', 'wb') as outf:
    data = requests.get(img_url).content
    outf.write(data)

【Comments】:

  • img = tree.get_element_by_id('img') did not work this time. It says: Traceback (most recent call last): File "/Users/Example/Documents/Python/Example/Script.py", line 6, in <module> img = tree.get_element_by_id('img') AttributeError: 'lxml.etree._ElementTree' object has no attribute 'get_element_by_id'. I replaced tree = lxml.html.parse(url) with tree = lxml.html.fromstring(requests.get(url).content) and now it works. Thanks for the help!
【Solution 2】:
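The AttributeError reported in the comment above comes from the return type: lxml.html.parse() gives back an _ElementTree wrapper, while get_element_by_id() lives on the HTML element itself. The sketch below shows both routes side by side. It assumes a small inline sample page (SAMPLE_HTML, modeled on the snippet in the question) instead of a real network request, and the two helper function names are made up for illustration.

```python
import io

import lxml.html

# Hypothetical sample page, modeled on the HTML snippet in the question.
SAMPLE_HTML = b"""<html><body>
<a href="/example/1">
    <img id="img" src="http://example.net/example.jpg" alt="Example" />
</a>
</body></html>"""


def image_url_from_string(page_bytes):
    """fromstring() returns the root HtmlElement directly."""
    root = lxml.html.fromstring(page_bytes)
    return root.get_element_by_id("img").attrib["src"]


def image_url_from_parse(source):
    """parse() returns an _ElementTree, so step down to its root first."""
    tree = lxml.html.parse(source)
    root = tree.getroot()
    return root.get_element_by_id("img").attrib["src"]


url_a = image_url_from_string(SAMPLE_HTML)
url_b = image_url_from_parse(io.BytesIO(SAMPLE_HTML))
print(url_a)
print(url_b)
```

Either URL can then be passed to requests.get(...) and the response's .content written to a file opened in 'wb' mode, as in the answer's with block.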

Try f.write(requests.get(img.attrib['src']).content)
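The reason this helps: square-bracket indexing on an lxml element selects child elements by position, so img['src'] is treated as an index lookup and fails with the TypeError from the question. Attributes are read through .attrib (a dict-like view) or .get(). A minimal sketch, assuming an inline sample fragment rather than a downloaded page:

```python
import lxml.html

# Hypothetical sample fragment; the wrapper <div> is only there to hold the <img>.
root = lxml.html.fromstring(
    '<div><img id="img" src="http://example.net/example.jpg" alt="Example" /></div>'
)
img = root.get_element_by_id("img")

# Attribute access: dict-style via .attrib, or .get() which returns None if absent.
src_via_attrib = img.attrib["src"]
src_via_get = img.get("src")
missing = img.get("data-missing")

# Indexing an element addresses its children, so a string key is rejected.
try:
    img["src"]
    index_failed = False
except TypeError:
    index_failed = True

print(src_via_attrib, src_via_get, missing, index_failed)
```

img.get('src') is the more forgiving choice when the attribute may be missing, since img.attrib['src'] raises KeyError in that case.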

【Comments】:
