【Question title】: python requests sometimes returns an empty list
【Published】: 2018-01-03 20:00:18
【Question description】:

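I've been trying to scrape "2005 - 2013" from "Drink between 2005 2013". At first this code worked for me, but now it only returns an empty list, even though the request still gets a 200 status code.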
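If you can get at the visible text of the page (for example with lxml's `text_content()` on the parsed document, or Selenium's `element.text`), the range can also be pulled out with a regular expression instead of depending on one specific link. This is a minimal sketch that assumes the text reads "Drink between 2005 2013", optionally with "-" or "and" between the years. The exact wording on the live page is an assumption, and `drink_window` is a hypothetical helper:

```python
import re

# Assumption: the visible text looks like "Drink between 2005 2013",
# possibly "Drink between 2005 - 2013" or "Drink between 2005 and 2013".
DRINK_RE = re.compile(r"Drink between\s*(\d{4})\s*(?:-|and)?\s*(\d{4})", re.IGNORECASE)


def drink_window(text):
    """Return the drinking window as "YYYY - YYYY", or None if it is not found."""
    match = DRINK_RE.search(text)
    if match is None:
        return None
    return f"{match.group(1)} - {match.group(2)}"
```

Because it works on plain text, this only helps once the text is actually in what you downloaded. It does not fix a response that lacks the data.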

import csv
import lxml.html
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
page = requests.get('http://www.cellartracker.com/wine.asp?iWine=91411', headers=headers)
# 200 only means the server responded; it does not guarantee the element is in the HTML
print(page.status_code)
html = lxml.html.fromstring(page.content)
# Text of every <a title="Source: Community"> link (an empty list if none are present)
content_divs = html.xpath('//a[@title="Source: Community"]/text()')
print(content_divs)

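I'm not sure whether I should start using Selenium for this kind of scraping since it's a JS site, and if so, I'm not sure how to go about it, so some basic help would be useful! Thanks!!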
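A 200 status only says the server answered. It says nothing about whether the response contains the element. Before reaching for Selenium, one way to check is to scan the raw HTML that requests received for the link the XPath targets. Below is a minimal stdlib-only sketch (using `html.parser`, so it runs without lxml) that approximates `//a[@title="Source: Community"]/text()`. The names `TitledLinkText` and `community_links` are hypothetical:

```python
from html.parser import HTMLParser


class TitledLinkText(HTMLParser):
    """Collect the text of every <a> whose title attribute equals `title`.

    Approximates the XPath //a[@title="..."]/text(), except that text from
    tags nested inside the link is included too.
    """

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.buffer = None  # text chunks while inside a matching <a>, else None
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("title") == self.title:
            self.buffer = []

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.buffer is not None:
            self.texts.append("".join(self.buffer))
            self.buffer = None


def community_links(html):
    """Return the texts the question's XPath looks for ([] if the link is absent)."""
    parser = TitledLinkText("Source: Community")
    parser.feed(html)
    parser.close()
    return parser.texts
```

Call it as `community_links(page.text)`. If it returns an empty list while the value is visible in a browser, write `page.text` to a file and inspect it. The server may have sent a different page than the one the browser gets, or the value may be inserted by JavaScript. In the JavaScript case, a browser-driven tool such as Selenium is needed.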

【Question discussion】:

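  • If it's a JS site, you definitely need Selenium or a similar tool to scrape it.
  • I get the expected result, so I'm not sure why it stopped working. Are you parsing the same site over and over and sometimes getting an empty list? If you want a reference for scraping with Selenium, I just answered a question about that.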
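If the empty lists really are intermittent (the same request sometimes succeeds and sometimes does not), one option is to retry with increasing pauses instead of re-requesting immediately. This is a minimal sketch with exponential backoff. `fetch_until_nonempty` is hypothetical, and `fetch` stands for any zero-argument function you write that performs the question's `requests.get` and XPath steps and returns the list:

```python
import time


def fetch_until_nonempty(fetch, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries
    and returns the last (possibly empty) result.
    """
    result = fetch()
    for attempt in range(1, attempts):
        if result:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        result = fetch()
    return result
```

The `sleep` parameter can be swapped out so the function can be exercised without actually waiting. If every attempt comes back empty, retrying is not the fix, and the response body is worth inspecting instead.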

Tags: python selenium xpath lxml screen-scraping


【Solution 1】:

Use Selenium:

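from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "https://www.cellartracker.com/wine.asp?iWine=91411"

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service("chromedriver2.25"))
driver.get(url)
# Match <li> elements whose text contains "review"
reviews = driver.find_elements(By.XPATH, "//li[contains(.,'review')]")
for item in reviews:
    print(item.text)
    print("---")
driver.quit()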
    print("---")

Output:

Options
1/4/2014 - REUBENSHAPCOTT WROTE:
91 Points
Delicious! Had no idea that Australia made port this good, and affordable. Terrific, smooth fig and plum. Aged and neither sharp nor grapey. If you see it, buy it.
Do you find this review helpful? Yes - No / Comment
---
Options
1/20/2013 - LISAADAM WROTE:
85 Points
The wine looks Tawny colored.
Do you find this review helpful? Yes - No / Comment
---
Options
12/22/2012 - WINEAGGREGATE LIKES THIS WINE:
90 Points
Molasses, pepper, butterscotch candy that's been roasted a bit. Very nice.
Do you find this review helpful? Yes - No / Comment
---
Options
10/30/2011 - GTI2TON WROTE:
87 Points
Sweeter than average tawny and straightforward, but still has nice richness in its raisin and light carmel notes. Good value.
Do you find this review helpful? Yes - No / Comment
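If you want structured fields rather than printed text, each element's `.text` can be split line by line. This is a minimal sketch that assumes every review follows the layout in the output above: an `Options` line, a `DATE - AUTHOR WROTE:` or `DATE - AUTHOR LIKES THIS WINE:` header, an `N Points` line, the note, and then the helpfulness prompt. `parse_review` is a hypothetical helper that returns `None` when the header does not match:

```python
import re

# Assumed header layout, e.g. "1/4/2014 - REUBENSHAPCOTT WROTE:"
HEADER_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}) - (.+?) (WROTE|LIKES THIS WINE):$")
POINTS_RE = re.compile(r"^(\d+) Points$")


def parse_review(text):
    """Split one review's text into date, author, points and note."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Drop the "Options" menu label and the "Do you find this review helpful?" prompt
    lines = [line for line in lines
             if line != "Options" and not line.startswith("Do you find this review helpful?")]
    if not lines:
        return None
    header = HEADER_RE.match(lines[0])
    if header is None:
        return None
    date, author, _verb = header.groups()
    body = lines[1:]
    points = None
    if body:
        points_match = POINTS_RE.match(body[0])
        if points_match:
            points = int(points_match.group(1))
            body = body[1:]
    return {"date": date, "author": author, "points": points, "note": " ".join(body)}
```

In the loop above you would call `parse_review(item.text)` instead of printing.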

【Discussion】:
