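【Posted】:2017-03-21 16:38:05
【Question】:
I'm working on a project in which I use lxml to extract stock data from different tables on different web pages. When I run the program and print the values I'm trying to pull, I get empty lists: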
('Cash_and_short_term_investments:', [])
('EPSNextYear:', [])
Here is how I make the call:
import requests
from lxml import html

# the url at this point is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement)
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
# Original XPath: /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
print('EPSNextYear:', EPSNextYear)
and:
#the url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA I've confirmed this with a print
url = driver.current_url
page3 = requests.get(url)
tree3 = html.fromstring(page3.content)
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)
I have already removed tbody from the XPaths, as answers to similar questions suggest. Any help or advice would be greatly appreciated, thanks!
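For reference, here is a minimal sketch of the tbody behaviour mentioned above, run against a small inline table rather than the real pages. The markup, the id "not-in-source", and the values are made up for illustration. It shows that lxml matches the markup as served (no tbody is added), and that xpath() returns an empty list whenever nothing matches, which is the same output as in the question.

```python
from lxml import html

# Hypothetical markup standing in for a served page; it contains no <tbody>.
RAW = """
<html><body>
<table id="fs-table">
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Revenue</td><td>100.0</td></tr>
  <tr><td>Cash</td><td>42.5</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(RAW)

# Browser-style path that includes tbody: lxml does not insert tbody,
# so this matches nothing in the markup above.
with_tbody = tree.xpath('//*[@id="fs-table"]/tbody/tr[3]/td[2]/text()')

# Same path with tbody removed: matches the third row, second cell.
without_tbody = tree.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')

# A path to an element that is absent from the parsed HTML also yields
# an empty list rather than raising an error.
missing = tree.xpath('//*[@id="not-in-source"]/tr[1]/td[1]/text()')

print('with tbody:', with_tbody)
print('without tbody:', without_tbody)
print('missing element:', missing)
```

An empty list therefore only says that no node matched in the HTML that was actually parsed. Printing part of page2.content or page3.content and searching it for the expected table would show whether the response from requests contains that table at all.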
【Comments】:
Tags: python xpath web-scraping lxml