【Posted】: 2019-12-20 12:07:36
【Question】:
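I'm trying to scrape https://investing.com/ to get technical data for any stock. I want to get how many buys and how many sells "Moving Averages:" and "Technical Indicators:" report for different time periods:
- 5 hours
- Daily
- Weekly
Here is an image showing the data I want to get: https://i.ibb.co/mHpM0Yw/Capture-d-e-cran-2019-08-14-a-00-15-45.png
The URL is https://investing.com/equities/credit-agricole-technical
When you open the page in a browser, the time period is set to "Hourly", and you have to click another time period to get the right data. The DOM is updated after an XHR request.
I want to scrape the page after the DOM has been updated.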
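To make "scrape after the DOM has been updated" concrete, here is a minimal polling sketch in plain Ruby. The helper name `wait_for` and the stand-in condition are assumptions for illustration only; with a real browser driver the block would check something on the page instead.

```ruby
# Poll a block until it returns a truthy value or the timeout expires.
# `wait_for` is a hypothetical helper, not part of Mechanize or Watir.
def wait_for(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Stand-in for "the DOM has been updated": truthy on the third check.
checks = 0
value = wait_for(timeout: 2, interval: 0.01) do
  checks += 1
  checks >= 3 ? 'updated' : nil
end
puts value
```

The point is only that the scrape has to happen after some observable condition holds, not immediately after the click.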
Mechanize
I tried scraping with Mechanize by clicking "Weekly" and then scraping the updated DOM, but I got an error.
Here is my code:
require 'mechanize'

def mechanize_scraper(url)
  agent = Mechanize.new
  puts agent.user_agent_alias = 'Mac Safari'
  page = agent.get(url)
  # Find the "Weekly" tab and try to follow it like a normal link
  link = page.link_with(text: 'Weekly')
  new_page = link.click
end

url = "https://investing.com/equities/credit-agricole-technical"
mechanize_scraper(url)
Here is the error:
Mechanize::UnsupportedSchemeError (Mechanize::UnsupportedSchemeError)
When we inspect the DOM, the link has the attribute href="javascript:void(0);":
<li pairid="407" data-period="week" class="">
<a href="javascript:void(0);">Weekly</a>
</li>
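Mechanize only issues HTTP requests and does not run JavaScript, so a `javascript:` href is not something it can follow. A small sketch of that check using only Ruby's standard `URI` library; the helper name `followable_href?` is an assumption for illustration, not a Mechanize API:

```ruby
require 'uri'

# Schemes an HTTP-only client can actually request.
HTTP_SCHEMES = %w[http https].freeze

# Returns true when href is relative or uses http(s); false for
# javascript:, mailto:, and anything that does not parse as a URI.
def followable_href?(href)
  scheme = URI.parse(href.to_s.strip).scheme
  scheme.nil? || HTTP_SCHEMES.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

puts followable_href?('javascript:void(0);')                 # false
puts followable_href?('/equities/credit-agricole-technical') # true
```

A guard like this could be applied to a link's href before calling `click`, to skip links that only trigger client-side scripts.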
So after some attempts and a lot of googling, I moved on to Watir to try the scrape.
Watir
Here is my code:
require 'watir'

def watir_scraper(url)
  Watir.default_timeout = 10
  browser = Watir::Browser.new
  browser.goto(url)
  # Click the "Weekly" tab so the page reloads the data via XHR
  link = browser.link(text: /weekly/).click
  pp link
end

url = "https://investing.com/equities/credit-agricole-technical"
watir_scraper(url)
Here is the error:
40: from app.rb:47:in `<main>'
39: from app.rb:32:in `watir_scraper'
38: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:145:in `click'
37: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:789:in `element_call'
36: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:154:in `block in click'
35: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/common/element.rb:74:in `click'
34: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:371:in `click_element'
33: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
32: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
31: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
30: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
29: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
28: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
27: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
26: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok'
25: from 25 libsystem_pthread.dylib 0x00007fff5aaa440d thread_start + 13
24: from 24 libsystem_pthread.dylib 0x00007fff5aaa8249 _pthread_start + 66
23: from 23 libsystem_pthread.dylib 0x00007fff5aaa52eb _pthread_body + 126
22: from 22 chromedriver 0x000000010b434e67 chromedriver + 3673703
21: from 21 chromedriver 0x000000010b416014 chromedriver + 3547156
20: from 20 chromedriver 0x000000010b3e0f07 chromedriver + 3329799
19: from 19 chromedriver 0x000000010b3f91b8 chromedriver + 3428792
18: from 18 chromedriver 0x000000010b3cd069 chromedriver + 3248233
17: from 17 chromedriver 0x000000010b3f86d8 chromedriver + 3426008
16: from 16 chromedriver 0x000000010b3f8940 chromedriver + 3426624
15: from 15 chromedriver 0x000000010b3ecc1f chromedriver + 3378207
14: from 14 chromedriver 0x000000010b0ce8a5 chromedriver + 108709
13: from 13 chromedriver 0x000000010b0cd7e2 chromedriver + 104418
12: from 12 chromedriver 0x000000010b0f1bf3 chromedriver + 252915
11: from 11 chromedriver 0x000000010b0fba37 chromedriver + 293431
10: from 10 chromedriver 0x000000010b0f1c4e chromedriver + 253006
9: from 9 chromedriver 0x000000010b0cfa66 chromedriver + 113254
8: from 8 chromedriver 0x000000010b0f1a72 chromedriver + 252530
7: from 7 chromedriver 0x000000010b0cfe66 chromedriver + 114278
6: from 6 chromedriver 0x000000010b0d63fb chromedriver + 140283
5: from 5 chromedriver 0x000000010b0d71a9 chromedriver + 143785
4: from 4 chromedriver 0x000000010b0d8d19 chromedriver + 150809
3: from 3 chromedriver 0x000000010b0da569 chromedriver + 157033
2: from 2 chromedriver 0x000000010b15fcef chromedriver + 703727
1: from 1 chromedriver 0x000000010b3bf133 chromedriver + 3191091 0x000000010b42f129 chromedriver + 3649833: element click intercepted: Element ... is not clickable at point (544, 704). Other element would receive the click: ... (Selenium::WebDriver::Error::ElementClickInterceptedError) (Session info: chrome=76.0.3809.100)
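The error says another element sits on top of the link at the click point. One possible way around that, sketched under assumptions: locate the `<li data-period="week">` from the markup above and use Watir's `click!`, which fires the click through JavaScript instead of at screen coordinates. The helper name `click_period` is hypothetical, and `browser` is assumed to be a `Watir::Browser` (any object with the same `li` / `link` / `click!` calls would do).

```ruby
# Sketch only: `browser` is assumed to be a Watir::Browser.
def click_period(browser, period)
  item = browser.li(data_period: period) # matches <li data-period="week">
  item.wait_until(&:present?)            # wait until the tab is rendered
  item.link.click!                       # JavaScript click, ignores overlays
  item
end

# Hypothetical usage with a real browser (not run here):
#   require 'watir'
#   browser = Watir::Browser.new
#   browser.goto('https://investing.com/equities/credit-agricole-technical')
#   click_period(browser, 'week')
```

After the click, the data still arrives through an XHR request, so reading the counts would likely need a further wait; what to wait on depends on page markup that is not shown here.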
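I hope all of this helps you understand my problem. I would like to know whether I can scrape this data with Mechanize or Watir. If not, which tools can do the job?
Thanks a lot!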
【Comments】:
Tags: javascript ruby web-scraping mechanize watir