【Posted】: 2020-04-02 01:59:08
【Question】:
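I'm trying to scrape this table (https://futures.tradingcharts.com/marketquotes/ZC.html) with Python. I've tried something based on this post, but when I manually check the site's source, I don't see the table. How can I scrape this table? Here is part of the page source, followed by my code: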
<div class="mq_page_wrapper">
<script type="text/javascript">
$(document).ready(function(){
generateTCPSLink();
});
function generateTCPSLink(){
var root = location.protocol + '//' + location.host;
var url_param = {
action:'tcps_logged_in',
timestamp: (new Date()).getTime()
};
$.getJSON(root+'/widgets/footer_ajax/footer_common_functions.php?'+$.param(url_param),function(data){
if(data.logged_in){
$('span#tcps_link').html("Logout:<br> <a href='"+root+"/premium_subscriber/tcps_logout.php"+"'>Premium Subscriber</a><br>");
}else{
$('span#tcps_link').html("Login:<br> <a href='"+root+"/premium_subscriber/login_subscribe.php?premium_link"+"'>Premium Subscriber</a><br>");
}
});
}
</script>
<div id="members_classic">
<span id="tcps_link"></span>
</div>
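import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
chrome_path = r"C:\Users\Desktop\chromedriver.exe"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get('https://futures.tradingcharts.com/marketquotes/ZC.html')
time.sleep(80)  # fixed wait for the page's JavaScript to run
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
html = driver.page_source  # the DOM as rendered by the browser, not the raw source
soup = BeautifulSoup(html, 'html.parser')
print(soup)  # Python 3 print() call
driver.quit()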
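Because the table appears to be built by JavaScript after the page loads, a fixed `time.sleep(80)` is fragile. A minimal sketch of an alternative: wait explicitly for a `<table>` element with Selenium's `WebDriverWait`, then read its cells with the standard-library `html.parser`. Assumptions: the first `<table>` on the page is the quote table (the page may contain others, in which case a more specific locator is needed), and Selenium 4.6 or later is installed so `webdriver.Chrome()` can locate chromedriver itself (otherwise pass `service=Service(chrome_path)` as above). `TableRowParser`, `table_rows` and `scrape_first_table` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row.

    Assumes well-formed markup with explicit closing tags, which holds for
    an element's outerHTML as serialized by the browser.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contain no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(table_html):
    """Return the rows of an HTML table as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


def scrape_first_table(url, timeout=30):
    """Open url in Chrome, wait for the first <table>, and return its rows."""
    # Imported here so the parsing helpers work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # Selenium >= 4.6 finds chromedriver itself
    try:
        driver.get(url)
        # Blocks until a <table> is in the DOM, or raises TimeoutException.
        table = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return table_rows(table.get_attribute("outerHTML"))
    finally:
        driver.quit()
```

Calling `scrape_first_table('https://futures.tradingcharts.com/marketquotes/ZC.html')` would return a list of rows, each a list of cell strings. Parsing `outerHTML` with a small parser is reasonable here because the browser re-serializes the element with explicit closing tags.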
【Comments】:
-
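Open your browser's inspector while the page loads and watch the Network tab. You'll see that the table is actually generated from data fetched from another URL (getQuote.json). You'll need to send the same API key, which you should be able to find by inspecting the request headers.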
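Following that suggestion, a minimal sketch of skipping the browser and requesting the JSON data directly, using only the standard library. Everything endpoint-specific is an assumption: `QUOTE_URL`, the `X-Api-Key` header name and the `{"results": [...]}` response layout are hypothetical placeholders, to be replaced with whatever the Network tab actually shows for the getQuote.json request (including whether the key travels as a header or as a query parameter).

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real request URL and API-key header
# from the getQuote.json entry in the browser's Network tab.
QUOTE_URL = "https://example.invalid/getQuote.json"
API_HEADERS = {"X-Api-Key": "YOUR_API_KEY"}


def fetch_json(url, headers, timeout=30):
    """Send a GET request with the given headers and decode the JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def rows_from_payload(payload, fields):
    """Turn an assumed {"results": [{...}, ...]} payload into table rows.

    Each row lists the requested fields in order; missing fields become None.
    """
    return [
        [item.get(field) for field in fields]
        for item in payload.get("results", [])
    ]
```

Once the placeholders are filled in, `rows_from_payload(fetch_json(QUOTE_URL, API_HEADERS), ["symbol", "lastPrice"])` would give one row per quote; those field names are also guesses to adjust to the real response.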
Tags: python selenium web-scraping html-table