【Posted】: 2016-08-29 13:31:12
【Question】:
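I have been using BeautifulSoup to parse data from web pages. However, when I view the source of some pages, I am not sure how to collect data from a page whose content is populated by scripts (JS and JSON). Is there a tool that can fetch or render such pages so that I can scrape data from them?
Below is an example of the source of such a JSON/JS-driven page.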
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" type="text/css" class="__meteor-css__" href="/3688b5ba42be128b061150ae66a2c2f245507d7e.css?meteor_css_resource=true"> <link rel="stylesheet" type="text/css" class="__meteor-css__" href="/4281a8e71152d94a7380f89ab8dd32d9542c9b5c.css?meteor_css_resource=true">
<meta name="fragment" content="!">
<script type="text/inject-data">%7B%22fast-render-data%22%3A%7B%22collectionData%22%3A%7B%22users%22%3A%5B%5B%7B%22emails%22%3A%5B%7B%22address%22%3A%22suhas.servesh%40gmail.com%22%2C%22verified%22%3Afalse%7D%5D%2C%22profile%22%3A%7B%22defaultSiteName%22%3A%22draftkings%22%2C%22defaultSportName%22%3A%22mlb%22%7D%2C%22username%22%3A%22kloudklown%22%2C%22_id%22%3A%22YnZKGMPLrwHCzHRh5%22%7D%5D%5D%2C%22kadira_settings%22%3A%5B%5B%7B%22appId%22%3A%22SiGbMwMEWLf7WK3KB%22%2C%22endpoint%22%3A%22https%3A%2F%2Fenginex.kadira.io%22%2C%22clientEngineSyncDelay%22%3A10000%2C%22enableErrorTracking%22%3Atrue%2C%22_id%22%3A%22SgS4nrWA5a6nDdzaY%22%7D%5D%5D%7D%2C%22subscriptions%22%3A%7B%7D%2C%22loginToken%22%3A%22-cCvsClRaCVlHa24nJLdIjfDp0EOC_flNuR7IR6Qxqj%22%7D%7D</script>
<script type="text/javascript" src="https://js.stripe.com/v2/"></script>
<script type="text/javascript" src="https://checkout.stripe.com/checkout.js"></script>
<link href="https://d1mua5vq38hnzr.cloudfront.net/favicon.ico" rel="icon" type="image/x-icon" />
<script type="text/javascript" src="https://static.leaddyno.com/js"></script>
<!-- Facebook Pixel Code -->
<script>
!function(f,b,e,v,n,t,s){if(f.fbq)return;n=f.fbq=function(){n.callMethod?
n.callMethod.apply(n,arguments):n.queue.push(arguments)};if(!f._fbq)f._fbq=n;
n.push=n;n.loaded=!0;n.version='2.0';n.queue=[];t=b.createElement(e);t.async=!0;
t.src=v;s=b.getElementsByTagName(e)[0];s.parentNode.insertBefore(t,s)}(window,
document,'script','https://connect.facebook.net/en_US/fbevents.js');
fbq('init', '156814968048022');
fbq('track', "PageView");</script>
<noscript><img height="1" width="1" style="display:none"
src="https://www.facebook.com/tr?id=156814968048022&ev=PageView&noscript=1"
/></noscript>
<!-- End Facebook Pixel Code -->
</head>
<body>
<script type="text/javascript">__meteor_runtime_config__ = JSON.parse(decodeURIComponent("%7B%22meteorRelease%22%3A%22METEOR%401.3.4.1%22%2C%22meteorEnv%22%3A%7B%22NODE_ENV%22%3A%22production%22%2C%22TEST_METADATA%22%3A%22%7B%7D%22%7D%2C%22PUBLIC_SETTINGS%22%3A%7B%22ga%22%3A%7B%22account%22%3A%22UA-58886344-1%22%7D%7D%2C%22ROOT_URL%22%3A%22https%3A%2F%2Fdailyfantasynerd.com%22%2C%22ROOT_URL_PATH_PREFIX%22%3A%22%22%2C%22appId%22%3A%228u0umeqb2znyyvsybl%22%2C%22kadira%22%3A%7B%22appId%22%3A%22SiGbMwMEWLf7WK3KB%22%2C%22endpoint%22%3A%22https%3A%2F%2Fenginex.kadira.io%22%2C%22clientEngineSyncDelay%22%3A10000%2C%22enableErrorTracking%22%3Atrue%7D%2C%22autoupdateVersion%22%3A%22cd1f15509aed34ad130a1b1cc1c46cb282abe1dd%22%2C%22autoupdateVersionRefreshable%22%3A%227a8125062727989a665ebc42d995410c7cc05ab7%22%2C%22autoupdateVersionCordova%22%3A%22none%22%7D"));</script>
<script type="text/javascript" src="/e517e573069a465b017732a35a886ff1c36e2550.js?meteor_js_resource=true"></script>
</body>
</html>
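In a page like the one above, part of the data is already present in the raw HTML: the `<script type="text/inject-data">` element holds a URL-encoded JSON string. The following is a minimal sketch, using only the Python standard library, of how that payload might be pulled out and decoded without rendering the page. The names `InjectDataParser`, `extract_inject_data`, and `sample_html` are hypothetical, and the sample string is a shortened stand-in, not the real page content.

```python
import json
from html.parser import HTMLParser
from urllib.parse import unquote


class InjectDataParser(HTMLParser):
    """Collects the text of every <script type="text/inject-data"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a matching <script>
        self.payloads = []     # raw (still URL-encoded) script contents

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/inject-data":
            self._inside = True
            self.payloads.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._inside:
            self.payloads[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_inject_data(html):
    """Return the decoded JSON object from each inject-data script in `html`."""
    parser = InjectDataParser()
    parser.feed(html)
    parser.close()
    # unquote() reverses the %XX encoding; json.loads() parses the result.
    return [json.loads(unquote(raw.strip())) for raw in parser.payloads]


# Shortened, made-up stand-in for the page source shown above (assumption).
sample_html = (
    '<html><head><script type="text/inject-data">'
    '%7B%22fast-render-data%22%3A%7B%22collectionData%22%3A%7B%22users%22%3A%5B%5D%7D%7D%7D'
    '</script></head><body></body></html>'
)

decoded = extract_inject_data(sample_html)
print(decoded[0]["fast-render-data"]["collectionData"])
```

If BeautifulSoup is already in use, `soup.find("script", type="text/inject-data")` should locate the same element, and its text can be passed through `unquote` and `json.loads` in the same way. Data that the page loads later through its JavaScript bundle will not appear in this payload, so this approach only covers what the server embeds in the initial HTML.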
【Comments】:
- Which page are you trying to scrape?
Tags: javascript json parsing beautifulsoup