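【Posted】: 2020-11-22 13:12:37
【Question】:
I am trying to read a number shown at the top of the page at https://roobet.com/. When I use page=requests.get('https://roobet.com/'), I do not get that number. Why does this happen, and what do I have to do? The number I want to read is labelled "Bets: XXXXXXX", but when I use requests.get() I don't see anything like it.
PS: When I use view-source on the web page, I still don't see that number or text. How can I read and import that number?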
import requests

# Download the page and print the HTML returned by the server.
page = requests.get("https://roobet.com")
text_page = page.text
print(text_page)
Output:
<!DOCTYPE html>\n<html lang="en">\n\n <head>\n <!-- Google Tag Manager -->\n <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\n new Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\n j=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n \'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n })(window,document,\'script\',\'dataLayer\',\'GTM-563FCQS\');</script>\n <!-- End Google Tag Manager -->\n <meta charset="UTF-8">\n <meta name="viewport" content="width=device-width, initial-scale=1">\n <link rel="preconnect" href="https://fonts.googleapis.com/" crossorigin>\n <title>Roobet | Crypto\'s Fastest Growing Casino</title>\n <meta name="description" content="Roobet, crypto\'s fastest growing casino. Hop on in, chat to others and play exciting games - Come and join the fun!">\n <base href="/">\n <meta name="theme-color" content="#191b31" />\n <link rel="icon" type="image/png" href="images/favicon.png">\n <link rel="manifest" href="/manifest.json" />\n <script src="https://cdn.onesignal.com/sdks/OneSignalSDK.js" async ></script>\n <script src="https://maps.googleapis.com/maps/api/js?key=AIzaSyCXI19SE-ZWv_ZyW7gGMzCTf4TGfOA3Sdk&libraries=places"></script>\n <script src="https://tekhou5-dk2.pragmaticplay.net/gs2c/common/js/lobby/GameLib.js" />\n <script>\n var OneSignal = window.OneSignal || [];\n OneSignal.push(function() {\n OneSignal.init({\n appId: "29c72f64-e7e6-408c-99b2-d86a84c6a9cb",\n notifyButton: {\n enable: false,\n autoResubscribe: true,\n },\n welcomeNotification: {\n disable: true\n }\n });\n });\n </script>\n <link href="0.20c4e82d288213005850.css" rel="stylesheet"><link href="app.20c4e82d288213005850.css" rel="stylesheet"></head>\n <body>\n <!-- Google Tag Manager (noscript) -->\n <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-563FCQS"\n height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>\n <!-- End Google 
Tag Manager (noscript) -->\n <div id="root"></div>\n <div id="modalRoot"></div>\n <div id="loader">\n <div class="loaderLogo">\n <img src="/images/logo.svg" />\n </div>\n </div>\n <script type="text/javascript" src="vendors.bundle.js?v=1272961ec29bf316a891"></script><script type="text/javascript" src="locale.bundle.js?v=f09f53a5cbf99ec0cac6"></script><script type="text/javascript" src="app.bundle.js?v=9f19f2ed821de8c93f9c"></script></body>\n <script>(function(){var w=window;var ic=w.Intercom;if(typeof ic==="function"){ic(\'reattach_activator\');ic(\'update\',intercomSettings);}else{var d=document;var i=function(){i.c(arguments)};i.q=[];i.c=function(args){i.q.push(args)};w.Intercom=i;function l(){var s=d.createElement(\'script\');s.type=\'text/javascript\';s.async=true;s.src=\'https://widget.intercom.io/widget/gcr7bzde\';var x=d.getElementsByTagName(\'script\')[0];x.parentNode.insertBefore(s,x);}if(w.attachEvent){w.attachEvent(\'onload\',l);}else{w.addEventListener(\'load\',l,false);}}})()</script>\n <script src="https://intaggr.softswiss.net/public/sg.js"></script>\n <script type="text/javascript" src="https://www.google.com/recaptcha/api.js?render=6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_"></script>\n <script type="text/javascript">\n if (typeof window.grecaptcha !== \'undefined\') {\n grecaptcha.ready(function() {\n grecaptcha.execute(\'6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_\', {action: \'homepage\'});\n })\n }\n </script>\n</html>\n'
On the real page, you can see that the "Bets" figure and other content are actually there.
【Comments】:
-
It would be better if you shared your code.
-
It wasn't really necessary, since it is only 3-4 lines, but I have added it.
-
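@mehdi shahidi: That is the site's source, the same as what the browser shows under view-source. The additional data is loaded via JavaScript, which requests does not execute.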
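One way to check this for yourself is to look at the application's mount point in the HTML that requests returns. The sketch below uses only the standard library; the sample string is trimmed from the output in the question, and the names `MountPointChecker` and `mount_point_is_empty` are made up for illustration.

```python
from html.parser import HTMLParser


class MountPointChecker(HTMLParser):
    """Records whether the element with a given id contains any content."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0            # > 0 while we are inside the target element
        self.found = False        # True once the target element has been seen
        self.has_content = False  # True if it holds nested tags or text

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Any nested tag counts as content.
            self.has_content = True
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.has_content = True


def mount_point_is_empty(html, element_id="root"):
    """Return True if the element exists and has no children or text."""
    checker = MountPointChecker(element_id)
    checker.feed(html)
    checker.close()
    return checker.found and not checker.has_content


# Trimmed from the output shown in the question.
static_html = (
    '<body><div id="root"></div>'
    '<div id="loader"><img src="/images/logo.svg" /></div></body>'
)
print(mount_point_is_empty(static_html))
```

If this prints True for the downloaded page, the visible content is most likely filled in later by the script bundles, so it will not appear in page.text.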
-
So how can I read data loaded by JavaScript from Python? @MauriceMeyer
-
You could try using Selenium to scrape the page. With Selenium, you can wait for the data to load on the page before reading/scraping it.
Tags: python web-scraping python-requests
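A rough sketch of that suggestion follows. It assumes Selenium 4 with a local Chrome installation, and the CSS selector passed to `read_bets` is a placeholder, because the thread does not say which element holds the number. The "Bets: 1,234,567" text format that `parse_bets` expects is also an assumption based on the label quoted in the question.

```python
import re


def parse_bets(text):
    """Extract the integer from text such as 'Bets: 1,234,567' (assumed format)."""
    match = re.search(r"Bets:\s*(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def read_bets(url, css_selector, timeout=20):
    """Open the page in a real browser and wait for the bets text to appear."""
    # Imported here so parse_bets stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def bets_text(driver):
        # WebDriverWait retries this until it returns something truthy;
        # a missing element (NoSuchElementException) is ignored by default.
        text = driver.find_element(By.CSS_SELECTOR, css_selector).text
        return text if parse_bets(text) is not None else False

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        text = WebDriverWait(driver, timeout).until(bets_text)
        return parse_bets(text)
    finally:
        driver.quit()


# Example call; ".bets-counter" is a hypothetical selector, not taken from the site:
# print(read_bets("https://roobet.com/", ".bets-counter"))
```

The wait condition checks the text as well as the element's presence, since the element may be rendered before the script fills in the number.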