【Question title】: Selenium is not opening the site?
【Posted】: 2021-10-15 17:43:25
【Question description】:

I'm trying to open this site with Selenium: https://www.landnsea.net/ (it opens fine when I open it manually in Chrome, in both normal and incognito mode).

This is the code I'm using:

import os
import sys
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Note: this dict is never passed to Selenium, so it has no effect as written
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

link = "https://www.landnsea.net"
options = Options()
# options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument('--start-maximized')
options.add_argument('--window-size=1920,1080')  # Chrome expects "W,H", not "WxH"
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')

# chromedriver is expected in the same folder as the script
path = os.path.abspath(os.path.dirname(sys.argv[0]))
cd = 'chromedriver.exe' if platform == "win32" else 'chromedriver'
driver = webdriver.Chrome(service=Service(os.path.join(path, cd)), options=options)
driver.get(link)
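The HEADERS dictionary in the code above is never handed to Selenium, so Chrome keeps sending its own User-Agent. A minimal sketch, assuming you only want to carry the User-Agent over (Chrome has no command-line switch for arbitrary request headers): the hypothetical helper chrome_args_from_headers turns it into Chrome's --user-agent switch. On its own this is unlikely to get past a Cloudflare browser check.

```python
from typing import Dict, List


def chrome_args_from_headers(headers: Dict[str, str]) -> List[str]:
    """Translate a requests-style header dict into Chrome command-line switches.

    Only the User-Agent can be expressed as a switch (--user-agent);
    every other header is ignored by this hypothetical helper.
    """
    args = []
    # Header names are case-insensitive, so compare them in lower case
    for name, value in headers.items():
        if name.lower() == "user-agent":
            args.append(f"--user-agent={value}")
    return args
```

Usage: before creating the driver, run "for arg in chrome_args_from_headers(HEADERS): options.add_argument(arg)".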

When I run the program, the site opens, but I never get to the login screen.

Why can't this page be opened with Selenium?

I think this is the part where the check happens; I read it out with BeautifulSoup:

<div class="cf-browser-verification cf-im-under-attack">
 <noscript>
  <h1 data-translate="turn_on_js" style="color:#bd2426;">
   Please turn JavaScript on and reload the page.
  </h1>
 </noscript>
 <div id="cf-content" style="display: block;">
  <div id="cf-bubbles">
   <div class="bubbles">
   </div>
   <div class="bubbles">
   </div>
   <div class="bubbles">
   </div>
  </div>
  <h1>
   <span data-translate="checking_browser">
    Checking your browser before accessing
   </span>
   www.landnsea.net.
  </h1>
  <div class="cookie-warning" data-translate="turn_on_cookies" id="no-cookie-warning" style="display:none">
   <p data-translate="turn_on_cookies" style="color:#bd2426;">
    Please enable Cookies and reload the page.
   </p>
  </div>
  <p data-translate="process_is_automatic">
   This process is automatic. Your browser will redirect to your requested content shortly.
  </p>
  <p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs">
   Please allow up to 5 seconds…
  </p>
  <p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">
   Redirecting…
  </p>
 </div>
 <form action="/mov/login.htm?__cf_chl_jschl_tk__=pmd_HaWZ4vSGjaTyyRdgUT6W6hapfCN9Am.Q_fYMeTph8wk-1634374280-0-gqNtZGzNAjujcnBszQgR" class="challenge-form" enctype="application/x-www-form-urlencoded" id="challenge-form" method="POST">
  <input name="md" type="hidden" value="ckruH4cMo_91RT4vrBFz9ds90Pw_fBJMradIMGbBHf4-1634374280-0-Ae1p781nAPfJ-_E0d7q6_cdATkWhuty8I-_BWQk6bL49Qe_TXshaNWVJdiH8ro4tHwuGXFzSpFMbVcgTEuDaz7DeIgcRTKixvmgqJ__B6tp6osrP35Om_XkDbaFXaXHrV4x-28tuaE9lw-l9EczBeeGXyVl28Xgr12KlM7MRyp9EJ3KMdtpd3dbqajToAwj3F5LeINMtQxmiypDYtZ-hbZCY11RmpwWgy7HwPMUe8Hmpcp4vKaVnYTNUwDVYM76VOD-svmIknjwHW0f1VuiWLXtiI5iv1_2l6Q24kEjyLR3vFh3yRJ3puc4Oo8NwAi3EhSNJvlRUvdGXaVf8ZbGImhyPpy1fK3G6r_MwNY5mMoZVdMtj73ORB7Wnb1AvqupxhneevgfwlDo7RXUy7yaReKx4dJS4TCiI4UReE6sc6G9nlxg9LINDHLPlHOMq4s2i1Ek8taAx9mbpavOcpdvAZW2q3BGG3KZ8O5BNnUKW-XFojaTc2juvwuiBdvRqvFJCKA"/>      
  <input name="r" type="hidden" value="_d2eLttCt_xKBhyiWnBQTHdk8jVLaIQHb64tfgWg4oo-1634374280-0-AYcTeG8eRlI1lkqma+p5SPlg7Y04MFtDT3g3Vk0kt1rf801LccakPg6Agh/ah3dgTOIL89svU3GpVhR49fIP91tlnp0JYNTO8MgkQqOZCw7AnYhYYH04Z+TgGQJkRwDmqmcUx6NeZc268cCm0pdahm6x+8U6QeN2Vviuv7hlzXrD2CiSSJmUrxnGf997M4/0LyAO6HJofDx/ptKc85+F//4urOd/uE0zMxTGRNSUAlMFnwHB27fMGs6Gu6h9LMhyPmUHJSupssvnSjQfn6afSv89xjbIXKpJuFYwe4u2gTMSTOrnXxWz+KoHIGTPlQGujw6XaRWQ8K/FHJzkLOULBeeTjH4nGGUl2y2r2gNnUUa7GHSJEUHTsM32UV8qZYlNn+HJY/CzK10dnx4JwuMEJfAjzKkJNqxCBEpmMKl8sK5u1q2rgrIBYhU9O1cmPFfDytAx4qDU6PYg4DaPI7A4BkMoX7S6B/PO6A1+z4w48fWt2dG2HPFOzRtP89wjr8abSd/GsoQB2S7U9HbzP/6EvRY4ndXfRm30/9MVbbUQNtWxmBskC88yjzThr06WMMKbJxVoOKefEi+LsL/qN3JKey6ZPohQRbakiTPeNuiGf0umKTGQ2ydu93ImnYjIm9ETNLoFvaLSEXuYWarxCrMIXN4u8kjjJU3IhPNIicQ+jj8M43ZP6GMV6xq0h1U7/5ainpmMFUhlq3jk8Uipbo5cgqzN8YlDviCZl5oaHiJb02LSY/m7xOzLweHFlxS3CdAN899LGVK4Q0FT/St9Q3IZeOE6D1deMzZzXvpTtx0iXvR8mQlZijCF8Ub6cxwJ2J5p6r7bNoqSp3HwX5h3DZmGifzt6mkoHRTzxAKj1E7i//LNOcte0NOvgOc95wS39uksjeT6608WKFzgMZ1MUCiPyiE2cjoryQcJzUssPtOaddmKvnsUNDin9FmSgzp88Q3vSbXSEKuJzH+DyUlyYu5dupEyRCTrCJXov4WPXcH2KjE5EaO//RUglGK69BHOOVb4LAkLArjSiixplrGOzSbBu7VFMiuIk9JxgbSy/HjbnfPSWEHtfDXFWYGa/9m6X/XBnO7lAA59dE4FKnekFqdTreNVwwecsUVlCs2EuLWGz/x5JU2sjkH0aegr/OBv3bVtPNVtlqOCzlNH6JHwDEhZ2ElUd6QvsrlgklSHQ9xi6FjkIuDqevVaNrnwBlrjrP31FBAHT8eNBJvE1JZBSQp8JvqiZD/2K5nen/ClO7VgdMpVJ4QCX9qkbfCDPkuZcspoXNLF5wpM9NfbxCwuY/S/BTso/PDz8tN/bifi49rUdx3+8jkEtogS0QdJ1/2rQm1uf7+8f4ifudosXw9sYO/mSdIHHLna0mFMDSHlvoExsK29pN2sP9V4aMtjtnN48cLz9e1+fXJzdzOOXhkoK12b32dD/j606+sHDS9xXh1aTn8NEF7HRnU8wwMtnbxXMkMFd0JkS613GDLLW3lOUxIzyZJCqa2iKa3ebBaZjxsteanil+hABzzhGvpPHDY2Y1ez1jfgcH5d4QMwC9/U6VgwnV2Id1HQxNk1rpWyKPB6YcOPahomylRshARuqbheKPLZNL/DqF0U/nRPz/3+XSjd2Wa6bYRbF+yEsBlVAEcMVE+2lk9ZPPVE5LWoEcxflUh6Ugy/QxzqK5rPNKblB+7op6fpPKyavnFdqEUJf2P7rI6seppX8fEr7ONZ+09+f69auYF3MOe+jDl16bX1oycE+cXQ81mH2mvJX8J+uD6V2Lxm0c57Zd2JN8mgyKIAdTrWuF7PcGm1snve9iGhzPSP3j/r4kudM1C99oqLnUpWNjv+fRFb9Yc53Z/mQgARk+dPlDAEHR+WDw4L3D6U9CHX8trhN2URU3mdNDtUH4Gj2tv78IJ2
61v7pBKGhZPr696M19IsodlwaNemCfmft+evjbpLFb8A/yEPTSPX1vDYq4WEtwlmpjT7Tvl0VnwoE5EPbEYpIiCRTUeGJS9xjhmoVXQ="/>
  <input id="jschl-vc" name="jschl_vc" type="hidden" value="fe2d1755e23f31387c92b3046378a830"/>
  <!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
  <input name="pass" type="hidden" value="1634374281.028-mQ3i5gRSbj"/>
  <input id="jschl-answer" name="jschl_answer" type="hidden"/>
  <span style="display:none">
   error code: 1020
  </span>
 </form>

 <script type="text/javascript">
  //<![CDATA[
      (function(){
          var a = document.getElementById('cf-content');
          a.style.display = 'block';
          var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
          var trkjs = isIE ? new Image() : document.createElement('img');
          trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=69f00b722f601a30");
          trkjs.id = "trk_jschal_js";
          trkjs.setAttribute("alt", "");
          document.body.appendChild(trkjs);
          var cpo=document.createElement('script');
          cpo.type='text/javascript';
          cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=69f00b722f601a30";
          document.getElementsByTagName('head')[0].appendChild(cpo);
        }());
      //]]>
 </script>
 <div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=69f00b722f601a30')">
 </div>
</div>
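The markup above is Cloudflare's "Checking your browser" interstitial (wrapper class cf-browser-verification, form id challenge-form). A minimal sketch for telling that page apart from the real site, assuming those markers stay stable; it uses the standard-library html.parser instead of BeautifulSoup so it has no dependencies, and the names ChallengeDetector and is_cloudflare_challenge are hypothetical.

```python
from html.parser import HTMLParser


class ChallengeDetector(HTMLParser):
    """Flags Cloudflare's browser-check page by markers seen in its HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.is_challenge = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attributes without a value come through as None, hence the "or"
        classes = (attrs.get("class") or "").split()
        # <div class="cf-browser-verification ..."> and <form id="challenge-form">
        # appear on the interstitial page, not on the site itself
        if "cf-browser-verification" in classes or attrs.get("id") == "challenge-form":
            self.is_challenge = True


def is_cloudflare_challenge(html: str) -> bool:
    parser = ChallengeDetector()
    parser.feed(html)
    parser.close()
    return parser.is_challenge
```

Usage: is_cloudflare_challenge(driver.page_source) returns True while Selenium is still stuck on the check page.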

Update: I tried some of the solutions from the comments below.

(1) Created a new Chrome profile and logged in with it before starting Selenium (doesn't work; same result as before)

import os
import sys
import time
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

link = "https://www.landnsea.net"
options = Options()
options.add_argument(r"--user-data-dir=C:\Users\Polzi\AppData\Local\Google\Chrome\User Data\Profile1")  # e.g. C:\Users\You\AppData\Local\Google\Chrome\User Data
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument('--start-maximized')
options.add_argument('--window-size=1920,1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
path = os.path.abspath(os.path.dirname(sys.argv[0]))
cd = 'chromedriver.exe' if platform == "win32" else 'chromedriver'
srv = Service(os.path.join(path, cd))
driver = webdriver.Chrome(service=srv, options=options)
driver.get(link)
time.sleep(60)  # keep the window open to watch what happens
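Chrome's --user-data-dir switch expects the "User Data" folder itself, and a profile inside it is normally selected with --profile-directory. Passing "...\User Data\Profile1" as the user-data dir makes Chrome treat Profile1 as a whole new data directory, so a login done in that profile from normal Chrome may not be visible to Selenium. A minimal sketch, assuming the Windows path from the update above; the helper name profile_args is hypothetical. Chrome generally also refuses to share a user-data directory with another running Chrome instance, so close Chrome first.

```python
from pathlib import PureWindowsPath
from typing import List


def profile_args(profile_path: str) -> List[str]:
    """Split a Windows profile path into the two switches Chrome expects."""
    p = PureWindowsPath(profile_path)
    # Parent folder is the user-data dir; the last component is the profile name
    return [f"--user-data-dir={p.parent}", f"--profile-directory={p.name}"]
```

Usage: "for arg in profile_args(r'C:\Users\Polzi\AppData\Local\Google\Chrome\User Data\Profile1'): options.add_argument(arg)".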

(2) Tried the undetected-chromedriver module (doesn't work; same result as before)

import undetected_chromedriver.v2 as uc  # newer releases: import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get(link)

【Question discussion】:

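  • The site shows a check page before it actually opens the login page. That could be the problem.
  • So it's impossible to automate this with Selenium? I have the credentials, but even if I could enter them, I can't get to that page... (It also annoys me that it still works fine in incognito mode; usually when something goes wrong, it goes wrong in Chrome's incognito mode too.)
  • I think the site can detect the driver and won't let you in. I would try some kind of stealth mode.
  • What do you mean by "stealth mode"?
  • Also, it would be no problem to do the first login step manually and then run all later steps automatically.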
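For the idea in the last comment (log in by hand once, automate afterwards), one common pattern is to save the cookies from a logged-in browser session and load them into later Selenium sessions. A minimal sketch, assuming a JSON file such as cookies.json (a hypothetical name) and that the site accepts restored cookies; Cloudflare's clearance cookie is typically tied to the browser's User-Agent and IP, so this may still fail. get_cookies() and add_cookie() are standard WebDriver methods, and add_cookie() only works once the driver is already on the cookie's domain.

```python
import json
from typing import Any


def save_cookies(driver: Any, path: str) -> None:
    """Write the current session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver: Any, path: str) -> int:
    """Re-add saved cookies; the driver must already be on the site's domain."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Usage: after logging in manually in a Selenium-controlled window, call save_cookies(driver, "cookies.json"); in a later run, call driver.get(link), then load_cookies(driver, "cookies.json") and driver.refresh().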

Tags: python selenium web-scraping


【Solution 1】:

You can try something like this; if the Cloudflare check takes longer, increase the wait time:

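from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get(url)
try:
    # until() needs a condition; your_id is a placeholder for an element of the real page
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.ID, your_id))
    )
finally:
    driver.quit()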
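WebDriverWait.until() calls its argument with the driver repeatedly until it returns something truthy, so a wait can also target the Cloudflare page directly: wait for the challenge form to disappear. A minimal sketch, assuming the challenge page keeps using id="challenge-form" as in the HTML from the question; the name challenge_cleared is hypothetical. This cannot help if Cloudflare blocks the automated browser outright.

```python
from typing import Any


def challenge_cleared(driver: Any) -> bool:
    """Wait condition: True once the Cloudflare challenge form is gone.

    "id" is the string value of selenium's By.ID, so no selenium import is
    needed here; find_elements returns an empty list when nothing matches.
    """
    return len(driver.find_elements("id", "challenge-form")) == 0
```

Usage: WebDriverWait(driver, 30).until(challenge_cleared), then locate the login fields as usual.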

【Discussion】:

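  • OK, thanks, but which ID should I use for your_id in this case?
  • If your element has no id, you can use an XPath instead. In Chrome DevTools, right-click the username or password field in the Elements panel and choose Copy > Copy XPath. Then look the element up by XPath instead of by ID (in Selenium 4: driver.find_element(By.XPATH, your_xpath)).
  • This has nothing to do with the ID; the problem is that I never get to the page where I can enter the username and password.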