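Posted: 2021-10-15 17:43:25
Question:
I'm trying to open this website with Selenium: https://www.landnsea.net/ (it opens fine when I open it manually in Chrome, in both normal and incognito mode).
This is the code I'm using: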
import os
import sys
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Note: Selenium never sends this dict; a custom UA would need the --user-agent=... argument.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
link = "https://www.landnsea.net"
options = Options()
# options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("--start-maximized")
options.add_argument('--window-size=1920,1080')  # Chrome expects "width,height"
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
# Selenium 4 takes the driver path through a Service object instead of a positional argument.
driver = webdriver.Chrome(service=Service(path + cd), options=options)
driver.get(link)
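The if/elif chain above only chooses the driver file name. A slightly sturdier sketch of the same logic, assuming (as in the question) that the driver binary sits next to the script, uses a lookup table with os.path.join and fails loudly on a platform the chain does not cover, where the original would leave cd undefined. chromedriver_path and DRIVER_NAMES are hypothetical names, not part of Selenium.

```python
import os
import sys

# Driver file name for each sys.platform value handled in the question.
DRIVER_NAMES = {
    "win32": "chromedriver.exe",
    "linux": "chromedriver",
    "darwin": "chromedriver",
}


def chromedriver_path(base_dir: str, platform: str = sys.platform) -> str:
    """Return the chromedriver path inside base_dir for the given platform."""
    name = DRIVER_NAMES.get(platform)
    if name is None:
        # The original if/elif chain would leave `cd` undefined here.
        raise RuntimeError(f"no chromedriver file configured for {platform!r}")
    return os.path.join(base_dir, name)


if __name__ == "__main__":
    script_dir = os.path.abspath(os.path.dirname(sys.argv[0]))
    print(chromedriver_path(script_dir))
```

With Selenium 4 the result would be handed over as Service(chromedriver_path(path)) rather than as a positional argument to webdriver.Chrome.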
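When I run the program, the website opens, but I never get to the login screen.
Why can't this page be opened with Selenium?
I think this is the part where the check happens; I read it out with BeautifulSoup: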
<div class="cf-browser-verification cf-im-under-attack">
<noscript>
<h1 data-translate="turn_on_js" style="color:#bd2426;">
Please turn JavaScript on and reload the page.
</h1>
</noscript>
<div id="cf-content" style="display: block;">
<div id="cf-bubbles">
<div class="bubbles">
</div>
<div class="bubbles">
</div>
<div class="bubbles">
</div>
</div>
<h1>
<span data-translate="checking_browser">
Checking your browser before accessing
</span>
www.landnsea.net.
</h1>
<div class="cookie-warning" data-translate="turn_on_cookies" id="no-cookie-warning" style="display:none">
<p data-translate="turn_on_cookies" style="color:#bd2426;">
Please enable Cookies and reload the page.
</p>
</div>
<p data-translate="process_is_automatic">
This process is automatic. Your browser will redirect to your requested content shortly.
</p>
<p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs">
Please allow up to 5 seconds…
</p>
<p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">
Redirecting…
</p>
</div>
<form action="/mov/login.htm?__cf_chl_jschl_tk__=pmd_HaWZ4vSGjaTyyRdgUT6W6hapfCN9Am.Q_fYMeTph8wk-1634374280-0-gqNtZGzNAjujcnBszQgR" class="challenge-form" enctype="application/x-www-form-urlencoded" id="challenge-form" method="POST">
<input name="md" type="hidden" value="ckruH4cMo_91RT4vrBFz9ds90Pw_fBJMradIMGbBHf4-1634374280-0-Ae1p781nAPfJ-_E0d7q6_cdATkWhuty8I-_BWQk6bL49Qe_TXshaNWVJdiH8ro4tHwuGXFzSpFMbVcgTEuDaz7DeIgcRTKixvmgqJ__B6tp6osrP35Om_XkDbaFXaXHrV4x-28tuaE9lw-l9EczBeeGXyVl28Xgr12KlM7MRyp9EJ3KMdtpd3dbqajToAwj3F5LeINMtQxmiypDYtZ-hbZCY11RmpwWgy7HwPMUe8Hmpcp4vKaVnYTNUwDVYM76VOD-svmIknjwHW0f1VuiWLXtiI5iv1_2l6Q24kEjyLR3vFh3yRJ3puc4Oo8NwAi3EhSNJvlRUvdGXaVf8ZbGImhyPpy1fK3G6r_MwNY5mMoZVdMtj73ORB7Wnb1AvqupxhneevgfwlDo7RXUy7yaReKx4dJS4TCiI4UReE6sc6G9nlxg9LINDHLPlHOMq4s2i1Ek8taAx9mbpavOcpdvAZW2q3BGG3KZ8O5BNnUKW-XFojaTc2juvwuiBdvRqvFJCKA"/>
<input name="r" type="hidden" value="_d2eLttCt_xKBhyiWnBQTHdk8jVLaIQHb64tfgWg4oo-1634374280-0-AYcTeG8eRlI1lkqma+p5SPlg7Y04MFtDT3g3Vk0kt1rf801LccakPg6Agh/ah3dgTOIL89svU3GpVhR49fIP91tlnp0JYNTO8MgkQqOZCw7AnYhYYH04Z+TgGQJkRwDmqmcUx6NeZc268cCm0pdahm6x+8U6QeN2Vviuv7hlzXrD2CiSSJmUrxnGf997M4/0LyAO6HJofDx/ptKc85+F//4urOd/uE0zMxTGRNSUAlMFnwHB27fMGs6Gu6h9LMhyPmUHJSupssvnSjQfn6afSv89xjbIXKpJuFYwe4u2gTMSTOrnXxWz+KoHIGTPlQGujw6XaRWQ8K/FHJzkLOULBeeTjH4nGGUl2y2r2gNnUUa7GHSJEUHTsM32UV8qZYlNn+HJY/CzK10dnx4JwuMEJfAjzKkJNqxCBEpmMKl8sK5u1q2rgrIBYhU9O1cmPFfDytAx4qDU6PYg4DaPI7A4BkMoX7S6B/PO6A1+z4w48fWt2dG2HPFOzRtP89wjr8abSd/GsoQB2S7U9HbzP/6EvRY4ndXfRm30/9MVbbUQNtWxmBskC88yjzThr06WMMKbJxVoOKefEi+LsL/qN3JKey6ZPohQRbakiTPeNuiGf0umKTGQ2ydu93ImnYjIm9ETNLoFvaLSEXuYWarxCrMIXN4u8kjjJU3IhPNIicQ+jj8M43ZP6GMV6xq0h1U7/5ainpmMFUhlq3jk8Uipbo5cgqzN8YlDviCZl5oaHiJb02LSY/m7xOzLweHFlxS3CdAN899LGVK4Q0FT/St9Q3IZeOE6D1deMzZzXvpTtx0iXvR8mQlZijCF8Ub6cxwJ2J5p6r7bNoqSp3HwX5h3DZmGifzt6mkoHRTzxAKj1E7i//LNOcte0NOvgOc95wS39uksjeT6608WKFzgMZ1MUCiPyiE2cjoryQcJzUssPtOaddmKvnsUNDin9FmSgzp88Q3vSbXSEKuJzH+DyUlyYu5dupEyRCTrCJXov4WPXcH2KjE5EaO//RUglGK69BHOOVb4LAkLArjSiixplrGOzSbBu7VFMiuIk9JxgbSy/HjbnfPSWEHtfDXFWYGa/9m6X/XBnO7lAA59dE4FKnekFqdTreNVwwecsUVlCs2EuLWGz/x5JU2sjkH0aegr/OBv3bVtPNVtlqOCzlNH6JHwDEhZ2ElUd6QvsrlgklSHQ9xi6FjkIuDqevVaNrnwBlrjrP31FBAHT8eNBJvE1JZBSQp8JvqiZD/2K5nen/ClO7VgdMpVJ4QCX9qkbfCDPkuZcspoXNLF5wpM9NfbxCwuY/S/BTso/PDz8tN/bifi49rUdx3+8jkEtogS0QdJ1/2rQm1uf7+8f4ifudosXw9sYO/mSdIHHLna0mFMDSHlvoExsK29pN2sP9V4aMtjtnN48cLz9e1+fXJzdzOOXhkoK12b32dD/j606+sHDS9xXh1aTn8NEF7HRnU8wwMtnbxXMkMFd0JkS613GDLLW3lOUxIzyZJCqa2iKa3ebBaZjxsteanil+hABzzhGvpPHDY2Y1ez1jfgcH5d4QMwC9/U6VgwnV2Id1HQxNk1rpWyKPB6YcOPahomylRshARuqbheKPLZNL/DqF0U/nRPz/3+XSjd2Wa6bYRbF+yEsBlVAEcMVE+2lk9ZPPVE5LWoEcxflUh6Ugy/QxzqK5rPNKblB+7op6fpPKyavnFdqEUJf2P7rI6seppX8fEr7ONZ+09+f69auYF3MOe+jDl16bX1oycE+cXQ81mH2mvJX8J+uD6V2Lxm0c57Zd2JN8mgyKIAdTrWuF7PcGm1snve9iGhzPSP3j/r4kudM1C99oqLnUpWNjv+fRFb9Yc53Z/mQgARk+dPlDAEHR+WDw4L3D6U9CHX8trhN2URU3mdNDtUH4Gj2tv78IJ261
v7pBKGhZPr696M19IsodlwaNemCfmft+evjbpLFb8A/yEPTSPX1vDYq4WEtwlmpjT7Tvl0VnwoE5EPbEYpIiCRTUeGJS9xjhmoVXQ="/>
<input id="jschl-vc" name="jschl_vc" type="hidden" value="fe2d1755e23f31387c92b3046378a830"/>
<!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
<input name="pass" type="hidden" value="1634374281.028-mQ3i5gRSbj"/>
<input id="jschl-answer" name="jschl_answer" type="hidden"/>
<span style="display:none">
error code: 1020
</span>
</form>
<script type="text/javascript">
//<![CDATA[
(function(){
var a = document.getElementById('cf-content');
a.style.display = 'block';
var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
var trkjs = isIE ? new Image() : document.createElement('img');
trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=69f00b722f601a30");
trkjs.id = "trk_jschal_js";
trkjs.setAttribute("alt", "");
document.body.appendChild(trkjs);
var cpo=document.createElement('script');
cpo.type='text/javascript';
cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=69f00b722f601a30";
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
//]]>
</script>
<div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=69f00b722f601a30')">
</div>
</div>
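The markup above is Cloudflare's browser-check interstitial: the page is supposed to run its challenge script and then redirect to the requested URL ("Your browser will redirect to your requested content shortly"). Below is a minimal stdlib sketch for telling, from driver.page_source, whether the browser is still sitting on that interstitial, and for polling until it disappears or a timeout expires. The class and ids it looks for are taken from the HTML above and are assumed, not guaranteed, to stay stable; is_challenge_page and wait_for_challenge_to_clear are hypothetical helpers. This only detects the check page; it does nothing to make Cloudflare accept an automated browser.

```python
import time
from html.parser import HTMLParser
from typing import Callable

# Markers copied from the challenge page quoted in the question (assumption: Cloudflare keeps them).
CHALLENGE_CLASSES = {"cf-browser-verification"}
CHALLENGE_IDS = {"challenge-form", "cf-content"}


class _ChallengeFinder(HTMLParser):
    """Sets `found` once any start tag carries one of the challenge markers."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None for value-less attributes, hence the `or ""`.
        classes = set((attr_map.get("class") or "").split())
        if classes & CHALLENGE_CLASSES or attr_map.get("id") in CHALLENGE_IDS:
            self.found = True


def is_challenge_page(html: str) -> bool:
    """Return True if the HTML looks like the Cloudflare check page."""
    finder = _ChallengeFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def wait_for_challenge_to_clear(get_source: Callable[[], str],
                                timeout: float = 30.0,
                                interval: float = 1.0) -> bool:
    """Poll get_source() until the markers vanish (True) or timeout expires (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if not is_challenge_page(get_source()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the driver from the question this would be called as wait_for_challenge_to_clear(lambda: driver.page_source, timeout=30); a False result means the browser never got past the check. Selenium's own WebDriverWait (already imported in the question) could do the same polling; a plain loop is used here so the logic can run without a browser.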
Update: tried some of the solutions from the comments below
(1) Created a new Chrome profile and logged in with it before starting Selenium (not working, same result as before)
import os
import sys
import time
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

link = "https://www.landnsea.net"
options = Options()
options.add_argument(r"--user-data-dir=C:\Users\Polzi\AppData\Local\Google\Chrome\User Data\Profile1") #e.g. C:\Users\You\AppData\Local\Google\Chrome\User Data
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("--start-maximized")
options.add_argument('--window-size=1920,1080')  # Chrome expects "width,height"
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver'
elif platform == "darwin": cd = '/chromedriver'
srv = Service(path + cd)
driver = webdriver.Chrome(service=srv, options=options)
driver.get(link)
time.sleep(60)  # keep the window open long enough to watch the result
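In attempt (1) the full Profile1 path is passed as --user-data-dir. As far as I know, Chrome treats that value as the root of a user-data directory, so it would open (or create) a separate Default profile inside the Profile1 folder instead of the profile that was logged in manually. A hedged sketch of splitting such a path into --user-data-dir (the parent "User Data" folder) plus --profile-directory (the profile folder name); profile_args is a hypothetical helper.

```python
from pathlib import PureWindowsPath
from typing import List


def profile_args(profile_dir: str) -> List[str]:
    """Split a full Chrome profile folder path into two command-line switches.

    --user-data-dir gets the parent "User Data" folder and
    --profile-directory gets the profile folder name inside it.
    """
    path = PureWindowsPath(profile_dir)  # Windows path rules, matching the question
    return [f"--user-data-dir={path.parent}", f"--profile-directory={path.name}"]


if __name__ == "__main__":
    for arg in profile_args(r"C:\Users\Polzi\AppData\Local\Google\Chrome\User Data\Profile1"):
        print(arg)
```

Each returned string would go through options.add_argument(arg). Chrome generally will not start a second instance on a user-data directory that is already open, so regular Chrome windows using it would likely need to be closed first.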
(2) Tried the undetected-chromedriver module (not working, same result as before)
import undetected_chromedriver.v2 as uc

link = "https://www.landnsea.net"
driver = uc.Chrome()
driver.get(link)
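Comments:
- The website has a check page before it actually opens the login page. That could be the problem.
- So it's impossible to automate this with Selenium? I have the credentials, but I can't even get to the page where I could enter them... (What also bugs me is that it still works fine in incognito mode; usually when something goes wrong, it goes wrong in incognito Chrome too.)
- I think the website can detect the driver and won't let you in. I would try some kind of stealth mode.
- What do you mean by "stealth mode"?
- Also, it would be no problem to do the first login step manually and then automate everything after that.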
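For the approach in the last comment (log in manually once, automate the rest), one common pattern is to save the session cookies after the manual login and add them back in later runs, using the WebDriver calls driver.get_cookies() and driver.add_cookie(). A minimal sketch, with hypothetical helper names save_cookies/load_cookies and an assumed file name cookies.json. Whether Cloudflare accepts reused cookies is not guaranteed: its clearance cookie expires and may be tied to the browser and IP that passed the check.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Keys accepted by Selenium's driver.add_cookie(); anything else is dropped on load.
ALLOWED_KEYS = {"name", "value", "path", "domain", "secure", "httpOnly", "expiry", "sameSite"}


def save_cookies(cookies: List[Dict[str, Any]], path: Path) -> None:
    """Write the list returned by driver.get_cookies() to a JSON file."""
    path.write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path: Path) -> List[Dict[str, Any]]:
    """Read cookies back, keeping only keys that add_cookie() accepts."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return [{k: v for k, v in cookie.items() if k in ALLOWED_KEYS} for cookie in raw]


# Intended use with the question's driver (not executed here):
#   after the manual login:
#       save_cookies(driver.get_cookies(), Path("cookies.json"))
#   in a later run (add_cookie only works for the domain currently loaded):
#       driver.get(link)
#       for cookie in load_cookies(Path("cookies.json")):
#           driver.add_cookie(cookie)
#       driver.refresh()
```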
Tags: python selenium web-scraping