Not able to scrape details using bs4, python, selenium

【Question title】: Not able to scrape details using bs4, python, selenium
【Posted】: 2021-03-08 12:26:14
【Question】:

I am using the code below to print the soup variable, which is just the page source.

Code

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json, requests, re, sys
from selenium import webdriver
import re, time


yes_url = "https://www.yesbank.in/personal-banking/yes-first/cards/credit-card/yes-first-exclusive-credit-card"

driver = webdriver.Chrome(executable_path="C:\\Users\\Hari\\Downloads\\chromedriver.exe")
driver.get(yes_url)
time.sleep(3)

# r = requests.get(yes_url)


soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)


driver.close()

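The page whose source I am trying to scrape is: https://www.yesbank.in/personal-banking/yes-first/cards/credit-card/yes-first-exclusive-credit-card

After I run the code above, it keeps running for hours and I never get any output.

Please help me scrape the page source so that I get some output when I run the code.

【Discussion】:

This is what I get: Access Denied. You don't have permission to access "yesbank.in/personal-banking/yes-first/cards/credit-card/…" on this server.
@Rocket Nikita So I need a way to get around that and access the page source.
Can you tell us which part of the page you want to scrape?
@bilakos I want to scrape the details in the benefits tab of that page.
@MaredpallyJalebi OK, I'll give it a try.

Tags: python selenium web-scraping beautifulsoup python-requests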

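【Solution 1】:

The problem: you are dealing with a modern website that checks whether the browser itself is being controlled by automation software.

How can it tell?

Simply open your browser console and type the following: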

navigator.webdriver

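If it is false, your browser is not being controlled by an automation tool such as selenium.

If it is true, it is being controlled.

In your case, you have to disable that flag so the site's check does not detect the automation.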
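The same check can also be run from Python instead of the browser console. The following is only a minimal sketch: the helper names `is_flagged_as_automated` and `check_with_firefox` are made up for illustration, and it assumes Selenium is installed and geckodriver is available on your PATH.

```python
def is_flagged_as_automated(driver):
    """Return True only if the page sees navigator.webdriver as true.

    `driver` is any Selenium WebDriver instance (hypothetical helper name).
    """
    # execute_script runs JavaScript in the current page and returns its value;
    # a missing flag comes back as None, which is treated as "not flagged".
    return driver.execute_script("return navigator.webdriver") is True


def check_with_firefox(url):
    """Open `url` in a plain Firefox session and report the flag."""
    # Imported lazily so the helper above does not require Selenium by itself.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return is_flagged_as_automated(driver)
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `check_with_firefox("https://www.yesbank.in/")` with default options would be expected to report True, since the browser is driven by Selenium; calling `is_flagged_as_automated(driver)` on a driver created with the flag disabled lets you confirm whether the setting took effect.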

You can achieve your goal as below:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

options = Options()
# Run Firefox without a visible window (the headless attribute is gone in newer Selenium releases)
options.add_argument("-headless")
# Ask Firefox not to expose navigator.webdriver to the page
options.set_preference("dom.webdriver.enabled", False)
driver = webdriver.Firefox(options=options)

try:
    driver.get('https://www.yesbank.in/personal-banking/yes-first/cards/credit-card/yes-first-exclusive-credit-card')
    # Wait up to 10 seconds for the page title to contain 'YES'
    WebDriverWait(driver, 10).until(EC.title_contains('YES'))
    soup = BeautifulSoup(driver.page_source, 'lxml')
    print(soup.prettify())
finally:
    # Always close the browser, even if the page fails to load or the wait times out
    driver.quit()

【Discussion】:

@αԋɱҽԃ αмєяιcαη Can I do the same thing in the Chrome browser?
Yes! options.add_argument('--disable-blink-features=AutomationControlled')
@αԋɱҽԃ αмєяιcαη Could you please write out the same code for Chrome? I tried to run it but got an error.
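For the Chrome question in the comments above, a rough sketch might look like the following. It is not the answerer's code: the helper names `chrome_arguments`, `build_chrome_driver` and `fetch_soup` are hypothetical, it assumes Selenium, bs4 and lxml are installed and that a matching chromedriver can be found, and whether the site accepts the session is not guaranteed. Headless mode is left off by default because headless Chrome identifies itself in its user agent, which some sites also check.

```python
def chrome_arguments(headless=False):
    """Command-line switches for Chrome (hypothetical helper)."""
    # This switch is the one suggested in the comment thread above.
    args = ["--disable-blink-features=AutomationControlled"]
    if headless:
        args.append("--headless")
    return args


def build_chrome_driver(headless=False):
    """Create a Chrome driver with the switches above applied."""
    # Imported lazily so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


def fetch_soup(url, title_fragment="YES", timeout=10):
    """Load `url`, wait for the title, and return the parsed page source."""
    from bs4 import BeautifulSoup
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = build_chrome_driver()
    try:
        driver.get(url)
        # Same wait condition as the Firefox answer: title must contain the fragment.
        WebDriverWait(driver, timeout).until(EC.title_contains(title_fragment))
        return BeautifulSoup(driver.page_source, "lxml")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Usage would be something like `print(fetch_soup(yes_url).prettify())`, where `yes_url` is the URL from the question.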