Access Denied when accessing a site using a Selenium ChromeDriver headless mode Python script: answer

【Question title】: Access Denied when accessing the site using Selenium ChromeDriver headless mode Python script
【Posted】: 2018-08-21 04:33:59
【Question description】:

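This code works fine when I run it from my Ubuntu laptop. However, when I deploy it on an AWS EC2 Ubuntu machine, the website I am trying to scrape returns Access Denied. I have changed the IP of the AWS machine several times, so it does not appear to be an IP-level block.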

Code that instantiates the webdriver:

# UserAgent is assumed to come from the fake_useragent package
from fake_useragent import UserAgent
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1420,1080')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-application-cache')
ua = UserAgent()
userAgent = ua.random
print(userAgent)
# f-string, so the random user agent is interpolated instead of the literal text "{userAgent}"
chrome_options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome('/home/ubuntu/chromedriver', chrome_options=chrome_options)
driver.get(link)  # link holds the URL to scrape and is defined elsewhere
print(driver.page_source)

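【Comments】:

Could you provide a link to the site you are trying to open? It may be that the site does not work in the region where the EC2 instance is located.
macys.com. I am trying from the West region. curl works on the same machine.

Tags: python amazon-ec2 webdriver selenium-chromedriver google-chrome-headless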

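【Solution 1】:

It is somewhat unclear under what circumstances you see Access Denied when trying to scrape https://www.macys.com/. However, you need to take the following points into account: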

The --disable-gpu argument was only needed on Windows, so it can be dropped on an Ubuntu machine.
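
Since the flag only matters on Windows, one option is to build the argument list per platform. The following is a minimal sketch, not a Selenium API: `build_chrome_args` is a hypothetical helper, and the flags are simply the ones from the question.

```python
import sys


def build_chrome_args(platform=None):
    """Return headless Chrome arguments for the given platform string.

    Hypothetical helper for illustration; `platform` takes the same values
    as `sys.platform` (for example 'linux' or 'win32').
    """
    if platform is None:
        platform = sys.platform
    args = [
        '--headless',
        '--no-sandbox',
        '--window-size=1420,1080',
        '--disable-dev-shm-usage',
    ]
    # --disable-gpu is appended only when running on Windows.
    if platform.startswith('win'):
        args.append('--disable-gpu')
    return args
```

Each returned string can then be passed to `options.add_argument(...)` in a loop.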

As an example, I used the following specific user-agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36
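
By default, headless Chrome identifies itself with a `HeadlessChrome` token in its user agent, which a site can use to reject the request, so it is worth checking that the override is built correctly before handing it to the driver. A minimal sketch follows; `user_agent_argument` and `looks_headless` are hypothetical helpers, not part of Selenium.

```python
def user_agent_argument(user_agent):
    """Build the Chrome command-line argument that overrides the user agent."""
    # An f-string is required here; a plain 'user-agent={user_agent}' literal
    # would pass the braces and the variable name to Chrome verbatim.
    return f'user-agent={user_agent}'


def looks_headless(user_agent):
    """Return True if the user agent still carries the headless Chrome token."""
    return 'HeadlessChrome' in user_agent


ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')
arg = user_agent_argument(ua)
```

The resulting `arg` is what would be passed to `options.add_argument(...)`.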

Here is the result of my execution:

Code block:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--disable-gpu')  # this run was on Windows
# Replace the default headless user agent with a regular desktop Chrome one
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')
# Drop the enable-automation switch and the automation extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.macys.com/")
print(driver.page_source)
driver.quit()

Console output:

[1217/034634.234:INFO:CONSOLE(2022)] "Error: <svg> attribute viewBox: Expected number, "0 0 135px 40px".", source: https://www.macys.com/ (2022)
[1217/034635.403:INFO:CONSOLE(1)] "2309,2308", source: https://assets.macysassets.com/page/home-page/static/js/home-page.vendors~header.ca1a9a8ca3949327ad99.js (1)
[1217/034636.970:INFO:CONSOLE(0)] "A cookie associated with a cross-site resource at http://demdex.net/ was set without the `SameSite` attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.macys.com/ (0)
[1217/034638.024:INFO:CONSOLE(0)] "A cookie associated with a cross-site resource at http://everesttech.net/ was set without the `SameSite` attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.macys.com/ (0)
<html lang="en"><head class="at-element-marker">
  <title>Macy's - Shop Fashion Clothing &amp; Accessories - Official Site - Macys.com</title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Cache-Control" content="private, max-age=0, no-cache, must-revalidate">
<meta name="description" content="Macy's - FREE Shipping at Macys.com. Macy's has the latest fashion brands on Women's and Men's Clothing, Accessories, Jewelry, Beauty, Shoes and Home Products.">
<meta name="keywords" content="department store, dept store, department stores, Macys store, clothing, apparel, clothing store, accessories, macy's department store, macys department stores, macys apparel">
  <meta property="og:title" content="Macy’s– Official Site">
  <meta property="og:type" content="website">
  <meta property="og:url" content="https://www.macys.com">
  <meta property="og:description" content="FREE Shipping on the latest fashion brands on Women's and Men's Clothing, Accessories, Jewelry, Beauty, Shoes and Home Products.">
  <meta property="og:image" content="https://www.macys.com/img/nav/co_macysLogo3.gif">
  <meta property="og:site_name" content="Macy's">
  <meta property="fb:app_id" content="172562576126509">
  <meta name="google-site-verification" content="NXerNZgQYWmrno0UECIRSi5eHUACZ-5TThhQOA3SFvU">

  <meta name="viewport" content="width=device-width, initial-scale=1.0">

<link rel="canonical" href="https://www.macys.com/">
    <link rel="preconnect" href="https://assets.macysassets.com">
  <link rel="preconnect" href="https://slimages.macysassets.com">
  <link rel="preconnect" href="https://tags.tiqcdn.com">
  <link rel="preconnect" href="https://libs.coremetrics.com">
  <link rel="preconnect" href="https://dynamic.criteo.com">
  <link rel="preconnect" href="https://rscdn.storetail.net">





      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/carousel-ctrl.7608f93d9c891b06d342.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/mcom.b097920404dcb4038b10.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~canvas.c21198c7217ace6f58cd.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~dynamic-slideshow-ctrl.6ad1fb956323ce6c391a.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~footer.09f550f549f0b44659b0.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewCompact~viewMinimalist~viewRadical.a3764e3d32ac27f9bd27.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewCompact~viewRadical.425306252d7663c8b168.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewFooterResponsive.f1ebc4caa32bc3e086d3.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/viewCompact.35736d9d70390895a793.css">



      <link rel="preload" as="style" href="https://assets.macysassets.com/page/home-page/static/css/viewRadical.58ec252b5ccc3cfcfc40.css">





      <link rel="prefetch" as="style" href="https://assets.macysassets.com/page/home-page/static/css/BrowserVersionMessage.ddfbe993b80a3718f14a.css">

      <link rel="prefetch" as="style" href="https://assets.macysassets.com/page/home-page/static/css/quickBag.e2ec561bf5f55e22791d.css">

      <link rel="prefetch" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~prosFactory.0571235e1704cb0bfff1.css">

      <link rel="prefetch" as="style" href="https://assets.macysassets.com/page/home-page/static/css/vendors~responsive-header.fb72e29bd4f59db62409.css">





  <link rel="preload" as="script" href="https://assets.macysassets.com/page/home-page/static/js/home-page.vendor.common.13df968a79bd2962d068.js">

  <link rel="preload" as="script" href="https://assets.macysassets.com/page/home-page/static/js/home-page.core.vendor.6baca319875c7fae46bc.js">

  <link rel="preload" as="script" href="https://assets.macysassets.com/page/home-page/static/js/home-page.mcom.c5f407ff077934f61bc3.js">







      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/carousel-ctrl.7608f93d9c891b06d342.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/mcom.b097920404dcb4038b10.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~canvas.c21198c7217ace6f58cd.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~dynamic-slideshow-ctrl.6ad1fb956323ce6c391a.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~footer.09f550f549f0b44659b0.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewCompact~viewMinimalist~viewRadical.a3764e3d32ac27f9bd27.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewCompact~viewRadical.425306252d7663c8b168.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/vendors~viewFooterResponsive.f1ebc4caa32bc3e086d3.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/viewCompact.35736d9d70390895a793.css">



      <link rel="stylesheet" href="https://assets.macysassets.com/page/home-page/static/css/viewRadical.58ec252b5ccc3cfcfc40.css">
.
.
.
<iframe sandbox="allow-scripts allow-same-origin" title="Adobe ID Syncing iFrame" id="destination_publishing_iframe_macyscominc_0" src="https://macyscominc.demdex.net/dest5.html?d_nsid=0#https%3A%2F%2Fwww.macys.com%2F" style="display: none; width: 0px; height: 0px;" class="aamIframeLoaded"></iframe><div class="redesign-header-overlay radical" style="display: none;"></div></body></html>
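
To tell a successful page source like the one above apart from a blocked one without reading the whole output, the page title can be checked. This is a minimal sketch under one loud assumption: that the block page carries "Access Denied" in its `<title>`. The question does not show the blocked HTML, so that marker is a guess, and `page_title` and `is_access_denied` are hypothetical helpers.

```python
import re


def page_title(html):
    """Return the text of the first <title> element, or '' if there is none."""
    match = re.search(r'<title[^>]*>(.*?)</title>', html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ''


def is_access_denied(html, marker='Access Denied'):
    """Return True if the page title contains the assumed block-page marker."""
    return marker.lower() in page_title(html).lower()
```

With a live driver this would be called as `is_access_denied(driver.page_source)`.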

【Discussion】: