【Question title】: How do I load balance phantomjs using docker-compose and haproxy?
【Posted】: 2016-11-25 09:20:12
【Question】:

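I have an application that uses Selenium WebDriver to interact with PhantomJS. To scale it up, I want to run several PhantomJS instances and load balance them with haproxy. This is for a local application, so I'm not concerned about deploying to production or anything like that.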

Here is my docker-compose.yml file:

version: '2'
services:
  app:
    build: .
    volumes:
      - .:/code
    links:
      - mongo
      - haproxy
  mongo:
    image: mongo
  phantomjs1:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs2:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs3:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs4:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  haproxy:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - 8910:8910
    links:
      - phantomjs1
      - phantomjs2
      - phantomjs3
      - phantomjs4

As you can see, I have four phantomjs instances, one haproxy instance, and one application (written in Python).

Here is my haproxy.cfg:

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend phantomjs_front
   bind *:8910
   stats uri /haproxy?stats
   default_backend phantomjs_back

backend phantomjs_back
   balance roundrobin
   server phantomjs1 phantomjs1:8910 check
   server phantomjs2 phantomjs2:8910 check
   server phantomjs3 phantomjs3:8910 check
   server phantomjs4 phantomjs4:8910 check

I know I need sticky sessions or something similar in haproxy to make this work, but I don't know how to set that up.

Here is the relevant snippet of my Python application code that connects to this service:

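from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def get_page(url):
    # "haproxy" is the docker-compose service that fronts the PhantomJS instances
    driver = webdriver.Remote(
        command_executor='http://haproxy:8910',
        desired_capabilities=DesiredCapabilities.PHANTOMJS
    )
    try:
        driver.get(url)
        source = driver.page_source
    finally:
        # quit() ends the WebDriver session; close() only closes the current window
        driver.quit()

    return source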

This is the error I get when I run this code:

phantomjs2_1  | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.REDACTED.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
phantomjs2_1  | 
phantomjs2_1  |   phantomjs://platform/console++.js:263 in error
app_1         | Traceback (most recent call last):
app_1         |   File "selenium_process.py", line 69, in <module>
app_1         |     main()
app_1         |   File "selenium_process.py", line 61, in main
app_1         |     source = get_page(args.url)
app_1         |   File "selenium_process.py", line 52, in get_page
app_1         |     driver.get(url)
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
app_1         |     self.execute(Command.GET, {'url': url})
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
app_1         |     self.error_handler.check_response(response)
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
app_1         |     raise exception_class(value)
app_1         | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.REDACTED.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
app_1         |
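The error is consistent with plain roundrobin dispatch: POST /session is answered by one PhantomJS instance, and the follow-up POST /session/<id>/url goes to the next instance, which has never seen that session. The toy model below reproduces that sequence; `simulate_roundrobin` is a hypothetical name, not haproxy code, and it ignores health checks, weights and where haproxy's rotation actually starts.

```python
from itertools import cycle


def simulate_roundrobin(servers, requests):
    """Toy model of 'balance roundrobin' for WebDriver traffic.

    Each request goes to the next server in turn, and a server only knows
    the sessions it created itself (PhantomJS instances share no state).
    Session IDs are supplied by the caller for simplicity; real PhantomJS
    generates them when handling POST /session.
    """
    known = {server: set() for server in servers}
    rotation = cycle(servers)
    results = []
    for path, session_id in requests:
        server = next(rotation)
        if path == "/session":
            # POST /session: this server creates, and alone remembers, the session
            known[server].add(session_id)
            results.append((server, "created"))
        elif session_id in known[server]:
            results.append((server, "ok"))
        else:
            # The error the phantomjs2 container logged for an unknown session
            results.append((server, "Variable Resource Not Found"))
    return results


sid = "4eff6a60-4889-11e6-b4ad-095b9e1284ce"
print(simulate_roundrobin(
    ["phantomjs1", "phantomjs2", "phantomjs3", "phantomjs4"],
    [("/session", sid), ("/session/" + sid + "/url", sid)],
))
# [('phantomjs1', 'created'), ('phantomjs2', 'Variable Resource Not Found')]
```

In this model the second request lands on phantomjs2, the container that logged the error; in a real deployment the exact server for each request will vary, but any rotation that ignores the session ID can split a session this way.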

So how do I get the load balancing to work? What am I missing?

Update

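I found out that I need some kind of session management in haproxy. Selenium WebDriver and PhantomJS communicate through sessions. The client sends POST /session and receives a reply with the session ID in the body. The reply looks like this: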

{"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0,"value":{"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"linux-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}}

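Then, as the session proceeds, the session ID is sent to the server as part of the URI in subsequent requests, e.g. GET /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source. How can I pick these IDs up and use them for sticky sessions in haproxy?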
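The bookkeeping a session-aware balancer would need can be sketched as follows: spread new sessions (POST /session, which carries no ID yet) across the backends, read `sessionId` from the JSON reply, and route every later /session/<id>/... request back to the backend that issued that ID. This is a minimal sketch assuming a proxy you control sees both the request path and the reply body; `SessionRouter` and its methods are hypothetical names, not part of haproxy or Selenium.

```python
import json
import re
from itertools import cycle

# The session ID is the path segment after /session/, e.g. /session/<id>/source
SESSION_PATH = re.compile(r"^/session/([^/]+)")


class SessionRouter:
    """Hypothetical session-aware routing table (not an haproxy feature)."""

    def __init__(self, backends):
        self._rotation = cycle(backends)
        self._owner = {}  # sessionId -> backend that created it

    def pick(self, path):
        match = SESSION_PATH.match(path)
        if match is None:
            # e.g. POST /session: no session exists yet, so any backend will do
            return next(self._rotation)
        # Raises KeyError for a session this router never saw being created
        return self._owner[match.group(1)]

    def record(self, backend, reply_body):
        # Call with the JSON body that backend returned for POST /session
        session_id = json.loads(reply_body).get("sessionId")
        if session_id:
            self._owner[session_id] = backend

    def forget(self, path):
        # Call after DELETE /session/<id> so the table does not grow forever
        match = SESSION_PATH.match(path)
        if match:
            self._owner.pop(match.group(1), None)


router = SessionRouter(["phantomjs1:8910", "phantomjs2:8910"])
backend = router.pick("/session")
router.record(backend, '{"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0}')
print(router.pick("/session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source"))  # phantomjs1:8910
```

The key design point is that the routing key only becomes known after the first response, so a rule that looks at each request on its own cannot send the session-creating request and its follow-ups to the same server; the balancer has to learn the mapping from the reply.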

【Comments】:

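  • FWIW, binding sessions to the requesting IP address won't help, because a single instance of my application makes multiple connections to the various phantomjs servers.
  • Did you find a solution?
  • Unfortunately, I did not.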
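The first comment's reasoning can be illustrated with a toy stand-in for source-IP hashing. haproxy's `balance source` uses its own hash function; `source_hash` and the IP address below are hypothetical. Every connection from the single app container carries the same client IP, so every session maps to the same PhantomJS instance and the others sit idle.

```python
import hashlib


def source_hash(client_ip, servers):
    """Toy stand-in for source-IP balancing: one client IP, one server."""
    digest = hashlib.sha1(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


servers = ["phantomjs1", "phantomjs2", "phantomjs3", "phantomjs4"]
# Hypothetical address of the single app container: it opens every session
picks = {source_hash("172.19.0.5", servers) for _ in range(100)}
print(len(picks))  # 1: all sessions land on the same PhantomJS instance
```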

Tags: python selenium phantomjs docker-compose haproxy


【Answer 1】:

You should be able to add a cookie in the haproxy configuration itself:

cookie SERVERID insert indirect nocache
server  httpd1 10.0.0.19:9443 cookie httpd1 check 
server  httpd2 10.0.0.18:9443 cookie httpd2 check 

Session stickiness is then handled by haproxy itself.
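Cookie-based stickiness only works if the client cooperates: haproxy adds a SERVERID cookie naming the chosen server to a response, and the client must send it back on later requests. The sketch below models that client-side bookkeeping with the standard library's `http.cookies`; the class name and the sample header value are hypothetical, and whether Selenium's Python client actually does this is a separate question.

```python
from http.cookies import SimpleCookie


class StickyCookieJar:
    """Minimal sketch of the client-side half of 'cookie SERVERID insert':
    keep the cookie haproxy sets and replay it on every later request."""

    def __init__(self):
        self._cookies = SimpleCookie()

    def update(self, set_cookie_header):
        # Parse a Set-Cookie response header, e.g. "SERVERID=httpd1; path=/"
        self._cookies.load(set_cookie_header)

    def header(self):
        # Value for the Cookie request header, e.g. "SERVERID=httpd1"
        return "; ".join(
            "%s=%s" % (name, morsel.value) for name, morsel in self._cookies.items()
        )


jar = StickyCookieJar()
jar.update("SERVERID=httpd1; path=/")
print(jar.header())  # SERVERID=httpd1
```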

【Comments】:

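  • Unfortunately, in this case the client doesn't honor cookies. It isn't a standard browser; it's the Selenium WebDriver library in Python. Since the RPC API doesn't use cookies, the library really has no reason to honor them.
  • I just filed a ticket against the Selenium codebase to support cookies in its RPC code. That would make this problem easy to solve. github.com/SeleniumHQ/selenium/issues/2505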