【Question title】: (Web)socket connection sending headers instead of string
【Posted】: 2019-12-11 19:29:58
【Question】:

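I'm developing a scraper that works through a Chrome extension. It grabs all the HTML on a page and sends it to Python code that filters and saves the data. The reason I'm scraping this way is that the website uses Distil Networks, which blocks "traditional" scrapers.

I have a successful connection between the two pieces of code, but whenever I try to send 'Test.' to the Python server, it just prints the browser's request headers: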

b'GET / HTTP/1.1 Host: localhost:18364 Connection: Upgrade Pragma: no-cache Cache-Control: no-cache User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 Upgrade: websocket Origin: chrome-extension://ocplnbpkkcpcomkjioockgnlohhkdeic Sec-WebSocket-Version: 13 Accept-Encoding: gzip, deflate, br Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7 Sec-WebSocket-Key: SDC7zPgHK/eV+QRSJy0DZQ== Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits'

JavaScript code (client):

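chrome.runtime.onMessage.addListener(function (request, sender) {
  if (request.action === "getSource") {
    var pageAmount = parseInt(request.source, 10);

    var allHTML = "";
    var BaseURL = "https://www.funda.nl/huur/rotterdam/p";

    // Legacy idiom that re-encodes a string as a UTF-8 "byte string"
    function encode_utf8(s) {
      return unescape(encodeURIComponent(s));
    }

    var websocket = new WebSocket('ws://localhost:18364');

    // onopen fires only after the server has completed the WebSocket handshake
    websocket.onopen = function () {
      var data = encode_utf8('Test.');
      // send() already transmits strings as UTF-8 text frames
      websocket.send('Test.');
    };

    // Look the element up explicitly rather than relying on the implicit #message global
    document.querySelector('#message').innerText = request.source;
  }
});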

function onWindowLoad() {
  var message = document.querySelector('#message');

  chrome.tabs.executeScript(null, {
    file: "getPageContent.js"
  }, function () {
    // If you try to inject into an extensions page or the webstore/NTP you'll get an error
    if (chrome.runtime.lastError) {
      message.innerText = 'There was an error injecting script : \n' + chrome.runtime.lastError.message;
    }
  });
}

window.onload = onWindowLoad;

Python code (server):

import socket

LocalSocket = socket.socket()
allHTML = ''

try:  # Create the listening socket and bind it to the local port
    LocalSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    LocalSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    LocalSocket.bind(('localhost', 18364))
    print("Connected.")
except socket.error as err:
    print("ConnectionError: %s" % err)


def main():
    LocalSocket.listen(1)

    c, addr = LocalSocket.accept()
    print('Got connection from', addr)
    # Prints the first bytes the browser sends, i.e. the HTTP upgrade request
    print(c.recv(1024))

    c.close()


if __name__ == "__main__":
    main()

【Question comments】:

    Tags: javascript python sockets google-chrome-extension websocket


    【Solution 1】:

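    WebSockets are layered on top of HTTP, so this is the expected behaviour. A plain TCP socket receives the browser's HTTP upgrade request first. You need a web server (or something that speaks HTTP) to handle the Connection: Upgrade and Upgrade: websocket headers and then carry out the rest of the handshake before you get a working connection that supports bidirectional communication.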
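
To make the handshake concrete, here is a minimal standard-library sketch of what the server side has to do, based on RFC 6455. The helper names (`accept_key`, `handshake_response`, `decode_text_frame`) are hypothetical and not part of the original code, and the frame decoder only covers short, single-frame, masked text messages such as 'Test.'. It is an illustration of the protocol steps, not a complete server.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for computing Sec-WebSocket-Accept
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(request: bytes) -> bytes:
    """Build the 101 response for a raw HTTP upgrade request (hypothetical helper)."""
    headers = {}
    # Skip the request line, then collect "Name: value" pairs
    for line in request.decode("latin-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    key = headers["sec-websocket-key"]
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(key)}\r\n"
        "\r\n"
    ).encode("ascii")


def decode_text_frame(frame: bytes) -> str:
    """Unmask a short (< 126 bytes) single-frame text message sent by a client."""
    length = frame[1] & 0x7F          # low 7 bits of the second byte: payload length
    mask = frame[2:6]                 # clients always mask with a 4-byte key
    payload = frame[6:6 + length]
    return bytes(b ^ mask[i % 4] for i, b in enumerate(payload)).decode("utf-8")
```

With helpers like these, the server in the question would send `handshake_response(c.recv(1024))` back on the socket and only then read the next bytes, which would be the masked frame carrying 'Test.'.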

    You could look at using the websockets package, which wraps all of this up nicely.
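
As a rough sketch of that suggestion, the server could look like the following, assuming a reasonably recent release of the third-party `websockets` package in which a handler takes a single connection argument (older releases expected `handler(websocket, path)`). The `received` list and the "ok" reply are illustrative choices and not part of the original code; the port matches the one used in the question.

```python
import asyncio

# Hypothetical buffer for the payloads sent by the extension
received = []


async def handler(websocket):
    # The library has already completed the HTTP upgrade handshake here;
    # each iteration yields one complete, unmasked message.
    async for message in websocket:
        received.append(message)
        print("Received:", message)
        await websocket.send("ok")  # illustrative acknowledgement


async def main():
    # Third-party dependency, imported here so the handler can be
    # exercised without it: pip install websockets
    import websockets

    async with websockets.serve(handler, "localhost", 18364):
        await asyncio.Future()  # keep serving until cancelled


# To start the server, call: asyncio.run(main())
```

The JavaScript client from the question should not need changes for this: `websocket.onopen` fires once the library has answered the upgrade request, and 'Test.' then arrives in `handler` as a plain string.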

    【Comments】:
