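【Posted】: 2018-08-12 05:33:24
【Question】:
My Python script is behaving very strangely. I'm using Python sockets to download images from the web; I'm not interested in using requests/urllib. When I try to download an image, the download succeeds. However, when I try to open the file in the Photos app, Windows returns an "It looks like we don't support this file format" error.
Here is where the strange part begins. If I copy and paste the URL that my socket reaches out to (the URL used to download the image, in this case http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg) and download it myself from Chrome, then run my script again, the image downloads and displays with no problem! The value of the Content-Length header in the HTTP response also increases. I did this 3 times with 3 different images, and each time I got the same behavior. Below are two runs of my script, one before I downloaded the file from Chrome and one after. Note that in the first run, the Content-Length header states that there are 2564 bytes in the response body. In the second run, that number changes to 3833. Both runs request the same URL.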
PS D:\Documents\School\RIT\Classes\Summer 2018\CSEC 380\Homework\3\Script> python .\hw3-script.py
MESSAGE SENT
GET /gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//xAbuaitah.jpg.pagespeed.ic.PFwk87Pcno.jpg HTTP/1.1
Host: www.rit.edu
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate
ENTIRE MESSAGE RECEIVED
b'HTTP/1.1 200 OK\r\nDate: Sun, 12 Aug 2018 04:58:24 GMT\r\nServer: Apache\r\nLink: <http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg>; rel="canonical"\r\nAccept-Ranges: bytes\r\nLast-Modified: Sun, 12 Aug 2018 02:06:23 GMT\r\nX-Original-Content-Length: 25378\r\nX-Content-Type-Options: nosniff\r\nExpires: Sun, 12 Aug 2018 02:11:23 GMT\r\nCache-Control: max-age=300,private\r\nContent-Length: 2564\r\nConnection: close\r\nContent-Type: image/webp\r\n\r\nRIFF\xfc\t\...<hex data here>...\x00\x00'
RESPONSE HEADERS SPLIT OFF
HTTP/1.1 200 OK
Date: Sun, 12 Aug 2018 04:58:24 GMT
Server: Apache
Link: <http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg>; rel="canonical"
Accept-Ranges: bytes
Last-Modified: Sun, 12 Aug 2018 02:06:23 GMT
X-Original-Content-Length: 25378
X-Content-Type-Options: nosniff
Expires: Sun, 12 Aug 2018 02:11:23 GMT
Cache-Control: max-age=300,private
Content-Length: 2564
Connection: close
Content-Type: image/webp
IMAGE BINARY DATA SPLIT OFF
b'RIFF\xfc\t\...<hex data here>...\x00\x00'
Bytes in image data: 2581
PS D:\Documents\School\RIT\Classes\Summer 2018\CSEC 380\Homework\3\Script> python .\hw3-script.py
MESSAGE SENT
GET /gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//xAbuaitah.jpg.pagespeed.ic.PFwk87Pcno.jpg HTTP/1.1
Host: www.rit.edu
Accept: image/webp,image/apng,image/*,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate
ENTIRE MESSAGE RECEIVED
b'HTTP/1.1 200 OK\r\nDate: Sun, 12 Aug 2018 04:59:08 GMT\r\nServer: Apache\r\nLink: <http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg>; rel="canonical"\r\nX-Content-Type-Options: nosniff\r\nAccept-Ranges: bytes\r\nExpires: Mon, 12 Aug 2019 04:58:50 GMT\r\nCache-Control: max-age=31536000\r\nEtag: W/"0"\r\nLast-Modified: Sun, 12 Aug 2018 04:58:50 GMT\r\nX-Original-Content-Length: 25378\r\nContent-Length: 3833\r\nConnection: close\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\...<hex data here>...\xff\xd9'
RESPONSE HEADERS SPLIT OFF
HTTP/1.1 200 OK
Date: Sun, 12 Aug 2018 04:59:08 GMT
Server: Apache
Link: <http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg>; rel="canonical"
X-Content-Type-Options: nosniff
Accept-Ranges: bytes
Expires: Mon, 12 Aug 2019 04:58:50 GMT
Cache-Control: max-age=31536000
Etag: W/"0"
Last-Modified: Sun, 12 Aug 2018 04:58:50 GMT
X-Original-Content-Length: 25378
Content-Length: 3833
Connection: close
Content-Type: image/jpeg
IMAGE BINARY DATA SPLIT OFF
b'\xff\xd8\...<hex data here>...\xff\xd9'
Bytes in image data: 3850
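The two runs above differ in more than Content-Length: the first response is labeled `Content-Type: image/webp` and its body starts with `RIFF`, while the second is labeled `image/jpeg` and starts with `\xff\xd8`. A minimal sketch of how one might check this programmatically follows. The helper names `split_http_response` and `sniff_image_format` are hypothetical, the sample bytes are made up for illustration, and only the two signatures seen in the output above are recognized.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status_line, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def sniff_image_format(body: bytes) -> str:
    """Guess the image format from the leading bytes of the body."""
    # WebP: "RIFF", a 4-byte size field, then "WEBP".
    if body[:4] == b"RIFF" and body[8:12] == b"WEBP":
        return "webp"
    # JPEG: starts with the start-of-image marker FF D8.
    if body[:2] == b"\xff\xd8":
        return "jpeg"
    return "unknown"


# Made-up 12-byte body that only carries a WebP signature (not a real image).
fake_webp = b"RIFF" + (4).to_bytes(4, "little") + b"WEBP"
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 12\r\n"
    b"Content-Type: image/webp\r\n"
    b"\r\n" + fake_webp
)

status, headers, body = split_http_response(raw)
print(status)
print(headers["content-type"], sniff_image_format(body), len(body))
```

Comparing `headers["content-type"]` and the sniffed format against the file extension used when saving would show whether the bytes on disk match the name they were given.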
Here is my code:
import os
import socket
import sys

# Directory where downloaded images are saved
IMAGE_DIR = "D:\\Documents\\School\\RIT\\Classes\\Summer 2018\\CSEC 380\\Homework\\3\\Script\\act1step2images"


class MySocket:
    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def myclose(self):
        self.sock.close()

    def mysend(self, msg, debug=False):
        if debug:
            print("MESSAGE SENT")
            print(msg.decode())
        self.sock.sendall(msg)

    def myreceive(self, debug=False):
        received = b''
        buffer = 1  # bytes requested per recv() call
        while True:
            part = self.sock.recv(buffer)
            received += part
            # recv() returns b'' once the server has closed the connection
            if part == b'':
                break
        if debug:
            print("Received...")
            print(received)
        return received


def download_image(img_url):
    """
    Download an image over a raw socket and save it to IMAGE_DIR
    :param img_url: url corresponding to an image
    :return: None
    """
    image_socket = MySocket()
    image_socket.connect("www.rit.edu", 80)
    message = ("GET " + img_url + " HTTP/1.1\r\n"
               "Host: www.rit.edu\r\n"
               "Accept: image/webp,image/apng,image/*,*/*;q=0.8\r\n"
               "Accept-Language: en-US,en;q=0.9\r\n"
               "Accept-Encoding: gzip, deflate\r\n\r\n")
    image_socket.mysend(message.encode(), True)
    reply = image_socket.myreceive()
    print("ENTIRE MESSAGE RECEIVED")
    print(reply)
    print()
    # Everything before the first blank line is the header block
    headers = reply.split(b'\r\n\r\n')[0]
    print("RESPONSE HEADERS SPLIT OFF")
    print(headers.decode())
    # Skip the headers plus the 4-byte '\r\n\r\n' separator
    image = reply[len(headers)+4:]
    print()
    print("IMAGE BINARY DATA SPLIT OFF")
    print(image)
    print()
    # Note: sys.getsizeof() reports the size of the bytes object in memory,
    # which is larger than len(image)
    print("Bytes in image data:", sys.getsizeof(image))
    print()
    # print(type(image))
    # File name: number of files already in IMAGE_DIR + last 4 chars of the URL
    img_name = str(len(os.listdir(IMAGE_DIR))) + img_url[-4:]
    with open(os.path.join(IMAGE_DIR, img_name), 'wb') as f:
        f.write(image)


def main():
    download_image("http://www.rit.edu/gccis/computingsecurity/sites/rit.edu.gccis.computingsecurity/files//Abuaitah.jpg")


main()
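A side note on the "Bytes in image data" lines in the output: they read 2581 and 3850, each 17 more than the corresponding Content-Length (2564 and 3833). That is consistent with `sys.getsizeof()` counting the in-memory size of the `bytes` object rather than the payload length. The sketch below, which assumes CPython and uses a made-up payload of zero bytes, illustrates the difference; the exact overhead depends on the interpreter build, so no specific value is claimed here.

```python
import sys

# Made-up payloads with the same lengths as the two Content-Length values.
first = b"\x00" * 2564
second = b"\x00" * 3833

# len() is the number of payload bytes, which is what Content-Length describes.
print(len(first), len(second))

# getsizeof() adds a fixed per-object overhead on top of the payload length.
overhead_first = sys.getsizeof(first) - len(first)
overhead_second = sys.getsizeof(second) - len(second)
print(overhead_first, overhead_second)
```

Under that assumption, `len(image)` is the value to compare against Content-Length.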
Can anyone tell me what is going on, and why the jpg cannot be downloaded correctly on the first attempt?
【Discussion】:
Tags: python sockets http raw-sockets python-sockets