[Posted]: 2017-07-06 10:02:40
[Problem description]:
I have written a script that uploads data to S3. If a file is smaller than 5 MB it is uploaded as a single part, but if the file is larger, a multipart upload is performed. I know the threshold is small right now; I am only testing the script at this stage. If I run the script from Python by importing each function and calling it that way, everything works as expected. I know the code needs cleanup, since it is not finished yet. However, when I run the script from the command line, I get this error:
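As a side note, the single-part vs. multipart decision described above comes down to simple arithmetic. A minimal standalone sketch of the 5 MB threshold and chunk-count math (not tied to boto or to the script below):

```python
import math

# 5 MB threshold, as described in the question.
FIVE_MB = 5 * 1024 * 1024

def chunk_count(size_bytes, chunk_bytes=FIVE_MB):
    # Number of parts needed: ceiling of size / chunk size.
    return int(math.ceil(size_bytes / float(chunk_bytes)))

print(chunk_count(FIVE_MB))      # → 1 (fits in a single part)
print(chunk_count(FIVE_MB + 1))  # → 2 (one byte over forces a second part)
```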
Traceback (most recent call last):
  File "upload_files_to_s3.py", line 106, in <module>
    main()
  File "upload_files_to_s3.py", line 103, in main
    check_if_mp_needed(conn, input_file, mb, bucket_name, sub_directory)
  File "upload_files_to_s3.py", line 71, in check_if_mp_needed
    multipart_upload(conn, input_file, mb, bucket_name, sub_directory)
  File "upload_files_to_s3.py", line 65, in multipart_upload
    mp.complete_upload()
  File "/usr/local/lib/python2.7/site-packages/boto/s3/multipart.py", line 304, in complete_upload
    self.id, xml)
  File "/usr/local/lib/python2.7/site-packages/boto/s3/bucket.py", line 1571, in complete_multipart_upload
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
The XML you provided was not well-formed or did not validate against our published schema
Here is the code:
import sys
import boto
from boto.s3.key import Key
import os
import math
from filechunkio import FileChunkIO

KEY = os.environ['AWS_ACCESS_KEY_ID']
SECRET = os.environ['AWS_SECRET_ACCESS_KEY']

def start_connection():
    key = KEY
    secret = SECRET
    return boto.connect_s3(key, secret)

def get_bucket_key(conn, bucket_name):
    bucket = conn.get_bucket(bucket_name)
    k = Key(bucket)
    return k

def get_key_name(sub_directory, input_file):
    full_key_name = os.path.join(sub_directory, os.path.basename(input_file))
    return full_key_name

def get_file_info(input_file):
    source_size = os.stat(input_file).st_size
    return source_size

def multipart_request(conn, input_file, bucket_name, sub_directory):
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(get_key_name(sub_directory, input_file))
    return mp

def get_chunk_size(mb):
    chunk_size = mb * 1048576
    return chunk_size

def get_chunk_count(input_file, mb):
    chunk_count = int(math.ceil(get_file_info(input_file) / float(get_chunk_size(mb))))
    return chunk_count

def regular_upload(conn, input_file, bucket_name, sub_directory):
    k = get_bucket_key(conn, bucket_name)
    k.key = get_key_name(sub_directory, input_file)
    k.set_contents_from_filename(input_file)

def multipart_upload(conn, input_file, mb, bucket_name, sub_directory):
    chunk_size = get_chunk_size(mb)
    chunks = get_chunk_count(input_file, mb)
    source_size = get_file_info(input_file)
    mp = multipart_request(conn, input_file, bucket_name, sub_directory)
    for i in range(chunks):
        offset = chunk_size * i
        b = min(chunk_size, source_size - offset)
        with FileChunkIO(input_file, 'r', offset=offset, bytes=b) as fp:
            mp.upload_part_from_file(fp, part_num=i + 1)
    mp.complete_upload()

def check_if_mp_needed(conn, input_file, mb, bucket_name, sub_directory):
    if get_file_info(input_file) <= 5242880:
        regular_upload(conn, input_file, bucket_name, sub_directory)
    else:
        multipart_upload(conn, input_file, mb, bucket_name, sub_directory)

def main():
    input_file = sys.argv[1]
    mb = sys.argv[2]
    bucket_name = sys.argv[3]
    sub_directory = sys.argv[4]
    conn = start_connection()
    check_if_mp_needed(conn, input_file, mb, bucket_name, sub_directory)

if __name__ == '__main__':
    main()
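Editorial note on one difference between the two ways of running the script (an observation, not something the source confirms as the cause of the 400): `sys.argv` always yields strings, so from the command line `mb` arrives as e.g. `"5"`, and in Python 2 `"5" * 1048576` silently builds a 5 MB string instead of computing a byte count. When the functions are imported and called with an int, this never happens. A minimal sketch of the safe pattern:

```python
def get_chunk_size(mb):
    # Coerce before arithmetic: command-line arguments are strings,
    # and "5" * 1048576 would repeat the string rather than multiply.
    return int(mb) * 1048576

print(get_chunk_size("5"))  # → 5242880, same as get_chunk_size(5)
```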
Thanks!
[Comments]:
-
Most likely the environment you use on the command line is different from the one where you import everything manually. What are you using in each case?
-
I run the script from a virtualenv in IPython. The command line also runs it through the virtualenv.
-
OK, so a mismatch is not impossible after all. Can you check boto.__version__ in both cases?
-
In plain IPython it is 2.8.0, but in the virtualenv it is 2.45.0. How can I check it from the command line without entering IPython?
-
Just print it out in your script and run it.
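The last suggestion can also be done without editing the script. A minimal sketch of checking a module's version from whichever interpreter actually runs it (json is used as a runnable stand-in here, since boto may not be installed in every environment):

```python
import importlib

def module_version(name):
    # Import the module in the current interpreter and report its
    # __version__, so the environment being checked is the one that
    # actually executes the code.
    mod = importlib.import_module(name)
    return getattr(mod, "__version__", "unknown")

# Equivalent shell one-liner (boto assumed installed in that environment):
#   python -c "import boto; print(boto.__version__)"
print(module_version("json"))
```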