【Posted】: 2020-03-08 03:16:06
【Question】:
I created a DataFrame and used pyarrow to convert it to a Parquet file held in memory (also mentioned here):
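import pyarrow as pa
import pyarrow.parquet as pq

def convert_df_to_parquet(self, df):
    # Serialize the DataFrame as Parquet into an in-memory output stream
    table = pa.Table.from_pandas(df)
    buf = pa.BufferOutputStream()
    pq.write_table(table, buf)
    return buf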
Now I want to upload the result to an S3 bucket. I tried upload_file() and put_object() with several different inputs, but nothing I tried works:
s3_client.upload_file(parquet_file, bucket_name, destination_key)  # 1st
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file)  # 2nd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.getvalue())  # 3rd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())  # 4th
Error (from the 4th attempt):
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())
File "pyarrow/io.pxi", line 376, in pyarrow.lib.NativeFile.read1
File "pyarrow/io.pxi", line 310, in pyarrow.lib.NativeFile.read
File "pyarrow/io.pxi", line 320, in pyarrow.lib.NativeFile.read
File "pyarrow/io.pxi", line 155, in pyarrow.lib.NativeFile.get_input_stream
File "pyarrow/io.pxi", line 170, in pyarrow.lib.NativeFile._assert_readable
OSError: only valid on readonly files
【Discussion】:
Tags: python amazon-s3 boto3 pyarrow