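【Question title】: Upload files to gs bucket in for loop
【Posted】: 2020-06-17 10:40:48
【Question】:

In the code below, a PDF document is split into pages that are saved to my local drive; once the split is finished, the upload step runs and uploads every split file to the gs bucket. How can I change the code so that the split files are uploaded directly to the gs bucket instead of being stored locally first and uploaded afterwards? I tried, but without success.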

#!/usr/bin/python3
import PyPDF2
from PyPDF2 import PdfFileWriter, PdfFileReader
import os
import glob
import sys
from google.cloud import storage

inputpdf = PdfFileReader(open(r"ace.pdf", "rb"))

for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open(r"/home/playground/doc_pages/document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
        assert os.path.isdir(local_path)
        for local_file in glob.glob(local_path + '/**'):
            if not os.path.isfile(local_file):
                continue
            remote_path = os.path.join(gcs_path, local_file[1 + len(local_path) :])
            storage_client = storage.Client()
            buck = storage_client.bucket(bucket)
            blob = buck.blob(remote_path)
            blob.upload_from_filename(local_file)
            print("Uploaded " + local_file + " to gs bucket " + bucket)

upload_local_directory_to_gcs('/home/playground/doc_pages', 'doc_pages', '')

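【Comments】:

  • You can store the files as temporary files in /tmp, which is faster than writing to disk.
  • @Juancki But how do I store each file in the gs bucket while it is being split?
  • Please see my answer. Does it work for you?
  • Awesome, works like a charm! Thanks a bunch. If I turn this script into a Docker container, what would the temporary directory be? /tmp?
  • In Docker you can define a temporary file system with --tmpfs and mount it at /tmp: stackoverflow.com/a/52662602/6003934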

Tags: python google-cloud-platform google-cloud-storage pypdf2


【Solution 1】:

With temporary files, it would look like this:

#!/usr/bin/python3
import glob
import os

from google.cloud import storage
from PyPDF2 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open(r"ace.pdf", "rb"))
# Create the temporary folder (no error if it already exists)
os.makedirs('/tmp/doc_pages', exist_ok=True)
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    # Write each page to a temporary file
    with open(r"/tmp/doc_pages/document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
    assert os.path.isdir(local_path)
    # Create the client and bucket handle once instead of once per file
    storage_client = storage.Client()
    buck = storage_client.bucket(bucket)
    for local_file in glob.glob(local_path + '/**'):
        if not os.path.isfile(local_file):
            continue
        remote_path = os.path.join(gcs_path, local_file[1 + len(local_path):])
        blob = buck.blob(remote_path)
        blob.upload_from_filename(local_file)
        print("Uploaded " + local_file + " to gs bucket " + bucket)

upload_local_directory_to_gcs('/tmp/doc_pages', 'doc_pages', '')  # Source changed to /tmp
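The question also asks about skipping local files entirely. One possible way, offered only as a sketch, is to write each page into an in-memory `io.BytesIO` buffer and hand that buffer to `blob.upload_from_file`. The names `split_and_upload`, `make_writer` and `main` are hypothetical helpers invented for this example; the bucket name `doc_pages` and the input file `ace.pdf` are taken from the question. It assumes the same legacy PyPDF2 API used in this thread (`PdfFileReader`, `PdfFileWriter`, `numPages`, `getPage`, `addPage`); newer releases of the library use different names, so adjust as needed.

```python
import io


def split_and_upload(reader, make_writer, bucket, prefix="document-page"):
    """Upload every page of `reader` as its own PDF without writing to disk.

    `reader` is a PdfFileReader-like object, `make_writer` is a callable that
    returns a fresh PdfFileWriter-like object, and `bucket` is a
    google.cloud.storage bucket (or anything exposing `.blob(name)`).
    """
    uploaded = []
    for i in range(reader.numPages):
        writer = make_writer()
        writer.addPage(reader.getPage(i))

        # Serialize the single-page PDF into memory instead of a file.
        buffer = io.BytesIO()
        writer.write(buffer)
        buffer.seek(0)  # rewind so the upload reads from the start

        name = "%s%s.pdf" % (prefix, i)
        bucket.blob(name).upload_from_file(buffer, content_type="application/pdf")
        uploaded.append(name)
    return uploaded


def main():
    # Imports are local so the helper above can be tried without these packages.
    from google.cloud import storage
    from PyPDF2 import PdfFileReader, PdfFileWriter

    bucket = storage.Client().bucket("doc_pages")
    with open("ace.pdf", "rb") as source:
        for name in split_and_upload(PdfFileReader(source), PdfFileWriter, bucket):
            print("Uploaded " + name + " to gs bucket doc_pages")
```

Calling `main()` runs the split and upload with default credentials. Each page is held in memory while it uploads, which is usually fine for single pages but worth keeping in mind for very large ones.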

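【Comments】:

  • To delete the temporary folder, use `shutil.rmtree('/tmp/doc_pages')` (`os.remove` only deletes files, not directories). This avoids conflicts between consecutive runs.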
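As an alternative to deleting the folder by hand, the standard library's `tempfile.TemporaryDirectory` creates a uniquely named directory and removes it when the `with` block exits, so consecutive runs cannot collide. The sketch below is only an illustration: `upload_via_temp_dir`, `write_pages` and `upload_file` are hypothetical names, with the two callbacks standing in for the page-splitting loop and the GCS upload from the answer.

```python
import os
import tempfile


def upload_via_temp_dir(write_pages, upload_file):
    """Write pages into a throwaway directory, upload them, then clean up.

    `write_pages(directory)` should create the split PDF files in `directory`.
    `upload_file(path, name)` should upload one file, for example with
    bucket.blob(name).upload_from_filename(path).
    """
    uploaded = []
    with tempfile.TemporaryDirectory(prefix="doc_pages_") as tmp_dir:
        write_pages(tmp_dir)
        for name in sorted(os.listdir(tmp_dir)):
            upload_file(os.path.join(tmp_dir, name), name)
            uploaded.append(name)
    # The directory and its contents are gone once the with-block has exited.
    return uploaded
```

With a real bucket, `upload_file` could be `lambda path, name: bucket.blob(name).upload_from_filename(path)`, and `write_pages` could wrap the page-splitting loop from the answer with the target directory passed in.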