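【Posted】:2021-05-22 19:15:41
【Question】:
I am trying to run a program that accesses Google Cloud Storage through gcsfs.GCSFileSystem, with the work distributed by Python's concurrent.futures.ProcessPoolExecutor.
The real code is fairly complex, but I managed to boil it down to this minimal non-working example: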
from concurrent.futures import ProcessPoolExecutor
from gcsfs import GCSFileSystem
def f(path):
    print(f"Creating {path}...")
    print("Created. Getting glob...")
    print(main_fs.glob(path))
    print("Done!")
if __name__ == "__main__":
    main_fs = GCSFileSystem()
    print(main_fs.glob("code_tests_sand"))
    with ProcessPoolExecutor(max_workers=10) as pool:
        l_ = []
        for sub_rules_list in (pool.map(f, ["code_tests_sand"])):
            l_.append(0)
I expect:
['code_tests_sand']
Creating code_tests_sand...
Created. Getting glob...
['code_tests_sand']
Done!
I get:
['code_tests_sand']
Creating code_tests_sand...
Created. Getting glob...
The program hangs at this point and never finishes.
I found a way to get the expected output by passing the GCSFileSystem object to the function explicitly:
from concurrent.futures import ProcessPoolExecutor
from gcsfs import GCSFileSystem
def f(path, ff):
    print(f"Creating {path}...")
    print("Created. Getting glob...")
    print(ff.glob(path))
    print("Done!")
if __name__ == "__main__":
    main_fs = GCSFileSystem()
    print(main_fs.glob("code_tests_sand"))
    with ProcessPoolExecutor(max_workers=10) as pool:
        l_ = []
        for sub_rules_list in (pool.map(f, ["code_tests_sand"], [main_fs])):
            l_.append(0)
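However, this is not a good solution for me, because I cannot pass the object around like this in my real code. Any ideas about why this happens and how to fix it?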
FYI, I am running on Ubuntu 18 with Python 3.8, and this is my pip freeze output:
aiohttp==3.7.3
async-timeout==3.0.1
attrs==20.3.0
cachetools==4.2.1
certifi==2020.12.5
chardet==3.0.4
decorator==4.4.2
fsspec==0.8.5
gcsfs==0.7.2
google-auth==1.27.0
google-auth-oauthlib==0.4.2
idna==2.10
multidict==5.1.0
oauthlib==3.1.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.1
six==1.15.0
typing-extensions==3.7.4.3
urllib3==1.26.3
yarl==1.6.3
【Comments】:
- The problem you are reporting may be related to a gcsfs issue I just reported: github.com/dask/gcsfs/issues/379
Tags: python-3.x google-cloud-platform google-cloud-storage