Python code to unzip a zipped file on an S3 server in Databricks: answers

【Question title】: Python code to unzip a zipped file on an S3 server in Databricks
【Posted】: 2019-04-10 12:43:40
【Question description】:

The code unzips a zipped file stored on an S3 server. It runs in Databricks with Python 3 and pandas==0.19.0.

zip_ref = zipfile.ZipFile(path, mode='r')

The line above raises the following error:

FileNotFoundError: [Errno 2] No such file or directory: path

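Please tell me why this line throws an error even though the path is correct. Alternatively, is there a way to read the contents of the zip archive without extracting it?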
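One possible way to read the archive without extracting it is to load the S3 object into memory and hand it to zipfile through io.BytesIO. This is a minimal sketch, not the original poster's code: the helper names fetch_s3_object_bytes, list_zip_members and read_zip_member are hypothetical, and it assumes boto3 is installed, that AWS credentials are configured on the cluster, and that the archive fits in driver memory.

```python
import io
import zipfile


def fetch_s3_object_bytes(bucket, key):
    """Download one S3 object fully into memory (hypothetical helper)."""
    # Imported lazily so the zip helpers below work without AWS access.
    import boto3

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def list_zip_members(zip_bytes):
    """Return the member names of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zip_ref:
        return zip_ref.namelist()


def read_zip_member(zip_bytes, member_name):
    """Return the raw bytes of one member without writing anything to disk."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zip_ref:
        return zip_ref.read(member_name)
```

With a bucket and key such as 'bucketname' and 'filename.zip' from the question, the bytes returned by fetch_s3_object_bytes can be passed to list_zip_members to see what the archive contains, and then to read_zip_member to read a single file.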

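【Comments】:

Check what 'path' contains. It should look like 's3://bucketname/filename.zip', and don't forget the extension.
Hi, the path is correct. I tried saving a file to that path and it ran successfully.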

Tags: python amazon-s3 databricks

【Solution 1】:

You can use

with zipfile.ZipFile("/dbfs/folder/file.zip", "r") as zip_ref:
    zip_ref.extractall("targetdir")

Alternatively, use the same code as above but avoid ':' in the path string.
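The reason the path form matters is that zipfile.ZipFile opens paths on the local filesystem, so a URI such as 's3://...' or 'dbfs:/...' is looked up as a literal local file name and not found. On Databricks, DBFS is normally exposed to local file APIs under /dbfs. The sketch below illustrates that mapping; to_local_dbfs_path is a hypothetical helper, and it assumes the /dbfs mount is available on the cluster.

```python
def to_local_dbfs_path(path):
    """Map a 'dbfs:/...' URI to the local '/dbfs/...' path (hypothetical helper)."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Drop the scheme and any extra slashes, then prepend the local mount.
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    if path.startswith("/dbfs/"):
        # Already a local path that zipfile can open.
        return path
    # s3:// and other URIs are not local files; download or mount them first.
    raise ValueError("not a DBFS path: " + path)
```

The result can be passed directly to zipfile.ZipFile. A path that starts with 's3://' raises ValueError here, because the object has to be downloaded or mounted before zipfile can open it.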

【Comments】:

Hi, I also tried removing the ':', but no luck.

【Solution 2】:

Below is the code

import os
import re
import zipfile

import boto3
import geopandas

### Declare the variables
s3client = boto3.client('s3')  # s3 client (Boto3 is the AWS SDK for Python)
s3resources = boto3.resource('s3')  # s3 resource
filetype = '.zip'  # file type such as zip, csv, json
source_url = 's3://bucketname/'  # s3 url with bucket name
bucketname = 'bucketname'  # bucket name
zipfile_name = 'local_file' + filetype  # local file name (with extension) in Databricks
filename = 'zipfilename' + filetype  # object key or file name with extension
shapefile_name = 'shapafilename.shp'  # name of the file to read after extraction
shapefile_path = os.path.abspath(zipfile_name)  # absolute local path of the downloaded zip
os_CurDir_file = os.path.join(os.curdir, 'shapefiles')  # extraction directory under the current directory
### Download the file from s3 to the local Databricks filesystem
s3resources.Bucket(bucketname).download_file(filename, zipfile_name)
### Unzip the file locally
with zipfile.ZipFile(shapefile_path, 'r') as zip_ref:
    zip_ref.extractall(os_CurDir_file)
### Import the shapefile using geopandas
plot_locations_df = geopandas.read_file(
    os.path.join(os_CurDir_file, shapefile_name))
### Convert the geometry column to WKT strings
plot_locations_df['geometry'] = plot_locations_df.geometry.apply(lambda x: x.wkt).apply(lambda x: re.sub('"(.*)"', '\\1', x))
display(plot_locations_df.head(5))  # display() is provided by Databricks notebooks
