【Posted】: 2022-01-16 23:21:45
【Question】:
I am using a Jupyter Lab instance on AWS SageMaker.
Kernel: conda_mxnet_latest_p37.
url_list contains some invalid URLs, which I handle with a try/except.
['15', '259', '26', '58', 'https://imagepool.1und1-drillisch.de/v2/download/nachhaltigkeitsbericht/1&1Drillisch_Sustainability_Report_EN_2018.pdf', 'https://imagepool.1und1-drillisch.de//v2/download/nachhaltigkeitsbericht/2018-04-06_1und1-Drillisch_Sustainability_Report_eng.pdf', '6', 'http://youxin.37.com/uploads/file/1556248045.pdf', '80', 'https://multimedia.3m.com/mws/media/1691941O/2019-sustainability-report.PDF', 'https://s3-us-west-2.amazonaws.com/ungc-production/attachments/cop_2020/483648/original/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf?1583154650', 'https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw', 'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf'] # sample
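Entries such as '15' or '259' in the sample above are not URLs at all, so they could be filtered out before any download is attempted instead of being caught afterwards. A minimal sketch using only the standard library, assuming that a usable entry is an http(s) URL with a host name; the helper name is_http_url is hypothetical and not part of the original code:

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value parses as an http(s) URL with a host name."""
    # str() guards against numeric cells read from the spreadsheet.
    parsed = urlparse(str(value))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# A few entries taken from the sample list in the question.
sample = [
    "15",
    "http://youxin.37.com/uploads/file/1556248045.pdf",
    "80",
    "https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw",
]

# Keep only the entries that look like real URLs.
valid_urls = [u for u in sample if is_http_url(u)]
```

This only checks the shape of each string. A URL that passes can still fail at download time, for example with the certificate error shown in the output, so the try/except remains useful.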
However, the URLs that do work throw this error:
[Errno 13] Permission denied: '/data'
I do not have the directory or any of the files open, since I have not downloaded them yet.
I ran the following in the Terminal, with no luck:
sh-4.2$ chmod 777 data
sh-4.2$ chmod 777 data/
sh-4.2$ chmod 777 data/gri
sh-4.2$ chmod 777 data/gri/
Code:
import pandas as pd
import opendatasets as od
import urllib
import zipfile
import os
csr_df = pd.read_excel('data/Company Sustainability Reports.xlsx', index_col=None)
url_list = csr_df['Report PDF Address'].tolist()
for url in url_list:
    try:
        download = od.download(url, '/data/gri/')
        filename = url.rsplit('/', 1)[1]
        path_extract = 'data/gri/' + filename
        with zipfile.ZipFile('data/gri/' + filename + '.zip', 'r') as zip_ref:
            zip_ref.extractall(path_extract)
        os.remove(path_extract + 'readme.txt')
        filenames = os.listdir(path_extract)
        scans = []
        for f in filenames:
            with Image.open(path_extract + f) as img:
                matrix = np.array(img)
                scans.append(matrix)
        # shutil.rmtree(path_extract)
        os.remove(path_extract[:-1] + '.zip')
    except (urllib.error.URLError, IOError, RuntimeError) as e:
        print('Download PDFs', e)
Output:
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs list index out of range
Download PDFs [Errno 13] Permission denied: '/data'
...
Please let me know if there is anything else I should clarify.
【Discussion】:
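A possible reading of the error, not confirmed in the post: the download target '/data/gri/' starts with a slash, so it is an absolute path pointing at /data in the filesystem root, whereas the chmod commands and the rest of the code use the relative data/gri/ under the notebook's working directory. If the download call tries to create its target directory, that would presumably fail for a non-root user. The sketch below only illustrates how the two strings resolve on Linux; resolve_download_dir and the base directory /home/ec2-user/SageMaker are assumptions for illustration.

```python
import os


def resolve_download_dir(path, base=None):
    """Return the directory that `path` refers to when used from `base`."""
    base = os.getcwd() if base is None else base
    # os.path.join discards `base` entirely when `path` is absolute.
    return os.path.normpath(os.path.join(base, path))


# Hypothetical working directory of the notebook (an assumption).
notebook_dir = "/home/ec2-user/SageMaker"

# The path passed to the download call: absolute, ignores the working directory.
absolute_target = resolve_download_dir("/data/gri/", base=notebook_dir)

# The path used everywhere else in the code: relative to the working directory.
relative_target = resolve_download_dir("data/gri/", base=notebook_dir)

print(absolute_target)
print(relative_target)
```

If this reading is right, passing 'data/gri/' (no leading slash) as the download directory would keep the download, the zip extraction, and the chmod commands all pointing at the same location.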
Tags: python-3.x jupyter-lab amazon-sagemaker permission-denied