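[Posted]: 2020-05-28 16:57:56
[Question]:
I'm new here and also new to Python! I have a huge movie archive on my disk, and I wrote a script that downloads movie posters: it takes the movie names from the folders on disk and puts each poster into the matching movie folder.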
from os import listdir
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
import requests

# Loading Film Names
film_names = [f for f in listdir("I:\\Movies")]
# Loading Image Names
image_names = [f for f in listdir("I:\\Movies")]

# Opening Chrome
driver = webdriver.Chrome()
driver.get('http://imdb.com')

counter = 0
for i in film_names[counter:]:
    # filtering movie names to become without director names and no release date
    parantez_num = i.find('(')
    film_names[counter] = i[0:parantez_num - 1]
    print(counter)

    # Typing Film Name in Search Box
    searchbox = driver.find_element_by_xpath('//*[@id="suggestion-search"]')
    searchbox.send_keys(film_names[counter])

    # Clicking Search Button
    searchbutton = driver.find_element_by_xpath('//*[@id="suggestion-search-button"]')
    searchbutton.click()

    # Clicking First Result
    firstsearch = WebDriverWait(driver, 5).until(
        expected_conditions.element_to_be_clickable(
            (By.XPATH, '//*[@id="main"]/div/div[2]/table/tbody/tr[1]/td[2]/a'))
    )
    firstsearch.click()

    # Maximizing Image
    wait = WebDriverWait(driver, 10)
    image_magnify = wait.until(expected_conditions.element_to_be_clickable(
        (By.XPATH, '//*[@id="title-overview-widget"]/div[1]/div[3]/div[1]/a/img')))
    image_magnify.click()

    # Saving The Image
    image = WebDriverWait(driver, 5).until(
        expected_conditions.element_to_be_clickable(
            (By.XPATH, '//*[@id="photo-container"]/div/div[3]/div/div[2]/div[1]/div[2]/div/img[2]'))
    )
    image_url = (image.get_attribute('src'))
    with open("I:\\Movies\\" + image_names[counter] + "\\" + image_names[counter] + ".jpg", "wb") as f:
        f.write(requests.get(image_url).content)

    # going back to the first page of IMDB
    driver.back()
    driver.back()
    driver.back()
    counter += 1
It works fine, but after a random number of successful downloads it suddenly stops with this error:
Traceback (most recent call last):
  File "C:/Users/Kurdman/Desktop/Poster Downloader/Main/MainCode.py", line 42, in <module>
    image_magnifier = wait.until(expected_conditions.element_to_be_clickable((By.XPATH, '//*[@id="title-overview-widget'
  File "C:\Users\Kurdman\Desktop\Poster Downloader\Main\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
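I'd like to know why I get this error, because the for loop works fine for 10 movies and then suddenly stops. Even if I set the counter to 100 so that it starts from the 100th movie, it works for a few of them and then stops again with the same error.
By the way, my movie folder names follow this format: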
Ajami (2009) - Scandar Copti - Foreign Language Oscar Nominee
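So I filter each name by finding the index of the first parenthesis in the title and cutting everything from there on, which removes the release date, the director's name, and so on.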
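The filtering step described above can be sketched on its own, without the browser. This is only an illustration: the helper name `title_from_folder` is made up here, and it assumes every folder name either contains a "(year)" part or is just the bare title. It also guards the case where no parenthesis is found, since `str.find` returns -1 there and the slice `i[0:parantez_num - 1]` would then silently drop the last two characters of the name.

```python
def title_from_folder(folder_name: str) -> str:
    """Return the part of the folder name before the first '(' (hypothetical helper)."""
    paren = folder_name.find("(")
    if paren == -1:
        # No "(year)" part: keep the whole name instead of slicing with a
        # negative index, which would cut characters off the end.
        return folder_name.strip()
    # Everything before the parenthesis, without the trailing space.
    return folder_name[:paren].strip()


print(title_from_folder("Ajami (2009) - Scandar Copti - Foreign Language Oscar Nominee"))
# prints: Ajami
```

Keeping the cleaned title in a separate variable, rather than writing it back into `film_names`, would also leave the original folder names untouched for building the output path.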
[Comments]:
-
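I would try increasing the webdriver timeout and/or wrapping the wait in try/except to catch the exception. (That gives the site more time to update the DOM...)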
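A rough sketch of the try/except idea from this comment, kept free of Selenium so it runs anywhere. The names `run_with_skips`, `fetch_one`, and `skip_on` are hypothetical and not part of the original script. The assumption is that the body of the for loop is moved into a function that takes one movie name, and that a movie whose page times out should be skipped and reported instead of ending the whole run.

```python
from typing import Callable, Iterable, List, Tuple, Type


def run_with_skips(
    names: Iterable[str],
    fetch_one: Callable[[str], None],
    skip_on: Tuple[Type[BaseException], ...],
) -> List[str]:
    """Call fetch_one(name) for each name; return the names that raised skip_on."""
    failed: List[str] = []
    for name in names:
        try:
            fetch_one(name)
        except skip_on as exc:
            # Note the failure and continue with the next movie
            # rather than letting one timeout abort the loop.
            print(f"skipped {name!r}: {type(exc).__name__}")
            failed.append(name)
    return failed
```

With the script above, `skip_on` would presumably be `(TimeoutException,)` imported from `selenium.common.exceptions`, and `fetch_one` would contain the search, click, and save steps, possibly with a longer `WebDriverWait` timeout than 5 or 10 seconds. The returned list makes it easy to retry or inspect the movies that were skipped.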