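【Posted】:2021-07-19 00:58:15
【Question】:
I want to scrape the genre and length (running time) of every movie in a list of 250 movies. A list named "links" holds the URLs of these 250 movie pages. So far I have written code that extracts the genre and length for a single URL taken from "links".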
links=['https://www.imdb.com/title/tt0093603/','https://www.imdb.com/title/tt8176054/','https://www.imdb.com/title/tt0367495/','https://www.imdb.com/title/tt0048473/','https://www.imdb.com/title/tt0079221/','https://www.imdb.com/title/tt7391996/','https://www.imdb.com/title/tt0052572/','https://www.imdb.com/title/tt0237376/','https://www.imdb.com/title/tt0214915/','https://www.imdb.com/title/tt5311546/','https://www.imdb.com/title/tt7019842/','https://www.imdb.com/title/tt0105575/','https://www.imdb.com/title/tt0400234/','https://www.imdb.com/title/tt8413338/','https://www.imdb.com/title/tt12361178/','https://www.imdb.com/title/tt4991384/','https://www.imdb.com/title/tt1187043/','https://www.imdb.com/title/tt8948790/','https://www.imdb.com/title/tt0986264/','https://www.imdb.com/title/tt10189514/','https://www.imdb.com/title/tt0101649/','https://www.imdb.com/title/tt5074352/','https://www.imdb.com/title/tt9477520/','https://www.imdb.com/title/tt7060344/','https://www.imdb.com/title/tt9900782/','https://www.imdb.com/title/tt0291855/','https://www.imdb.com/title/tt0048956/','https://www.imdb.com/title/tt0085743/','https://www.imdb.com/title/tt0050870/','https://www.imdb.com/title/tt7738784/','https://www.imdb.com/title/tt5959980/','https://www.imdb.com/title/tt0059246/','https://www.imdb.com/title/tt4987556/','https://www.imdb.com/title/tt0312859/','https://www.imdb.com/title/tt0072783/','https://www.imdb.com/title/tt0119385/','https://www.imdb.com/title/tt0292246/','https://www.imdb.com/title/tt10214826/','https://www.imdb.com/title/tt7019942/','https://www.imdb.com/title/tt3417422/','https://www.imdb.com/title/tt7465992/','https://www.imdb.com/title/tt5867800/','https://www.imdb.com/title/tt6148156/','https://www.imdb.com/title/tt8239946/',
'https://www.imdb.com/title/tt0466460/','https://www.imdb.com/title/tt0459516/','https://www.imdb.com/title/tt4679210/','https://www.imdb.com/title/tt0376127/','https://www.imdb.com/title/tt0066763/','https://www.imdb.com/title/tt3973410/','https://www.imdb.com/title/tt3668162/','https://www.imdb.com/title/tt0220656/','https://www.imdb.com/title/tt6380520/','https://www.imdb.com/title/tt0195231/','https://www.imdb.com/title/tt8108198/','https://www.imdb.com/title/tt4429128/','https://www.imdb.com/title/tt2877108/','https://www.imdb.com/title/tt2181831/','https://www.imdb.com/title/tt3569782/','https://www.imdb.com/title/tt0376076/','https://www.imdb.com/title/tt1954470/','https://www.imdb.com/title/tt1620933/','https://www.imdb.com/title/tt5312232/','https://www.imdb.com/title/tt2356180/','https://www.imdb.com/title/tt0242519/','https://www.imdb.com/title/tt4934950/','https://www.imdb.com/title/tt0367110/','https://www.imdb.com/title/tt0073707/','https://www.imdb.com/title/tt2218988/','https://www.imdb.com/title/tt0871510/','https://www.imdb.com/title/tt0375611/','https://www.imdb.com/title/tt0104561/','https://www.imdb.com/title/tt0054098/','https://www.imdb.com/title/tt1562872/','https://www.imdb.com/title/tt4430212/','https://www.imdb.com/title/tt4851630/','https://www.imdb.com/title/tt5005684/','https://www.imdb.com/title/tt10324144/','https://www.imdb.com/title/tt1639426/','https://www.imdb.com/title/tt0057935/','https://www.imdb.com/title/tt7060460/','https://www.imdb.com/title/tt1280558/','https://www.imdb.com/title/tt3322420/','https://www.imdb.com/title/tt4635372/','https://www.imdb.com/title/tt0242256/','https://www.imdb.com/title/tt0200087/','https://www.imdb.com/title/tt0374887/','https://www.imdb.com/title/tt0139876/','https://www.imdb.com/title/tt0292490/','https://www.imdb.com/title/tt0105271/','https://www.imdb.com/title/tt9052870/','https://www.imdb.com/title/tt2283748/','https://www.imdb.com/title/tt0405508/','https://www.imdb.com/title/tt0364647/'
,'https://www.imdb.com/title/tt0169102/','https://www.imdb.com/title/tt1821480/','https://www.imdb.com/title/tt0109117/','https://www.imdb.com/title/tt8291224/','https://www.imdb.com/title/tt2338151/','https://www.imdb.com/title/tt2358592/','https://www.imdb.com/title/tt0453729/','https://www.imdb.com/title/tt0319736/','https://www.imdb.com/title/tt0843326/','https://www.imdb.com/title/tt2082197/','https://www.imdb.com/title/tt5571734/','https://www.imdb.com/title/tt0112553/','https://www.imdb.com/title/tt0379370/','https://www.imdb.com/title/tt8144834/','https://www.imdb.com/title/tt0488414/','https://www.imdb.com/title/tt0116630/','https://www.imdb.com/title/tt13299890/','https://www.imdb.com/title/tt0456144/','https://www.imdb.com/title/tt7822438/','https://www.imdb.com/title/tt5824826/','https://www.imdb.com/title/tt4849438/','https://www.imdb.com/title/tt0072860/','https://www.imdb.com/title/tt1695800/','https://www.imdb.com/title/tt2564144/','https://www.imdb.com/title/tt1261047/','https://www.imdb.com/title/tt0063404/','https://www.imdb.com/title/tt0471571/','https://www.imdb.com/title/tt7392212/','https://www.imdb.com/title/tt3390572/','https://www.imdb.com/title/tt0112870/','https://www.imdb.com/title/tt6315524/','https://www.imdb.com/title/tt5906392/','https://www.imdb.com/title/tt0213969/','https://www.imdb.com/title/tt2882328/','https://www.imdb.com/title/tt0050188/','https://www.imdb.com/title/tt1821317/','https://www.imdb.com/title/tt2377938/','https://www.imdb.com/title/tt7838252/','https://www.imdb.com/title/tt10919240/','https://www.imdb.com/title/tt1180583/','https://www.imdb.com/title/tt1773764/','https://www.imdb.com/title/tt3394420/','https://www.imdb.com/title/tt7725596/','https://www.imdb.com/title/tt2395469/','https://www.imdb.com/title/tt1327035/','https://www.imdb.com/title/tt3863552/','https://www.imdb.com/title/tt1649431/','https://www.imdb.com/title/tt0051792/','https://www.imdb.com/title/tt0220832/','https://www.imdb.com/title/tt1857670/',
'https://www.imdb.com/title/tt3614516/','https://www.imdb.com/title/tt7180544/','https://www.imdb.com/title/tt0296574/','https://www.imdb.com/title/tt7294534/','https://www.imdb.com/title/tt3449292/','https://www.imdb.com/title/tt11581174/','https://www.imdb.com/title/tt2585562/','https://www.imdb.com/title/tt1188996/','https://www.imdb.com/title/tt5082014/','https://www.imdb.com/title/tt3124456/',
'https://www.imdb.com/title/tt8110330/',
'https://www.imdb.com/title/tt0347304/',
'https://www.imdb.com/title/tt1093370/',
'https://www.imdb.com/title/tt2924472/',
'https://www.imdb.com/title/tt1609168/',
'https://www.imdb.com/title/tt6167894/',
'https://www.imdb.com/title/tt0118751/',
'https://www.imdb.com/title/tt7485048/',
'https://www.imdb.com/title/tt2325915/',
'https://www.imdb.com/title/tt0375878/',
'https://www.imdb.com/title/tt1417299/',
'https://www.imdb.com/title/tt7218518/',
'https://www.imdb.com/title/tt0323013/',
'https://www.imdb.com/title/tt8108200/',
'https://www.imdb.com/title/tt2631186/',
'https://www.imdb.com/title/tt0455829/',
'https://www.imdb.com/title/tt0824316/',
'https://www.imdb.com/title/tt0222012/',
'https://www.imdb.com/title/tt11322920/',
'https://www.imdb.com/title/tt3848892/',
'https://www.imdb.com/title/tt10717738/',
'https://www.imdb.com/title/tt4387040/',
'https://www.imdb.com/title/tt5764096/',
'https://www.imdb.com/title/tt0366840/',
'https://www.imdb.com/title/tt2181931/',
'https://www.imdb.com/title/tt1517561/',
'https://www.imdb.com/title/tt0373856/',
'https://www.imdb.com/title/tt2926068/',
'https://www.imdb.com/title/tt2350496/',
'https://www.imdb.com/title/tt1077248/',
'https://www.imdb.com/title/tt0402014/',
'https://www.imdb.com/title/tt13206926/',
'https://www.imdb.com/title/tt8130968/',
'https://www.imdb.com/title/tt0816258/',
'https://www.imdb.com/title/tt6108090/',
'https://www.imdb.com/title/tt4169250/',
'https://www.imdb.com/title/tt0291376/',
'https://www.imdb.com/title/tt2317337/',
'https://www.imdb.com/title/tt0093578/',
'https://www.imdb.com/title/tt7098658/',
'https://www.imdb.com/title/tt4434004/',
'https://www.imdb.com/title/tt1907761/',
'https://www.imdb.com/title/tt7758160/',
'https://www.imdb.com/title/tt0077451/',
'https://www.imdb.com/title/tt4432480/',
'https://www.imdb.com/title/tt1230165/',
'https://www.imdb.com/title/tt0420332/',
'https://www.imdb.com/title/tt3822396/',
'https://www.imdb.com/title/tt1851988/',
'https://www.imdb.com/title/tt5121000/',
'https://www.imdb.com/title/tt1288638/',
'https://www.imdb.com/title/tt0499375/',
'https://www.imdb.com/title/tt0431619/',
'https://www.imdb.com/title/tt2187153/',
'https://www.imdb.com/title/tt0196069/',
'https://www.imdb.com/title/tt2213054/',
'https://www.imdb.com/title/tt3801314/',
'https://www.imdb.com/title/tt1292703/',
'https://www.imdb.com/title/tt4981966/',
'https://www.imdb.com/title/tt1266583/',
'https://www.imdb.com/title/tt1839596/',
'https://www.imdb.com/title/tt0422320/',
'https://www.imdb.com/title/tt7998242/',
'https://www.imdb.com/title/tt2258337/',
'https://www.imdb.com/title/tt0110222/',
'https://www.imdb.com/title/tt0109555/',
'https://www.imdb.com/title/tt6484982/',
'https://www.imdb.com/title/tt4900716/',
'https://www.imdb.com/title/tt3320542/',
'https://www.imdb.com/title/tt7142506/',
'https://www.imdb.com/title/tt1241195/',
'https://www.imdb.com/title/tt8108268/',
'https://www.imdb.com/title/tt0150433/',
'https://www.imdb.com/title/tt2855648/',
'https://www.imdb.com/title/tt0098999/',
'https://www.imdb.com/title/tt0432047/',
'https://www.imdb.com/title/tt3447364/',
'https://www.imdb.com/title/tt1014672/',
'https://www.imdb.com/title/tt1926313/',
'https://www.imdb.com/title/tt5286444/',
'https://www.imdb.com/title/tt2980794/',
'https://www.imdb.com/title/tt8042292/',
'https://www.imdb.com/title/tt1447500/',
'https://www.imdb.com/title/tt0106333/',
'https://www.imdb.com/title/tt2140465/',
'https://www.imdb.com/title/tt0920464/',
'https://www.imdb.com/title/tt5310090/',
'https://www.imdb.com/title/tt7212754/',
'https://www.imdb.com/title/tt1324059/',
'https://www.imdb.com/title/tt3767372/',
'https://www.imdb.com/title/tt2375559/',
'https://www.imdb.com/title/tt6027478/',
'https://www.imdb.com/title/tt8590896/',
'https://www.imdb.com/title/tt0172684/',
'https://www.imdb.com/title/tt6206564/',
'https://www.imdb.com/title/tt0449994/']
Now I need to do the same for all 250 URLs in that list. When I put this process in a loop, I only get the information for the last URL.
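The looping code itself is not shown in the question, so the following is only a guess at one common cause of this symptom: the result dictionary is re-created inside the loop, which discards everything collected on earlier passes. The sketch below uses a hypothetical stand-in function (fake_movie_info) and placeholder URLs instead of real requests, purely to make the difference visible.

```python
# Hypothetical stand-in for the real scraping step; returns (genre, length).
def fake_movie_info(url):
    return "genre-of-" + url, "length-of-" + url

urls = ["u1", "u2", "u3"]  # placeholder values, not the real IMDb URLs

# Pitfall: the dict is re-created on every pass, so earlier rows are lost.
for url in urls:
    overwritten = {"genre1": [], "length_of_movie": []}
    genre, length = fake_movie_info(url)
    overwritten["genre1"].append(genre)
    overwritten["length_of_movie"].append(length)

# Fix: create the dict once, before the loop, and only append inside it.
accumulated = {"genre1": [], "length_of_movie": []}
for url in urls:
    genre, length = fake_movie_info(url)
    accumulated["genre1"].append(genre)
    accumulated["length_of_movie"].append(length)

print(overwritten["genre1"])   # only the entry for the last URL
print(accumulated["genre1"])   # one entry per URL
```

If the real loop already creates the dictionary outside the loop, the cause is probably elsewhere, for example the append calls sitting after the loop body rather than inside it.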
This is the code I wrote for one URL:
import requests
from bs4 import BeautifulSoup

def get_movie_info(a_tag, div_tag):
    # Return the genre and the running time of one movie
    span_tags1 = a_tag.find_all('span')
    genre = span_tags1[0].text.strip()
    li_tags = div_tag.find_all('li')
    length_of_film = li_tags[1].text.strip()
    return genre, length_of_film

movie_page_url = links[0]  # first URL in the list
response = requests.get(movie_page_url)
# parse the HTML of the movie page
movie_doc = BeautifulSoup(response.text, 'html.parser')
# get a tags
a_tags = movie_doc.find_all('a', attrs={'class':"GenresAndPlot__GenreChip-cum89p-3 fzmeux ipc-chip ipc-chip--on-baseAlt"})
# get div tags
div_tags = movie_doc.find_all('div', attrs={'class':"TitleBlock__TitleMetaDataContainer-sc-1nlhx7j-2 hWHMKr"})
movie_dict = {
    'genre1' : [],
    'length_of_movie' : []}
a_tag = a_tags[0]
div_tag = div_tags[0]
movie_info = get_movie_info(a_tag, div_tag)
movie_dict['genre1'].append(movie_info[0])
movie_dict['length_of_movie'].append(movie_info[1])
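The output is
movie_dict = {'genre1': ['Crime'], 'length_of_movie': ['2h 25min']}
The output should instead be a DataFrame with the columns "genre1" and "length_of_movie" and 250 rows, one row per movie holding its genre and length.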
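A minimal sketch of how the 250-row result could be assembled, under some assumptions: collect_movie_info and to_dataframe are hypothetical helper names, and fetch_info stands for any function that takes one URL and returns a (genre, length) tuple, such as a wrapper around the single-URL code above. The sketch records None for pages that fail, so both columns keep the same length, and it uses a stub in place of real network calls.

```python
def collect_movie_info(urls, fetch_info):
    """Call fetch_info(url) for every URL and gather the results column-wise."""
    rows = {"genre1": [], "length_of_movie": []}  # created once, outside the loop
    for url in urls:
        try:
            genre, length = fetch_info(url)
        except Exception:
            # A page that cannot be fetched or parsed still gets a row,
            # so the two columns stay aligned with the input list.
            genre, length = None, None
        rows["genre1"].append(genre)
        rows["length_of_movie"].append(length)
    return rows


def to_dataframe(rows):
    """Convert the collected columns into a pandas DataFrame."""
    import pandas as pd  # imported lazily; assumes pandas is installed
    return pd.DataFrame(rows)


# Stub used only for illustration; it fails on one URL to show the None row.
def stub_fetch(url):
    if url.endswith("bad/"):
        raise IndexError("no matching tag")
    return "Crime", "2h 25min"


demo_urls = [
    "https://example.com/a/",
    "https://example.com/bad/",
    "https://example.com/c/",
]
demo_rows = collect_movie_info(demo_urls, stub_fetch)
print(demo_rows)
```

With the real data this would be called as to_dataframe(collect_movie_info(links, fetch_info)), where fetch_info downloads and parses one page. Adding a short pause between requests (for example with time.sleep) is a common courtesy when fetching this many pages from one site.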
【Comments】:
Tags: python list loops web-scraping imdb