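[Posted]: 2021-06-04 16:23:22
[Question]:
I want to extract actor names from Rotten Tomatoes. The first movie, THE HITCHHIKER'S GUIDE TO THE GALAXY, lists four names under Starring: Sam Rockwell, Zooey Deschanel, Yasiin Bey, and Martin Freeman. My code scrapes the starring names without any problem. However, instead of showing the four actors' names for one movie, it shows them as if they were the cast of four different movies.
My code: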
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
url = 'https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/'
r = requests.get(url, headers=headers)  #, proxies=proxies)
content = r.content
soup = BeautifulSoup(content)

name = []
year = []
rating = []
director = []
starring = []

# One title block per movie: title, release year and rating.
movies = soup.find_all('div', {'class': 'article_movie_title'})
for movie in movies:
    title = movie.find('h2').find('a').text
    name.append(title)
    release = movie.find('h2').find('span', attrs={'class': 'subtle start-year'}).text
    year.append(release)
    R = movie.find('h2').find('span', attrs={'class': 'tMeterScore'}).text
    rating.append(R)

# These loops search the whole page, so every director and actor link
# is appended as its own list entry instead of being grouped per movie.
for d in soup.find_all('div', attrs={'class': 'info director'}):
    for a in d.find_all('a'):
        director.append(a.string)
for c in soup.find_all('div', attrs={'class': 'info cast'}):
    for c1 in c.find_all('a'):
        starring.append(c1.text)
I build a dictionary and then create a CSV table from it.
import pandas as pd

my_dict = {'Movie_name': name,
           'Release_year': year,
           'Movie_rating': rating,
           'Director of movie': director,
           'Starring': starring}
movie_All = pd.DataFrame({key: pd.Series(value) for key, value in my_dict.items()})
movie_All.to_csv('movies_rot.csv', index=False, encoding='utf-8')
movie_All.head()
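To see why the columns drift apart, here is a minimal sketch with made-up stand-in data (the names "Movie A", "Actor 1" and so on are hypothetical, not scraped values). It uses the same dict-of-Series construction as above: the shorter list is padded with NaN, and each actor lands in a separate row.

```python
import pandas as pd

# Hypothetical stand-in data: two movies, but the cast list holds
# one entry per actor rather than one entry per movie.
columns = {
    "Movie_name": ["Movie A", "Movie B"],
    "Starring": ["Actor 1", "Actor 2", "Actor 3", "Actor 4"],
}

# Same construction as in the question: each list becomes its own Series,
# and pandas aligns them by position, padding the shorter one with NaN.
flat = pd.DataFrame({key: pd.Series(value) for key, value in columns.items()})
print(flat)
```

The frame has as many rows as the longest list, so "Actor 2" is paired with "Movie B" even if both actors belong to "Movie A".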
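What the table looks like now: each actor's name ends up in a separate row of the Starring column.
What it should look like: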
   Movie_name                            Release_year  Movie_rating  Director of movie  Starring
0  The Hitchhiker's Guide to the Galaxy  (2005)        60%           Garth Jennings     Sam Rockwell, Zooey Deschanel, Yasiin Bey, Martin Freeman
How can I collect the starring names per movie?
[Discussion]:
Tags: python list csv web-scraping beautifulsoup
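One possible approach, sketched under assumptions rather than checked against the live page: loop over one wrapper element per movie, search for the cast links inside that wrapper only, and join them into a single string. The inline HTML and the wrapper class `countdown-item` below are assumptions made for illustration; the other class names come from the question. The helper `names` is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes each movie has its own wrapper element
# (class "countdown-item" here) around its title, director and cast blocks.
html = """
<div class="countdown-item">
  <div class="article_movie_title"><h2><a>Movie A</a>
    <span class="subtle start-year">(2005)</span>
    <span class="tMeterScore">60%</span></h2></div>
  <div class="info director"><a>Director A</a></div>
  <div class="info cast"><a>Actor 1</a>, <a>Actor 2</a></div>
</div>
<div class="countdown-item">
  <div class="article_movie_title"><h2><a>Movie B</a>
    <span class="subtle start-year">(1999)</span>
    <span class="tMeterScore">88%</span></h2></div>
  <div class="info director"><a>Director B</a></div>
  <div class="info cast"><a>Actor 3</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def names(block):
    """Join the link texts inside one block; empty string if the block is missing."""
    if block is None:
        return ""
    return ", ".join(a.get_text(strip=True) for a in block.find_all("a"))


rows = []
for item in soup.find_all("div", attrs={"class": "countdown-item"}):
    heading = item.find("div", attrs={"class": "article_movie_title"}).find("h2")
    rows.append({
        "Movie_name": heading.find("a").get_text(strip=True),
        "Release_year": heading.find("span", attrs={"class": "subtle start-year"}).get_text(strip=True),
        "Movie_rating": heading.find("span", attrs={"class": "tMeterScore"}).get_text(strip=True),
        # Searching inside `item` (not `soup`) keeps the names tied to this movie.
        "Director of movie": names(item.find("div", attrs={"class": "info director"})),
        "Starring": names(item.find("div", attrs={"class": "info cast"})),
    })

for row in rows:
    print(row)
```

Because every dictionary in `rows` describes exactly one movie, `pd.DataFrame(rows)` would give one row per movie with all cast names in a single Starring cell. If the real page groups its blocks differently, the wrapper lookup would need to be adjusted.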