【Posted】: 2017-07-21 15:41:00
【Question】:
I'm trying to build a Python application that extracts the titles of all the videos on a YouTube channel.
I'm currently trying to do this with Selenium.
def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text)  # prints the title, plus views, hours ago, and "CC"
    # I suck at selenium, so let's just store the title and cut everything after it
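The class name yt-lockup-content is the class of each video on a YouTube channel's /videos page. With the code above I can get the title of the first video on that page. But I want to iterate over all the video titles (in other words, over every yt-lockup-content element) and store each one's .text.
What I'd like to know is how to access yt-lockup-content[2], so to speak. In other words, that would be the second video on the page, which has the same class name.
Here is my full code. Have fun with it.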
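One possible direction, as a minimal sketch rather than a tested answer: Selenium's find_elements (plural) returns a list of every matching element, which can then be looped over or indexed. The helper name collect_titles is hypothetical, and treating the first line of .text as the title is an assumption based on the output described above. Python lists are 0-based, so the second video would be index 1.

```python
CLASS_NAME = "yt-lockup-content"  # class name taken from the question


def collect_titles(driver):
    """Return the first line of .text for every element with CLASS_NAME.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the channel's /videos page.
    """
    # "class name" is the locator string behind selenium's By.CLASS_NAME.
    elements = driver.find_elements("class name", CLASS_NAME)
    titles = []
    for element in elements:
        lines = element.text.splitlines()
        if lines:  # skip elements that have no visible text
            titles.append(lines[0])  # assumption: the title is the first line
    return titles
```

Called after driver.get(...), collect_titles(driver)[1] would then be the title of the second video, provided at least two matching elements are present on the page.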
import selenium
from selenium import webdriver


def getChannelName():
    print("Please enter the channel that you would like to scrape video titles...")
    channelName = input()
    googleSearch = "https://www.google.ca/search?q=%s+youtube&oq=%s+youtube&aqs=chrome..69i57j0l5.2898j0j4&sourceid=chrome&ie=UTF-8#q=%s+youtube&*" % (channelName, channelName, channelName)
    print(googleSearch)
    return googleSearch


def googleYoutubePage():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(getChannelName())
    element = driver.find_element_by_class_name("s")  # this is where the link to the proper youtube page lives
    keys = element.text  # this grabs the link to the youtube page + other crap that will be cut
    splitKeys = keys.split(" ")  # split, because aside from the link it grabs the page description, which we need to truncate
    linkToPage = splitKeys[0]  # this is where the link lives
    extraCrapStartsHere = len(linkToPage)  # default: keep everything if no newline is found
    for index, char in enumerate(linkToPage):  # find where the stuff beside the link begins (which is unnecessary)
        if char == "\n":
            extraCrapStartsHere = index  # it starts here, everything beyond can be cut
            break
    link = ""
    for i in range(extraCrapStartsHere):  # the official link is everything in linkToPage up to the cut point
        link = link + linkToPage[i]
    videosPage = link + "/videos"
    print(videosPage)
    return videosPage


def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text)  # prints the title, plus views, hours ago, and "CC"
    # I suck at selenium, so let's just store the title and cut everything after it


def main():
    getVideoTitles()


main()
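The link-trimming step in googleYoutubePage() takes the first space-separated chunk of the result text and then cuts it at the newline. The same idea can probably be sketched with plain string methods, as below. The function name extract_link and the sample text are hypothetical and only illustrate the shape of the input described in the code comments.

```python
def extract_link(result_text):
    """Return the text that comes before the first space or newline."""
    first_chunk = result_text.split(" ")[0]  # drop everything after the first space
    return first_chunk.split("\n")[0]        # drop everything after the first newline


# Hypothetical search-result text: a link, a newline, then a description.
sample = "https://www.youtube.com/user/example\nSome channel description text"
videos_page = extract_link(sample) + "/videos"
print(videos_page)
```

If the text contains no space or newline at all, extract_link returns it unchanged, so there is no index variable that could be left undefined.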
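【Comments】:
-
I can't figure it out. I've done this... textelement = driver.find_element_by_xpath("//div[@class= yt-lockup-content")[1] but I get an error when I run print(textelement.text)
-
No, that doesn't work, but thanks for the suggestion. I'll keep trying other things.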
-
No luck. I tried that, and fixed the spelling of yt-uix-TITLE-link. Thanks again anyway.
Tags: python python-3.x selenium video youtube