【Question title】: Scraping Twitter Followers using Selenium
【Posted】: 2020-08-29 17:52:48
【Question description】:

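I have links to several profiles and want to get the usernames of their followers. I can't use the API because it is very slow and I need thousands of followers, so I am using Selenium.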

import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://twitter.com/login")
time.sleep(2)

login_id = driver.find_elements_by_class_name("r-30o5oe.r-1niwhzg.r-17gur6a.r-1yadl64.r-deolkf.r-homxoj.r-poiln3.r-7cikom.r-1ny4l3l.r-1inuy60.r-utggzx.r-vmopo1.r-1w50u8q.r-1lrr6ok.r-1dz5y72.r-fdjqy7.r-13qz1uu")[0]
login_id.send_keys("Username Here")


password = driver.find_elements_by_class_name("r-30o5oe.r-1niwhzg.r-17gur6a.r-1yadl64.r-deolkf.r-homxoj.r-poiln3.r-7cikom.r-1ny4l3l.r-1inuy60.r-utggzx.r-vmopo1.r-1w50u8q.r-1lrr6ok.r-1dz5y72.r-fdjqy7.r-13qz1uu")[1]
password.send_keys("Password Here")

driver.find_element_by_class_name("css-901oao.r-1awozwy.r-jwli3a.r-6koalj.r-18u37iz.r-16y2uox.r-1qd0xha.r-a023e6.r-vw2c0b.r-1777fci.r-eljoum.r-dnmrzs.r-bcqeeo.r-q4m81j.r-qvutc0").click()

driver.get("Profile Link")

time.sleep(2)


# Scroll to the end of the page
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(10)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Get the username elements
usernames = driver.find_elements_by_class_name(
        "css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg")
print(len(usernames))
for username in usernames:
    print(username.find_element_by_class_name("css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l").get_attribute("href"))

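I am using the code above to scroll to the bottom of the page and then extract the username fields.

The problem is that I only get the usernames of the first 20 or 30 followers. Can anyone help?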

【Comments on the question】:

    Tags: selenium twitter selenium-chromedriver


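    【Solution 1】:

    I modified your code slightly so that the usernames are collected inside the scroll loop rather than once at the end; give it a try. You may need to adjust the sleep timer again: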

    follower_list = []
    # Scroll to the end of the page
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait to load page
        time.sleep(1)
        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    
        # Collect the usernames rendered so far (on every pass, not once at the end)
        usernames = driver.find_elements_by_class_name(
                "css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg")
        print(len(usernames))
        for username in usernames:
            # Use a separate name for the link so the loop variable is not overwritten
            href = username.find_element_by_class_name("css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l").get_attribute("href")
            if href not in follower_list:
                follower_list.append(href)
    
    print(len(follower_list))
    print(follower_list)
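
The same collect-while-scrolling idea can be sketched without tying it to a browser. This is only an illustration under stated assumptions, not the answer's exact method: the function name `collect_while_scrolling` and its parameters (`get_height`, `scroll_to_bottom`, `read_visible`, `pause`, `max_stalls`) are hypothetical. It assumes the page keeps only some of the follower rows in the DOM at any moment, so it harvests on every pass, and it tolerates a few passes with an unchanged height before stopping, which makes the loop less sensitive to the sleep timer.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_while_scrolling(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    read_visible: Callable[[], Iterable[str]],
    pause: float = 1.0,
    max_stalls: int = 3,
) -> List[str]:
    """Scroll until the page stops growing, harvesting items on every pass.

    All three callables are supplied by the caller (hypothetical interface).
    Returns the unique items in the order they were first seen.
    """
    seen: Dict[str, None] = {}  # a dict keeps insertion order and dedupes in O(1)

    def harvest() -> None:
        for item in read_visible():
            seen.setdefault(item, None)

    stalls = 0
    last_height = get_height()
    while stalls < max_stalls:
        harvest()            # read what is rendered now, before it can be dropped
        scroll_to_bottom()
        time.sleep(pause)    # give the next batch time to load
        new_height = get_height()
        # An unchanged height may only mean a slow load, so allow a few retries.
        stalls = stalls + 1 if new_height == last_height else 0
        last_height = new_height
    harvest()                # pick up whatever the final scroll revealed
    return list(seen)
```

With Selenium, `get_height` could wrap `driver.execute_script("return document.body.scrollHeight")`, `scroll_to_bottom` could wrap the `window.scrollTo(...)` call from the code above, and `read_visible` could return the `href` values of the currently rendered rows. If a single jump to the bottom loads more rows than the page keeps rendered, some rows may never be seen; scrolling in smaller steps would be one way around that.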
    

    【Comments】:

    • OK, let me check.