Error while trying to scrape an Instagram post's comments using Selenium and Python [duplicate]

【Posted】: 2021-09-29 19:23:25
【Question】:

I am just using the code I found on this website: https://www.bestproxyreviews.com/instagram-scraper/. The code is:

    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class InstagramScraper:

        def __init__(self, post_url):
            self.post_url = post_url
            self.comments = []
            chrome_options = webdriver.ChromeOptions()
            chrome_options.add_argument("--headless")
            # `options=` replaces the deprecated `chrome_options=` keyword
            self.chrome = webdriver.Chrome(options=chrome_options)

        def scrape_comments(self):
            # get() loads the page and returns None, so its result is not stored
            self.chrome.get(self.post_url)
            content = self.chrome.page_source
            comments = self.chrome.find_element(By.CLASS_NAME, "XQXOT").find_elements(By.CLASS_NAME, "Mr508")
            for comment in comments:
                d = (comment.find_element(By.CLASS_NAME, "ZyFrc")
                     .find_element(By.TAG_NAME, "li")
                     .find_element(By.CLASS_NAME, "P9YgZ")
                     .find_element(By.TAG_NAME, "div")
                     .find_element(By.CLASS_NAME, "C4VMK"))
                poster = d.find_element(By.TAG_NAME, "h3").text
                post = d.find_element(By.TAG_NAME, "span").text
                self.comments.append({
                    "poster": poster,
                    "post": post
                })
            # Return after the loop, not inside it, so every comment is collected
            return self.comments


    post_url = "https://www.instagram.com/p/CTu1euSp6jZ/"
    x = InstagramScraper(post_url)
    x.scrape_comments()

And the error:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".XQXOT"}
      (Session info: headless chrome=94.0.XXXX.XX)

What I want to do is get all of the post's comments into a .csv file.

Any tips?
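For the .csv goal, a minimal sketch using only the standard csv module, assuming the list of {"poster": ..., "post": ...} dicts that scrape_comments() builds in the code above. The helper name comments_to_csv and the sample rows are made up for illustration; csv.DictWriter takes care of quoting commas and quote characters inside comment text.

```python
import csv
import io


def comments_to_csv(comments, file_obj):
    """Write a list of {"poster": ..., "post": ...} dicts as CSV rows.

    Hypothetical helper; assumes the dict shape built by scrape_comments().
    """
    writer = csv.DictWriter(file_obj, fieldnames=["poster", "post"])
    writer.writeheader()
    writer.writerows(comments)


# Made-up sample data standing in for real scraped comments
sample = [
    {"poster": "alice", "post": "Nice photo!"},
    {"poster": "bob", "post": 'Where was this, "exactly"?'},
]
buffer = io.StringIO()
comments_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, open it with newline="" as the csv documentation recommends, for example: with open("comments.csv", "w", newline="", encoding="utf-8") as f: comments_to_csv(x.scrape_comments(), f)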

【Question Discussion】:

Have you (re)searched SO for [python] NoSuchElementException: Message: no such element: Unable to locate element? If so, edit your question and tell us what those answers are missing for your problem.

Tags: python instagram screen-scraping

【Solution 1】:

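Instagram loads comments and other data asynchronously. To make sure they have loaded, call driver.implicitly_wait(30) so that Selenium keeps retrying element lookups (for up to 30 seconds) when the elements have not loaded yet: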

...
# Get the Instagram page
self.chrome.get(self.post_url)

# Wait for comments to load: from now on, element lookups are retried for up
# to 30 seconds and return as soon as the element appears
self.chrome.implicitly_wait(30)

# Get comments
comments = self.chrome.find_element(By.CLASS_NAME, "XQXOT").find_elements(By.CLASS_NAME, "Mr508")

# Get content now that comments have loaded
content = self.chrome.page_source
...
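An implicit wait is a maximum, not a fixed delay: each lookup is retried until the element shows up or the timeout expires, and only then does NoSuchElementException get raised. The pure-Python toy model below mimics that behavior with a fake page whose comment list "appears" after a few polls. find_with_timeout, FakePage and ElementNotFound are hypothetical names, and this is not Selenium's implementation (with a real browser the retrying is done by the driver).

```python
import time


class ElementNotFound(Exception):
    """Stand-in for Selenium's NoSuchElementException in this toy model."""


def find_with_timeout(lookup, timeout, poll_interval=0.05):
    """Call lookup() until it returns something other than None.

    Returns as soon as a result appears; raises ElementNotFound once
    `timeout` seconds have passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = lookup()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise ElementNotFound(f"not found within {timeout} s")
        time.sleep(poll_interval)


class FakePage:
    """Pretends the comment list is injected by JavaScript after a few polls."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def find_comment_list(self):
        self.polls += 1
        return "<ul class='XQXOT'>" if self.polls >= self.ready_after else None


# Element appears on the 3rd poll: returns after ~0.1 s, far below the 30 s cap
page = FakePage(ready_after=3)
start = time.monotonic()
element = find_with_timeout(page.find_comment_list, timeout=30)
elapsed = time.monotonic() - start
print(element, page.polls, round(elapsed, 2))

# Element never appears: gives up once the (short) timeout is reached
slow = FakePage(ready_after=10**9)
timed_out = False
try:
    find_with_timeout(slow.find_comment_list, timeout=0.2)
except ElementNotFound as exc:
    timed_out = True
    print("gave up:", exc)
```

This is also why lowering the timeout does not make a successful lookup faster; it only shortens how long a failing lookup waits before raising.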

【Discussion】:

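Hope my edit is OK :)
Helpful documentation link and explanatory code comments. Implicitly waiting for AJAX (dynamically loaded) content is one of Selenium's advantages over figuring it out and implementing it by hand with Requests, BeautifulSoup, etc.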

【Solution 2】:

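As you can see in the linked image, the class of the comments button is "wp06b". I searched the image's HTML for the class you are trying to use, but it appears to belong to a <ul> tag, which in HTML is a bulleted list. So in your code you are trying to click a button that doesn't exist. You could try finding the element by the button's XPath, like this: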

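from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[7]/div[2]/div/article/div/div[2]/div/div[2]/section[1]/span[2]/button'))).click()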
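The diagnosis in this answer comes from searching the page's HTML for a class name. Instagram's class names such as XQXOT look generated and can change when the site's front end is updated, so it may help to check whether a class occurs in self.chrome.page_source at all before building a selector chain on it. A stdlib-only sketch, assuming the page source has been saved to a string; ClassCounter and class_counts are hypothetical helpers, and the HTML snippet is made up (only the class names wp06b, Mr508 and XQXOT come from this thread).

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts every CSS class that appears in a class="..." attribute."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # class="a b c" holds several space-separated classes
                self.counts.update(value.split())


def class_counts(page_source):
    parser = ClassCounter()
    parser.feed(page_source)
    parser.close()
    return parser.counts


# Made-up snippet standing in for self.chrome.page_source
page_source = """
<article>
  <button class="wp06b">View comments</button>
  <ul class="comments-list"><li class="Mr508">first comment</li></ul>
</article>
"""
counts = class_counts(page_source)
for wanted in ("XQXOT", "Mr508", "wp06b"):
    print(wanted, "found" if counts[wanted] else "missing")
```

If XQXOT comes back missing from the real page source even though the comments are visible in a normal browser, the selector chain in the question has to be updated to whatever the current page actually uses.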

【Discussion】: