Instagram crawling with scrolling down... with Python Selenium

【Question title】: Instagram crawling with scrolling down... with Python Selenium
【Posted】: 2018-05-10 11:43:29
【Question description】:

import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is assumed to be a Selenium WebDriver already opened on the hashtag page.
total_link = []  # every post link collected so far
seen = set()     # links already collected, used to skip duplicates

while True:
    # Scroll to the bottom so that the next batch of posts is loaded.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    divs = driver.find_element(By.TAG_NAME, 'div').text

    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    my_titles = soup.select('div._6d3hm > div._mck9w')

    for title in my_titles:
        try:
            link = str(title.a.get("href"))  # get the post URL
        except AttributeError:
            continue  # this tile has no <a> tag
        if link in seen:
            continue  # duplicate
        seen.add(link)
        total_link.append(link)

    print("Collected so far: " + str(len(total_link)))

    time.sleep(2)
    if 'End of Results' in divs:
        print('end')
        break

Hello, I am scraping Instagram data for Korean hashtags. My code is shown above.

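1. Scroll down the page
2. Get the HTML using bs4 and requests
3. Locate the timestamp, image src, text, hashtags, and ID
4. Select them all, then scrape them
5. After processing the HTML on the page, scroll down
6. Repeat the same thing until the end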
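By doing this, and by using code from people on this site, it seemed to work... But after scrolling down a few times, at some point the scrolling stops and an error message appears:

'읽어드리지 못합니다', or in English, "Unable to read".

Could I find out why this error pops up and how to solve the problem? I am using Python and Selenium. Thank you for your answers.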

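【Comments】:

Questions seeking debugging help ("Why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example.
I changed the form of my question.
But the core of my question is whether Instagram has some feature that blocks infinite scrolling down...
You can refer to this: github.com/Aqua-4/auto-insta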

Tags: python selenium web-crawler instagram

【Solution 1】:

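Instagram actively guards against abuse such as scraping or any other kind of automation. This error usually appears when you access Instagram pages unusually fast. So you need to call time.sleep() more often, or with a longer delay.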
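
One way to act on this advice is to pause for a randomized interval after each scroll and to wait longer whenever the page stops growing. The following is only a minimal sketch under stated assumptions: the function name `scroll_until_stalled` and all of its parameters are hypothetical, the delay values are illustrative rather than known Instagram thresholds, and nothing here guarantees that the block is avoided. The only Selenium call it relies on is `driver.execute_script`.

```python
import random
import time


def scroll_until_stalled(driver, base_delay=2.0, max_delay=60.0,
                         max_stalls=4, sleep=time.sleep):
    """Scroll to the bottom repeatedly, backing off when nothing new loads.

    `driver` is any object with Selenium's `execute_script` method.
    All names and default values here are illustrative assumptions.
    Returns the number of scrolls that made the page taller.
    """
    delay = base_delay
    stalls = 0
    loaded = 0
    last_height = driver.execute_script("return document.body.scrollHeight")

    while stalls < max_stalls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Add random jitter so the pauses are not perfectly regular.
        sleep(delay + random.uniform(0, delay / 2))

        height = driver.execute_script("return document.body.scrollHeight")
        if height > last_height:
            # New content arrived: remember the height and reset the back-off.
            last_height = height
            loaded += 1
            stalls = 0
            delay = base_delay
        else:
            # Nothing new: double the wait (up to max_delay) and try again.
            stalls += 1
            delay = min(delay * 2, max_delay)

    return loaded
```

Calling `scroll_until_stalled(driver)` before parsing `driver.page_source` would replace the fixed `time.sleep(2)` in the question's loop. The function gives up after `max_stalls` consecutive scrolls that load nothing, which covers both the real end of the results and the case where the site has stopped serving new posts.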

【Discussion】: