Enter query in search bar and scrape results

【Question title】: Enter query in search bar and scrape results
【Posted】: 2020-01-22 05:05:55
【Question】:

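I have a database containing the ISBNs of various books. I collected them with Python and BeautifulSoup. Next, I want to add a category to each book. There is a standard for book categories, and the website https://www.bol.com/nl/ lists all books together with categories that follow this standard.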

Start URL: https://www.bol.com/nl/

ISBN: 9780062457738

URL after searching: https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/

HTML for the category: <li class="breadcrumbs__item"

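Does anyone know how to (1) enter the ISBN in the search bar and (2) submit the search query and use the resulting page for scraping?

Step (3), scraping all the categories, is something I can do. But I don't know how to carry out the first two steps.

The code I have so far for step (2):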

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

webpage = "https://www.bol.com/nl/" # edit me
searchterm = "9780062457738" # edit me

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(webpage)

sbox = driver.find_element_by_class_name("appliedSearchContextId")
sbox.send_keys(searchterm)

submit = driver.find_element_by_class_name("wsp-search__btn  tst_headerSearchButton")
submit.click()

The code I have so far for step (3):

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/')

soup = BeautifulSoup(data.text, 'html.parser')

categoryBar = soup.find('ul',{'class':'breadcrumbs breadcrumbs--show-last-item-small'})

for category in categoryBar.find_all('span',{'class':'breadcrumbs__link-label'}):
    print(category.text)

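【Comments】:

What have you tried so far, and what error did you get?
@Dev I'm not getting any error. I just don't know where to start. The code in (2) comes from the internet, but I don't know how to use webdriver properly. Do you know how?

Tags: python web-scraping beautifulsoup selenium-chromedriver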

【Solution 1】:

You can use selenium to locate the input box and loop over your ISBNs, entering each one in turn:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
d = webdriver.Chrome('/path/to/chromedriver')
books = ['9780062457738']
for book in books:
  d.get('https://www.bol.com/nl/')
  e = d.find_element_by_id('searchfor')
  e.send_keys(book)
  e.send_keys(Keys.ENTER)
  #scrape page here

Now, for each ISBN in books, the code types the value into the search box, submits it, and loads the resulting page.

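【Comments】:

【Solution 2】:

You can write a function that returns the category. The parameters can be worked out from the search request the page itself makes, so a plain GET request is enough.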

import requests
from bs4 import BeautifulSoup as bs

def get_category(isbn): 
    r = requests.get(f'https://www.bol.com/nl/rnwy/search.html?Ntt={isbn}&searchContext=books_all') 
    soup = bs(r.content,'lxml')
    category = soup.select_one('#option_block_4 > li:last-child .breadcrumbs__link-label')

    if category is None:
        return 'Not found'
    else:
        return category.text

isbns = ['9780141311357', '9780062457738', '9780141199078']

for isbn in isbns:
    print(get_category(isbn))
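
The query string used in `get_category` can also be assembled from a dict instead of being pasted into an f-string, which keeps the two parameters visible and URL-encodes their values. The following is a small sketch using only the standard library. `build_search_url` is a hypothetical helper, and the endpoint and the `Ntt` and `searchContext` parameters are copied from the answer above, so they may no longer match the live site.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above; treat it as an assumption about the site.
SEARCH_URL = "https://www.bol.com/nl/rnwy/search.html"


def build_search_url(isbn, context="books_all"):
    """Return the search URL for one ISBN, with the query string URL-encoded."""
    query = urlencode({"Ntt": isbn, "searchContext": context})
    return f"{SEARCH_URL}?{query}"


for isbn in ["9780141311357", "9780062457738", "9780141199078"]:
    print(build_search_url(isbn))
```

If you are already using requests, passing the same dict as `requests.get(SEARCH_URL, params={...})` performs the encoding for you.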

【Comments】:

Thanks for your help, but Ajax1234's solution worked for me.
No problem. Did this one not work for you? I tested it and it seems to work fine.