Python BeautifulSoup writing only 1 line to CSV

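【Question title】: Python BeautifulSoup writing 1 line in CSV
【Posted】: 2020-10-16 18:25:46
【Question】:

I am trying to get the product name, link, and price for every product shown on the page, one product per row with the values separated by commas.

I have written code that works on similar sites, but for some reason it writes only the first result to the CSV.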

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.micoca-cola.cl/bebidas/coca-cola')
soup = BeautifulSoup(response.text, 'html.parser')

items = soup.find_all(class_='prateleira vitrine n12colunas')

with open('coca.csv', 'w', newline='') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['Producto', 'Link', 'Precio']
    csv_writer.writerow(headers)

    for item in items:
        producto = item.find(class_='product-block-name').get_text()
        link = item.find('a')['href']
        price = item.find(class_='bestPrice').get_text().replace('\n', '').replace('"', '').replace(' ', '')
        csv_writer.writerow([producto, link, price])

This gives the following result:

Producto,Link,Precio
"Refill 8 Coca-Cola Sin Azúcar retornable 2,0 lt. (No incluye envases)",https://www.micoca-cola.cl/refill-8-coca-cola-sin-azucar-retornable-20-lt-no-incluye-envases/p,"$9.520,00"

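But there are other products on that page, and I would like each of them on its own row.

What is missing?

【Comments】:

Have you tried debugging it? How many entries are in items?
If the site returns only one row to your script, nothing may be missing: the data returned to your script can easily differ from what you see in the browser. You need to show that more data is returned than your script writes to the CSV.
Only one element in the page source has the first class name. Try inspecting the page in Chrome DevTools to find a locator that is common to all entries. Replacing the current locator with that common one will make items contain all the records.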
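
The point made in the comments can be checked on a small HTML fragment. The following is a minimal sketch, not the site's real markup: it assumes a single container carrying the classes from the question and one `.product-group` element per product (the selector that the solution further down relies on). The product names, links, and prices in the fragment are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one shelf container wrapping two products.
html = """
<div class="prateleira vitrine n12colunas">
  <ul>
    <li class="product-group">
      <h4 class="product-block-name"><a href="/producto-a/p">Producto A</a></h4>
      <span class="bestPrice">$ 1.390,00</span>
    </li>
    <li class="product-group">
      <h4 class="product-block-name"><a href="/producto-b/p">Producto B</a></h4>
      <span class="bestPrice">$ 1.890,00</span>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's locator matches only the outer container,
# so a loop over this result runs a single time.
containers = soup.find_all(class_="prateleira vitrine n12colunas")

# A locator shared by every product yields one element per product.
products = soup.select(".product-group")

rows = [
    [
        p.h4.get_text(strip=True),
        p.h4.a["href"],
        p.find(class_="bestPrice").get_text(strip=True),
    ]
    for p in products
]

print(len(containers), len(products))
for row in rows:
    print(row)
```

Printing `len(items)` in the original script is the quickest way to see which of the two situations applies: a value of 1 means the loop is iterating over the container rather than over the products.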

Tags: python csv beautifulsoup

【Solution 1】:

To load all product titles, links, and prices and save them to CSV, you can use the following example:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.micoca-cola.cl/bebidas/coca-cola'
html_doc = requests.get(url).text
# The page source contains a .load('...') call; its argument is the
# relative URL of the paginated product listing.
page_url = 'https://www.micoca-cola.cl' + re.search(r"\.load\('(.*?)'", html_doc).group(1)

data = []
page = 1
while True:
    # Request listing pages one by one by appending the page number.
    soup = BeautifulSoup(requests.get(page_url + str(page)).content, 'html.parser')

    # A response without a <body> means there are no more pages.
    if not soup.body:
        break

    for product in soup.select('.product-group'):
        title = product.h4.text
        link = product.h4.a['href']
        print(title)
        print(link)
        # Products without a bestPrice element are recorded as 'Out of Stock'.
        price = product.find(class_="bestPrice")
        price = price.get_text(strip=True) if price else 'Out of Stock'
        print(price)
        print('-' * 80)

        data.append({
            'title': title,
            'link': link,
            'price': price
        })

    page += 1

# One row per product; index=False keeps the row numbers out of the CSV.
df = pd.DataFrame(data)
print(df)
df.to_csv('data.csv', index=False)

Prints:

...
32                        Coca-Cola Light 6 x 591 ml.  ...    $ 5.340,00
33                       Coca-Cola Sin Azúcar 1,5 lt.  ...    $ 1.390,00
34                       Coca-Cola Sin Azúcar 2,5 lt.  ...    $ 1.890,00
35  Starter Kit Coca-Cola Light retornable 9 x 1,2...  ...   $ 10.710,00
36  Starter Kit Coca-Cola Original retornable 8 x ...  ...   $ 11.920,00
37                     Coca-Cola Original 6 x 3,0 lt.  ...   $ 13.140,00
38                Coca-Cola Energy Sin Azúcar 220 ml.  ...      $ 990,00
39  Starter Kit Coca-Cola Sin Azúcar retornable re...  ...    $ 1.490,00
40  Starter Kit Coca-Cola Sin Azúcar retornable 1,...  ...    $ 1.190,00
41                            Coca-Cola Light 2,5 lt.  ...    $ 1.890,00
42                            Coca-Cola Light 1,5 lt.  ...    $ 1.390,00
43                   Coca-Cola Sin Azúcar 6 x 250 ml.  ...    $ 2.290,00
44                         Coca-Cola Original 1,5 lt.  ...    $ 1.390,00
45                         Coca-Cola Original 3,0 lt.  ...    $ 2.190,00
46                     Coca-Cola Original 6 x 591 ml.  ...    $ 5.340,00
47  Starter Kit Coca-Cola Original retornable 9 x ...  ...   $ 10.710,00
48  Starter Kit Coca-Cola Light retornable 8 x 2,0...  ...   $ 11.920,00
49  Starter Kit Coca-Cola Original retornable 2,0 ...  ...    $ 1.490,00
50  Starter Kit Coca-Cola Light retornable retorna...  ...    $ 1.190,00
51                            Coca-Cola Light 3,0 lt.  ...    $ 2.190,00
52                     Coca-Cola Original 6 x 250 ml.  ...    $ 2.290,00
53  Starter Kit Coca-Cola Light retornable retorna...  ...    $ 1.490,00
54  Starter Kit Coca-Cola Original retornable 1,25...  ...    $ 1.190,00
55                         Coca-Cola Original 2,5 lt.  ...    $ 1.890,00
56                         Coca-Cola Original 1,0 lt.  ...      $ 990,00
57                            Coca-Cola Light 1,0 lt.  ...  Out of Stock

[58 rows x 3 columns]

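It also saves data.csv (screenshot from LibreOffice):

【Comments】:

This completely solved my problem. It works great. It is more advanced than my example, so I will try to understand it.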
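
For anyone working through the solution, its first step can be tried in isolation. The sketch below assumes the listing page embeds a jQuery-style `.load('...')` call whose argument is a relative URL ending in a page-number parameter. The `html_doc` string and the `/hypothetical/listing` path are made up for illustration and are not the site's real markup.

```python
import re

# Made-up stand-in for the page source; the real path on the site will differ.
html_doc = """
<script>
  $("#results").load('/hypothetical/listing?PageNumber=' + pageNumber);
</script>
"""

base = "https://www.micoca-cola.cl"

# Same pattern as in the solution: capture everything between
# .load(' and the next single quote.
match = re.search(r"\.load\('(.*?)'", html_doc)
if match is None:
    raise SystemExit("no .load('...') call found in the page source")

page_url = base + match.group(1)

# The solution appends 1, 2, 3, ... to this URL, one request per page.
first_pages = [page_url + str(page) for page in range(1, 4)]
for candidate in first_pages:
    print(candidate)
```

The solution keeps requesting successive pages built this way and stops at the first response that has no `<body>`, which it treats as the end of the listing. Checking the result of `re.search` for `None`, as done here, avoids an `AttributeError` if the page layout changes and the pattern no longer matches.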