Python: Trying to scrape Jumia laptop site

【Question title】: Python: Trying to scrape Jumia laptop site
【Posted】: 2021-11-02 16:30:58
【Question】:

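I have been trying to scrape laptop names and prices from the Jumia website into a csv file using Python and BeautifulSoup, but my code keeps writing only the header row to the csv file. Am I doing something wrong?

Here is my code: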

import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv
page_url="https://www.jumia.com.ng/laptops/hp/"
uClient=requests.get(page_url).text
page_soup=BeautifulSoup(uClient, "html.parser")
containers=page_soup.findAll("div", {"class":"info"})
containers
filename="jumia.csv"
f=open(filename, "w")
headers="brand, price, \n"
f.write(headers)  
for contain in containers:
    try:
        product=contain.find("h3", {"class":"name"})
    except:
         product=none 
    try:
        cost=contain.find("div", {"class":"prc"})
    except:
        cost=none      
 f=open("jumia.csv.txt","w")
f.write("product" + "cost" + "\n")
print(product, cost)
f.close()

Thanks.

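【Comments】:

Have you checked the response you get back? The page may rely heavily on JavaScript, in which case you "can't" use bs4 alone. Try passing a user agent with your request: headers={'User-Agent': <your user agent string>}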
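
The suggestion in the comment above can be sketched roughly as follows. The header is conventionally spelled User-Agent. The USER_AGENT value, the fetch_html helper and the timeout are assumptions made for illustration, not part of the original code, and whether Jumia actually requires the header is not confirmed by this thread.

```python
import requests

# Hypothetical browser-like value for illustration; substitute your own browser's string.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def fetch_html(url, user_agent=USER_AGENT, timeout=10):
    """Return the page body, sending an explicit User-Agent header."""
    response = requests.get(
        url,
        headers={"User-Agent": user_agent},
        timeout=timeout,
    )
    # Fail loudly on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling fetch_html("https://www.jumia.com.ng/laptops/hp/") and searching the returned text for a product name you can see in the browser is a quick way to tell whether the listing is present in the raw HTML or only rendered later by JavaScript.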

Tags: python csv web-scraping beautifulsoup

【Solution 1】:

You forgot to write the rows to the csv file. I added this line inside the for loop and your code works like a charm:

f.write(f"{product.text}, {cost.text}\n")

Final code:

import requests
from bs4 import BeautifulSoup

page_url = "https://www.jumia.com.ng/laptops/hp/"
uClient = requests.get(page_url).text
page_soup = BeautifulSoup(uClient, "html.parser")
# each product card keeps its name and price inside <div class="info">
containers = page_soup.findAll("div", {"class": "info"})
filename = "jumia.csv"
f = open(filename, "w")
headers = "brand, price, \n"
f.write(headers)
for contain in containers:
    # find() returns None when nothing matches (it does not raise),
    # so check for None instead of wrapping the call in try/except
    product = contain.find("h3", {"class": "name"})
    cost = contain.find("div", {"class": "prc"})
    if product is None or cost is None:
        continue
    f.write(f"{product.text}, {cost.text}\n")
    print(f"{product.text}, {cost.text}\n")  # print each row to check the output
f.close()

Output:

Hp Stream 11 Intel Celeron D/C  4GB RAM- 64GB HDD WIN 10+ BAG, ₦ 130,000

Hp 14 AMD ATHLON SILVER 8GB RAM 1TB HDD Windows 10 + Free Mouse, ₦ 164,820

Hp Notebook 15 Touchscreen PC- Intel® Core I3- 8GB RAM 1TB HDD WIN 10 PRO, ₦ 235,000

Hp Stream 11 Intel Celeron D/C  4GB RAM- 32GB HDD WIN 10+ BAG, ₦ 130,000

Hp 15 Intel Core I5 10th Gen Touchscreen  12GB RAM 1TB HDD Windows 10 + 32GB Flash, ₦ 327,999
...
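
One thing the output above shows is that the prices themselves contain commas (for example ₦ 130,000), so rows joined by hand with ", " will split into extra columns when the file is opened as CSV. A minimal sketch of one way around this, assuming the standard csv module is acceptable here; the write_rows helper and its arguments are hypothetical names, not part of the answer:

```python
import csv


def write_rows(path, rows):
    """Write (name, price) pairs to `path` as CSV, quoting fields when needed."""
    # newline="" lets the csv module manage line endings itself;
    # utf-8 lets the naira sign be written regardless of the platform default.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["brand", "price"])
        # Fields containing commas are wrapped in quotes automatically.
        writer.writerows(rows)
```

Inside the scraping loop you would collect (product.text, cost.text) tuples into a list and pass that list to write_rows("jumia.csv", rows) once the loop has finished.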

【Comments】:

I tried the code above, but it raises an error. UnicodeEncodeError: 'charmap' codec can't encode character '\u20a6' in position 63: character maps to <undefined>
Did you try my code? Which line raises the error?
f.write(f"{product.text}, {cost.text}\n")
@I.T Try my code and tell me what error you get. I changed your code.
Maybe I did something wrong. I'm just a Python beginner.
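
The thread does not resolve the UnicodeEncodeError, but a likely explanation, assuming the commenter is on Windows where open() defaults to a legacy code page such as cp1252, is that the naira sign '\u20a6' has no mapping in that code page. A small sketch of the cause and one possible fix; NAIRA_ROW, can_encode and write_text are hypothetical names used only for illustration:

```python
# Sample row containing the naira sign (U+20A6), as in the scraped prices.
NAIRA_ROW = "Hp Stream 11, \u20a6 130,000\n"


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_text(path, text):
    """Write `text` with an explicit utf-8 encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Here can_encode(NAIRA_ROW, "cp1252") is False while can_encode(NAIRA_ROW, "utf-8") is True, which is why changing the call in the answer to open(filename, "w", encoding="utf-8") should let the f.write line succeed. If the traceback points at the print line instead, the console encoding would be the thing to check.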