【Question Title】: Problem with scraped Bulgarian-language text in Excel using bs4
【Posted】: 2021-03-21 17:30:12
【Question Description】:
  1. I am trying to scrape a website that contains Bulgarian text. The scraping itself succeeds, but when I store the result in a CSV file the text is unreadable. Please see the code and images below for a better picture of my problem.

    import requests
    import csv
    import bs4

    res = requests.get('https://m.mobile.bg/results?pubtype=1&marka=Toyota&currency=%D0%BB%D0%B2.&sort=1&nup=0~1')

    soup = bs4.BeautifulSoup(res.text, 'lxml')
    file = open('cars.csv', 'w')
    writer = csv.writer(file)

    # write title row
    writer.writerow(['Car_Make', 'Price', 'info', 'date'])
    for i in soup.select('.listItem'):
        car_make = i.find('div', attrs={"class": "title"})
        arr = i.text
        print(arr)
        writer.writerow([arr.encode('utf-8')])

    file.close()
    

The output in the Jupyter notebook is as follows. I want it to be stored exactly like this in the CSV file.

This is what the output looks like in the CSV file:
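The unreadable cells are caused by passing bytes to the CSV writer: `csv.writer` calls `str()` on any non-string field, so `writerow([arr.encode('utf-8')])` stores the *repr* of the bytes object rather than the text. A minimal sketch (no scraping involved) shows the difference:

```python
# Minimal sketch: csv.writer stringifies non-str fields, so writing
# text.encode('utf-8') stores the bytes repr (b'\xd0...') instead of
# the readable Cyrillic text.
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["лв.".encode("utf-8")])   # bytes field -> repr ends up in the cell
writer.writerow(["лв."])                   # plain str field -> readable text

lines = buf.getvalue().splitlines()
print(lines[0])  # b'\xd0\xbb\xd0\xb2.'
print(lines[1])  # лв.
```

The fix is simply to write the string itself and choose the file encoding when opening the file, as the accepted solution below does.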

【Question Comments】:

  • Try utf-8-sig where it is supported.
  • utf-8-sig did not solve the problem.
  • Thank you very much, @barny. I did not know the terminology, as this is the first time I have done a task like this. Thanks for clearing up the terms.
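On the `utf-8-sig` suggestion: it does work, but only once the bytes-vs-str bug above is fixed. The codec prepends a byte-order mark (`EF BB BF`), which is the cue Excel uses to open a CSV as UTF-8 instead of the local ANSI code page. A minimal sketch (the file name `demo.csv` is arbitrary):

```python
# Write Cyrillic text with the 'utf-8-sig' codec; the codec prepends a BOM
# (EF BB BF) that tells Excel to decode the file as UTF-8.
import csv

with open("demo.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerow(["Toyota Corolla", "12 500 лв."])

# Inspect the first three raw bytes: the UTF-8 BOM.
with open("demo.csv", "rb") as f:
    head = f.read(3)
print(head)  # b'\xef\xbb\xbf'
```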

Tags: python-3.x web-scraping beautifulsoup export-to-csv


【Solution 1】:
import requests
import csv
from bs4 import BeautifulSoup


def main(url):
    # Pass the Cyrillic currency value as a plain string and let requests
    # percent-encode it, instead of hand-building the query string.
    params = {
        "pubtype": "1",
        "marka": "Toyota",
        "currency": "лв.",
        "sort": "1",
        "nup": "0~1"
    }
    r = requests.get(url, params=params)
    soup = BeautifulSoup(r.text, 'lxml')
    # 'utf-8-sig' prepends a BOM so Excel opens the file as UTF-8;
    # the rows are written as str, never as pre-encoded bytes.
    with open('d.csv', 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerows([list(x.strings)
                          for x in soup.select('.listItem.TOPitem')])


main('https://m.mobile.bg/results')

Output:
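A note on the `params` design choice in the solution: handing the Cyrillic value to `requests` via `params` produces the same percent-encoded query string (`currency=%D0%BB%D0%B2.`) that the question's hand-built URL contains. The equivalence can be checked with the standard library:

```python
# urlencode percent-encodes the Cyrillic value exactly as it appears in the
# original hand-built URL (currency=%D0%BB%D0%B2.).
from urllib.parse import urlencode

qs = urlencode({"currency": "лв.", "marka": "Toyota"})
print(qs)  # currency=%D0%BB%D0%B2.&marka=Toyota
```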

【Discussion】:
