【Title】:Problem extracting data from JSON format to CSV using BeautifulSoup 3
【Posted】:2019-06-02 04:50:49
【Question】:

I am trying to export data from JSON format to a CSV file, but I get no results. Here is the code:

import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.home.search.go.35e34937qjElRf"



page = requests.get(url)

print(page.status_code)
print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

alpha = soup.find_all('script',{'type':'application/ld+json'})
jsonObj = json.loads(alpha[1].text)

for item in jsonObj['itemListElement']:
    name = item['name']
    price = item['offers']['price']
    currency = item['offers']['priceCurrency']
    availability = item['offers']['availability'].split('/')[-1]
    availability = [s for s in re.split("([A-Z][^A-Z]*)", availability) if s]
    availability = ' '.join(availability)

    print('Availability: %s  Price: %0.2f %s   Name: %s' %(availability,float(price), currency,name))

This is the code where I try to export the data to CSV, but nothing ends up in the CSV file:

# Create the file to write to and add the header row

outfile = open('products.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["name", "offers", "price", "priceCurrency", "availability" ])
outfile.close()
alpha = soup.find_all('script',{'type':'application/ld+json'})

jsonObj = json.loads(alpha[1].text)

for item in jsonObj['itemListElement']:
    name = item['name']
    price = item['offers']['price']
    currency = item['offers']['priceCurrency']
    availability = item['offers']['availability'].split('/')[-1]
    availability = [s for s in re.split("([A-Z][^A-Z]*)", availability) if s]
    availability = ' '.join(availability)

【Comments】:

  • This page has no script type='application/ld+json' tag.
  • All the product data, i.e. name, price, currency and availability, is in the script.
  • Try the following URL: daraz.pk/catalog/…
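
Since the comments point out that a page may not contain a script type='application/ld+json' tag at all, indexing `alpha[1]` directly can fail with an IndexError. Below is a minimal sketch of a more defensive lookup, assuming the product list sits in a JSON-LD block with an `itemListElement` key. The helper name `find_item_list` and the sample HTML are hypothetical and only stand in for `page.text`.

```python
import json
from bs4 import BeautifulSoup


def find_item_list(html):
    """Return the first JSON-LD object containing 'itemListElement', or None."""
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script', {'type': 'application/ld+json'}):
        try:
            data = json.loads(script.string or '')
        except json.JSONDecodeError:
            continue  # skip blocks that are empty or not valid JSON
        if isinstance(data, dict) and 'itemListElement' in data:
            return data
    return None


# Hypothetical sample page, standing in for page.text
sample_html = """
<html><head>
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [{"name": "Dell Laptop"}]}
</script>
</head></html>
"""

item_list = find_item_list(sample_html)
no_list = find_item_list("<html><body>no scripts</body></html>")
```

Searching by key instead of by position means the code keeps working if the order of the script tags changes, and a `None` result can be checked before any CSV is written.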

Tags: python-3.x web-scraping beautifulsoup export-to-csv


【Solution 1】:

Personally, I'm a fan of Pandas for writing CSV files. Some might argue it is heavyweight for this, but it works.

import requests
from bs4 import BeautifulSoup
import json
import re
import pandas as pd

url = "https://www.daraz.pk/catalog/?q=dell&_keyori=ss&from=input&spm=a2a0e.home.search.go.35e34937qjElRf"



page = requests.get(url)

#print(page.status_code)
#print(page.text)
soup = BeautifulSoup(page.text, 'html.parser')
#(soup.prettify())

alpha = soup.find_all('script',{'type':'application/ld+json'})
jsonObj = json.loads(alpha[1].text)


rows = []
for item in jsonObj['itemListElement']:
    name = item['name']
    price = item['offers']['price']
    currency = item['offers']['priceCurrency']
    # Keep the last URL segment (e.g. 'InStock') and split it into words
    availability = item['offers']['availability'].split('/')[-1]
    availability = [s for s in re.split("([A-Z][^A-Z]*)", availability) if s]
    availability = ' '.join(availability)

    # Collect plain rows; DataFrame.append was removed in pandas 2.0
    rows.append([name, price, currency, availability])

# Build the DataFrame once, then write it without the index column
results = pd.DataFrame(rows, columns=['name', 'price', 'currency', 'availability'])
results.to_csv('products.csv', index=False)
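
As an alternative sketch, `pd.json_normalize` can flatten the nested `offers` dictionaries in one call instead of building rows by hand. The `items` list below is hypothetical sample data standing in for `jsonObj['itemListElement']`; the fields on the real page may differ.

```python
import pandas as pd

# Hypothetical sample data shaped like jsonObj['itemListElement']
items = [
    {'name': 'Dell Laptop',
     'offers': {'price': '55000.00', 'priceCurrency': 'PKR',
                'availability': 'https://schema.org/InStock'}},
    {'name': 'Dell Mouse',
     'offers': {'price': '1200.00', 'priceCurrency': 'PKR',
                'availability': 'https://schema.org/OutOfStock'}},
]

# Nested keys become dotted column names such as 'offers.price'
flat = pd.json_normalize(items)

results = pd.DataFrame({
    'name': flat['name'],
    'price': flat['offers.price'].astype(float),
    'currency': flat['offers.priceCurrency'],
    # Keep only the last URL segment, e.g. 'InStock'
    'availability': flat['offers.availability'].str.split('/').str[-1],
})

# With no path argument, to_csv returns the CSV as a string
csv_text = results.to_csv(index=False)
```

Passing a filename such as 'products.csv' to `to_csv` writes the same content to disk.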

【Comments】:

  • The Pandas library is very extensive, and your solution works well :)
【Solution 2】:

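You get no results because nothing is written to the CSV inside the loop, and the file is closed before the loop even starts.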

import csv

# Create the file and write the header row
outfile = open('products.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["name", "type", "price", "priceCurrency", "availability"])

alpha = soup.find_all('script',{'type':'application/ld+json'})

jsonObj = json.loads(alpha[1].text)

for item in jsonObj['itemListElement']:
    name = item['name']
    item_type = item['@type']  # renamed to avoid shadowing the built-in type()
    price = item['offers']['price']
    currency = item['offers']['priceCurrency']
    availability = item['offers']['availability'].split('/')[-1]
    # forgot this?
    writer.writerow([name, item_type, price, currency, availability])

# and close the CSV here
outfile.close()
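
The same idea can be sketched with `csv.DictWriter` and a small helper per row, which keeps the header and the row values in sync. This is only an illustration: the names `FIELDS`, `to_row` and `write_products` and the sample `items` list are hypothetical, and the data stands in for `jsonObj['itemListElement']`.

```python
import csv
import io
import re

FIELDS = ['name', 'price', 'priceCurrency', 'availability']


def to_row(item):
    """Flatten one product entry into a dict keyed by FIELDS."""
    offers = item['offers']
    # 'https://schema.org/InStock' -> 'InStock' -> 'In Stock'
    status = offers['availability'].split('/')[-1]
    words = [s for s in re.split('([A-Z][^A-Z]*)', status) if s]
    return {
        'name': item['name'],
        'price': offers['price'],
        'priceCurrency': offers['priceCurrency'],
        'availability': ' '.join(words),
    }


def write_products(items, outfile):
    """Write a header plus one row per item to an open text file."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        writer.writerow(to_row(item))


# Hypothetical sample data shaped like jsonObj['itemListElement']
items = [
    {'name': 'Dell Laptop',
     'offers': {'price': '55000.00', 'priceCurrency': 'PKR',
                'availability': 'https://schema.org/InStock'}},
]

# An in-memory buffer keeps the example self-contained; for a real file use
# `with open('products.csv', 'w', newline='') as outfile:` instead.
buffer = io.StringIO(newline='')
write_products(items, buffer)
csv_text = buffer.getvalue()
```

Opening the real file in a `with` block closes it automatically once the loop has finished, so the rows cannot be lost to an early `close()`.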

【Comments】:
