Appending scraped data to a JSON file: answers

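【Question title】: Appending scraped data to a JSON file
【Posted】: 2019-05-11 04:25:03
【Question】:

I am trying to create a JSON file from scraped data. However, my function convertToJson() keeps overwriting the previous entry instead of appending to it. Is that because I am not iterating over it? For example, the JSON file below has its first entry overwritten with new data on every call, instead of gaining a new entry.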

[{"Volume": "Volume:\n6,061,086", "Price": "$ 41.88", "Name": "Suncor Energy Inc."}]

import json


def getStockDetails(url, browser):
    print(url)
    browser.get(url)

    # Locate the quote block, then read the name, price and volume from it.
    quote_wrapper = browser.find_element_by_css_selector('div.quote-wrapper')
    quote_name = quote_wrapper.find_element_by_class_name(
        "quote-name").find_element_by_tag_name('h2').text
    quote_price = quote_wrapper.find_element_by_class_name("quote-price").text
    quote_volume = quote_wrapper.find_element_by_class_name(
        "quote-volume").text

    print("\n")
    print("Quote Name: " + quote_name)
    print("Quote Price: " + quote_price)
    print("Quote Volume: " + quote_volume)
    print("\n")

    convertToJson(quote_name, quote_price, quote_volume)


def convertToJson(quote_name, quote_price, quote_volume):
    # A new empty list is created on every call, so it only ever holds one quote.
    quotesArr = []
    quoteObject = {
        "Name": quote_name,
        "Price": quote_price,
        "Volume": quote_volume
    }
    quotesArr.append(quoteObject)

    # Mode 'w' truncates the file, so each call replaces the previous contents.
    with open('trendingQuoteData.json', 'w') as outfile:
        json.dump(quotesArr, outfile)

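【Question comments】:

Tags: python json selenium-webdriver web-scraping beautifulsoup

【Solution 1】:

You need to make the variable quotesArr global: move it outside the function, and write the JSON file once, after all the scraping has finished.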

import json

# Module-level list, so entries accumulate across calls.
quotesArr = []

def convertToJson(quote_name, quote_price, quote_volume):
    quoteObject = {
        "Name": quote_name,
        "Price": quote_price,
        "Volume": quote_volume
    }
    quotesArr.append(quoteObject)

def trendingBot(url, browser):
    browser.get(url)
    trending = getTrendingQuotes(browser)
    for trend in trending:
        getStockDetails(trend, browser)
    # requests finished, write json to file
    with open('trendingQuoteData.json', 'w') as outfile:
        json.dump(quotesArr, outfile)
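
If the entries also need to survive between separate runs of the script, one alternative is to read the existing file, append to the loaded list, and write the whole list back. The following is only a minimal sketch of that idea, not the answerer's method; the helper name append_quote and its path parameter are assumptions made for illustration.

```python
import json
import os


def append_quote(path, quote_name, quote_price, quote_volume):
    """Append one quote to the JSON array stored at `path` (hypothetical helper)."""
    quotes = []
    # Load earlier entries if the file already exists and is non-empty.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "r") as infile:
            quotes = json.load(infile)

    quotes.append({
        "Name": quote_name,
        "Price": quote_price,
        "Volume": quote_volume,
    })

    # Rewrite the whole file so it stays a single valid JSON array.
    with open(path, "w") as outfile:
        json.dump(quotes, outfile)
    return quotes
```

This re-reads and rewrites the file on every call, which is fine for a handful of quotes but slower than collecting everything in memory and writing once, as the answer above does.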

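【Comments】:

Yo, you helped me out big time haha!
I have a few other questions. I want to display the scraped information on a website that I will build later; is creating this JSON file a suitable way to do that? I also want my scraper to run every 6 hours; is that possible?
Because I like your question :D
Or should I open a brand-new question and ask there?
For the website feed I would scrape the ajax request directly; see here for the trending list. Elements on the detail pages, such as quote-name, can be scraped with beautifulsoup or regular expressions.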
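
On the question of running the scraper every 6 hours: the thread does not say how to do it, but one simple option is a loop that sleeps between runs (an operating-system scheduler such as cron is another common choice). A minimal sketch, assuming the process is left running; run_periodically and its parameters are hypothetical names introduced here.

```python
import time

SIX_HOURS = 6 * 60 * 60  # interval in seconds


def run_periodically(job, interval_seconds=SIX_HOURS, max_runs=None, sleep=time.sleep):
    """Call `job` repeatedly, sleeping `interval_seconds` between calls.

    `max_runs=None` loops forever; `sleep` is injectable so the loop can be
    tried out without actually waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Skip the final sleep once the last requested run has finished.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

With the functions from the answer above, a call might look like run_periodically(lambda: trendingBot(url, browser)), where url and browser are whatever the scraper was already using.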

【Solution 2】:

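import json

# jsonStringA and jsonStringB are strings holding JSON objects, defined elsewhere.
a = json.loads(jsonStringA)
b = json.loads(jsonStringB)
# In Python 3, dict views cannot be added with +, so unpack both dicts instead.
# On duplicate keys, the value from b wins.
c = {**a, **b}
# or c = dict(a, **b)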
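
Merging dictionaries combines the keys of two JSON objects. The file in the question holds a JSON array, so concatenating lists may be closer to what "appending" means there. A small sketch of that case; the two sample strings and the name "Example Corp." are made up for illustration.

```python
import json

# Sample inputs made up for illustration; each is a JSON array of quote objects.
json_string_a = '[{"Name": "Suncor Energy Inc.", "Price": "$ 41.88"}]'
json_string_b = '[{"Name": "Example Corp.", "Price": "$ 10.00"}]'

a = json.loads(json_string_a)
b = json.loads(json_string_b)

# Lists support +, so this yields one array holding the entries of both.
combined = a + b
print(json.dumps(combined))
```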

【Comments】: