[Question Title]: Having trouble inserting data from beautifulsoup4 into a sqlite3 db
[Posted]: 2021-03-22 05:18:30
[Question Description]:

Although I was able to extract all the information into a .csv file, I can't get my sqlite3 database to populate with any of it. I'm fairly new to sqlite3 and have spent the past few days trying to get this to work, without success. The code comes from a YouTube video I watched about exporting results to a csv with beautifulsoup4.

It lets me create the desired columns in sqlite3, but it won't populate them with the information I isolated with beautifulsoup4. I'm not sure whether I'm passing the information to sqlite3 correctly. I've tried several approaches to importing the data, none of which worked.

#make a request to ebay.com to get a page
#collect data from each detail page
#collect all links to detail pages of each product
#write scraped data to a csv file

import requests
from bs4 import BeautifulSoup
import csv
import sqlite3


def get_page(url):
    response = requests.get(url)
    
    if not response.ok:
        print('Server Responded', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')

    return soup
    #print(response.ok)
    #print(response.status_code)


def get_detail_data(soup):
    #title
    try:
        title = soup.find('h1', id='itemTitle').get_text().replace('Details about', '').strip()   
    except:
        title = ''
    #price 
    try:
        price = soup.find('span', id='prcIsum').get_text()    
    except:
        price = ''
    #department    
    try:
        department = soup.find('span', itemprop="name").text.strip().split(' ')
    except:
        department = ''
    #seller   
    try:
        seller = soup.find('span', class_='mbg-nw').get_text()
    except:
        seller = ''
        
#Having difficulty pulling the "data" into my db
    data = {
        'title':title,
        'price': price,
        'department': department,
        'seller': seller
    }
    
    return data

def get_index_data(soup):

    try:
        links = soup.find_all('a', class_='s-item__link')

    except:
        links = []

    urls = [item.get('href') for item in links]
    return urls

#disabled csv function below
#def write_csv(data, url):

    #with open('output.csv', 'a') as csvfile:
      #  writer = csv.writer(csvfile)

      #  row = [data['title'], data['price'], data['seller'], data['department'], url]
       # writer.writerow(row)
        
def main():
    url = 'https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=colonel+blade+-coon&_sacat=0&LH_TitleDesc=0&_odkw=colonel+blade'
    

    product = get_index_data(get_page(url))

    for link in product:
        data = get_detail_data(get_page(link))
        write_csv(data, link)

if __name__ == '__main__':
    main()
    
    #having difficulty importing beautifulsoup data into db
conn =sqlite3.connect("ultimate_ebay_scraper4.db")
c = conn.cursor()
c.execute('''drop table if exists ultimate_ebay_scraper4''')
c.execute('''CREATE TABLE ultimate_ebay_scraper4(title TEXT, price TEXT, seller TEXT, department TEXT, url TEXT)''')

conn.execute("""INSERT INTO ultimate_ebay_scraper4(title, price, seller, department, url)VALUES(data['title'], data['price'], data['seller'], data['department'], url)""")
conn.commit()
cursor.close()
connector.close()

[Question Discussion]:

    Tags: python sqlite beautifulsoup python-requests


    [Solution 1]:

    I see that the following line appears to use the wrong format for reading data from your dictionary:

    conn.execute("""INSERT INTO ultimate_ebay_scraper4(title, price, seller, department, url)VALUES(data['title'], data['price'], data['seller'], data['department'], url)""")
    

    Try something like this:

    cur.execute("insert into ultimate_ebay_scraper4(title, price, seller, department, url) values (?, ?,?,?,?)", (data['title'], data['price'],data['seller'],data['department'],url ))
    

    You'll need to test it, but I expect you were getting the literal strings from the data dictionary rather than the values themselves.

    Also, I expect that by the time you try to insert the data variable, it is already out of scope. Create a function named insert_data(...) and call it inside the loop, like this:

    for link in product:
            data = get_detail_data(get_page(link))
            write_csv(data, link)
            insert_data(data) # something like this
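    A runnable sketch of such an insert_data helper is shown below. The table schema is taken from the question; passing the connection in as an argument, and the sample row and URL, are illustrative assumptions, not part of the original code:

```python
import sqlite3

def insert_data(conn, data, url):
    # Parameterized insert: each ? is bound to the matching tuple value,
    # so the dictionary values themselves (not literal strings) are stored.
    conn.execute(
        "INSERT INTO ultimate_ebay_scraper4(title, price, seller, department, url) "
        "VALUES (?, ?, ?, ?, ?)",
        (data['title'], data['price'], data['seller'], data['department'], url),
    )
    conn.commit()

# Usage sketch with an in-memory database and a made-up row:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ultimate_ebay_scraper4"
             "(title TEXT, price TEXT, seller TEXT, department TEXT, url TEXT)")
row = {'title': 'Blade', 'price': '$10.00', 'seller': 'shop1', 'department': 'Knives'}
insert_data(conn, row, 'https://www.ebay.com/itm/123')
print(conn.execute("SELECT title, url FROM ultimate_ebay_scraper4").fetchall())
# [('Blade', 'https://www.ebay.com/itm/123')]
```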
    

    [Discussion]:

      [Solution 2]:

      If I've understood the question correctly, you want to reference your data variable in the insert query? Then you need to parameterize the execute call.

      Please refer to the example in the docs.

      execute("INSERT... VALUES(?, ...)",  (data['title'],... ))
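      Expanded into a self-contained sketch, assuming an in-memory database with a simplified two-column table and a made-up row, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items(title TEXT, price TEXT)")

data = {'title': 'Colonel Blade', 'price': '$25.00'}
# The second argument to execute() supplies a value for each ? placeholder;
# sqlite3 handles quoting and escaping, which also avoids SQL injection.
cur.execute("INSERT INTO items(title, price) VALUES (?, ?)",
            (data['title'], data['price']))
conn.commit()
print(cur.execute("SELECT * FROM items").fetchall())
# [('Colonel Blade', '$25.00')]
```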
      

      [Discussion]:
