【Title】Sending data scraped with BS4 to a sqlite3 database using Python
【Posted】2023-03-29 06:42:01
【Question】

I'm scraping the names of cafes in different neighbourhoods and want to add them to an SQLite3 table in a database. However, all that gets added to the table is the variable name cafeNames, not the actual list of cafe names.

I've been searching the SQLite and BS4 documentation and following plenty of tutorials, but I can't seem to figure it out. Any help would be greatly appreciated.

Code to get the cafe names:

import requests
import sqlite3
from bs4 import BeautifulSoup

def cafenames():
    url = 'https://www.broadsheet.com.au/melbourne/guides/best-cafes-thornbury'
    response = requests.get(url, timeout=5)

    soup_cafe_names = BeautifulSoup(response.content, "html.parser")
    type(soup_cafe_names)

    cafeNames = soup_cafe_names.findAll('h2', attrs={"class":"venue-title", })
    cafeNames = [ul.text.encode for ul in cafeNames]

Code to connect to the database:

    try:
        sqliteConnection = sqlite3.connect('anybody_database.db')
        cursor = sqliteConnection.cursor()
        print("Database created and Successfully Connected to anybody_database")

        sqlite_select_Query = "select sqlite_version();"
        cursor.execute(sqlite_select_Query)
        record = cursor.fetchall()
        print("SQLite Database Version is: ", record)
        cursor.close()

    except sqlite3.Error as error:
        print("Error while connecting to sqlite", error)

    finally:
        if (sqliteConnection):
            sqliteConnection.close()
            print("The SQLite connection is closed")
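As an aside, `sqlite3` connections can be wrapped in `contextlib.closing` so the close happens automatically rather than in a `finally` block. A compact sketch of the same version check, using `':memory:'` so it leaves no file behind (the real database would be `anybody_database.db`):

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() even if the query raises.
with closing(sqlite3.connect(":memory:")) as conn:
    version = conn.execute("select sqlite_version();").fetchone()
    print("SQLite Database Version is:", version[0])
```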

Code to create the table:


    try:
        sqliteConnection = sqlite3.connect('anybody_database.db')
        sqlite_create_table_query = ''' CREATE TABLE cafes (
                                        id INTEGER PRIMARY KEY,
                                        name TEXT NOT NULL);'''
        cursor = sqliteConnection.cursor()
        print("Successfully Connected to SQLite")
        cursor.execute(sqlite_create_table_query)
        sqliteConnection.commit()
        print("SQLite table created")

        cursor.close()

    except sqlite3.Error as error:
        print("Error while creating a sqlite table", error)
    finally:
        if (sqliteConnection):
            sqliteConnection.close()
            print("sqlite connection is closed")

Code to add the cafe names to the table:


def insertVariableIntoTable(name):
    try:
        sqliteConnection = sqlite3.connect('anybody_database.db')
        cursor = sqliteConnection.cursor()
        print("Successfully Connected to SQLite")

        sqlite_insert_with_param = """INSERT INTO cafes
                            (name)
                            VALUES
                            (?)"""

        data_tuple = (name)
        cursor.execute(sqlite_insert_with_param, data_tuple)
        sqliteConnection.commit()
        print("Python Variables inserted successfully into cafes table ")

        cursor.close()


    except sqlite3.Error as error:
        print("Failed to insert data into sqlite table", error)
    finally:
        if (sqliteConnection):
            sqliteConnection.close()
            print("The SQLite connection is closed")

insertVariableIntoTable('cafeNames')

【Discussion】

    Tags: python sqlite beautifulsoup


    【Solution 1】

    The problem lies in your BeautifulSoup extraction.

    import requests
    from bs4 import BeautifulSoup

    def cafenames():
        url = 'https://www.broadsheet.com.au/melbourne/guides/best-cafes-thornbury'
        response = requests.get(url, timeout=5)

        soup_cafe_names = BeautifulSoup(response.content, "html.parser")

        cafeNames = soup_cafe_names.findAll('h2', attrs={"class": "venue-title"})
        cafeNames = [ul.text.strip().encode() for ul in cafeNames]
        return cafeNames


    encode is a method, not an attribute. With the call fixed, the output of this function looks like this:

    [b'Prior',
     b'Rat the Cafe',
     b'Ampersand Coffee and Food',
     b'Umberto Espresso Bar',
     b'Brother Alec',
     b'Short Round',
     b'Jerry Joy',
     b'The Old Milk Bar',
     b'Little Henri',
     b'Northern Soul']
    

    Unless you have a specific reason to, though, I'd discourage the encode() call altogether.
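    A minimal sketch of inserting such a list of plain strings into the table in one go with executemany (cafe_names below is a stand-in for the scraped result, and ':memory:' stands in for anybody_database.db):

```python
import sqlite3

# Stand-in for the scraped result: plain strings, no encode().
cafe_names = ["Prior", "Umberto Espresso Bar", "Little Henri"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS cafes "
             "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# executemany expects an iterable of tuples, one tuple per row.
conn.executemany("INSERT INTO cafes (name) VALUES (?)",
                 [(n,) for n in cafe_names])
conn.commit()

rows = conn.execute("SELECT name FROM cafes").fetchall()
print(rows)
conn.close()
```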

    【Comments】

    • Thanks for taking the time to help. May I ask why you discourage encoding? It's probably redundant for what I'm trying to achieve, but that code block was working before I started figuring out the database side, so it may no longer be needed...
    • If this answers your question, please upvote and accept the answer.
    • Unfortunately, what ends up in the table is still just 'cafeNames' :( not the names above. Sorry for the slow reply; work has started back up and COVID19 is causing plenty of headaches.
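    On that last comment: the script's final line is insertVariableIntoTable('cafeNames'), which passes the literal nine-character string rather than the scraped list, and data_tuple = (name) is not a tuple without a trailing comma. A sketch of the intended flow, assuming the scraper is changed to return its list (insert_names and the in-memory database here are illustrative, not the OP's code):

```python
import sqlite3

def insert_names(db_path, names):
    # Insert each scraped name; (name,) with the trailing comma is a
    # one-element tuple, whereas (name) is just the string itself.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cafes "
                 "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    cur = conn.cursor()
    for name in names:
        cur.execute("INSERT INTO cafes (name) VALUES (?)", (name,))
    conn.commit()
    rows = [r[0] for r in cur.execute("SELECT name FROM cafes")]
    conn.close()
    return rows

# Call with the actual list, not the string 'cafeNames'.
print(insert_names(":memory:", ["Prior", "Brother Alec"]))
```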