Import a table from a website with BeautifulSoup: answers

【Question title】: Import table from website with BeautifulSoup
【Posted】: 2019-05-26 18:53:43
【Question】:

I'm trying to import a table from a website and then convert the data into a pandas DataFrame.

The URL is: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

This is my code so far:

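import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the page; .text is the raw HTML, not a URL
website_url = requests.get(
    'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

# Parse the HTML with the lxml parser
soup = BeautifulSoup(website_url, 'lxml')

# First <table> whose class attribute is exactly "wikitable sortable"
My_table = soup.find('table', {'class': 'wikitable sortable'})

# Note: `table` is overwritten on every iteration, so only the flattened
# text of the last matching table survives the loop
for x in soup.find_all('table', {'class': 'wikitable sortable'}):
    table = x.text

print(My_table)
print(table)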

Output of print(My_table): [output not included]

Output of print(table): [output not included]

How can I convert this web table into a pandas DataFrame?
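Calling `.text` on the table flattens it into a single string, which is why it is hard to turn into a DataFrame. If you want to stay with BeautifulSoup, one minimal sketch is to walk the `<tr>` rows of the table and collect each cell's text yourself. `table_to_dataframe` is a hypothetical helper, and the inline HTML is made-up sample data shaped like the page's table so the example runs offline. It assumes the first row holds the headers and that every row has the same number of cells (tables using `rowspan`/`colspan` would need extra handling).

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(table):
    """Hypothetical helper: turn a BeautifulSoup <table> Tag into a DataFrame.

    Assumes the first <tr> is the header row and all rows have equal length.
    """
    rows = table.find_all("tr")
    # Header cells may be <th> or <td>; strip surrounding whitespace
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    data = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(data, columns=header)


# Made-up HTML standing in for the downloaded Wikipedia page (no network needed)
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Borough A</td><td>Neighbourhood A</td></tr>
  <tr><td>M2A</td><td>Borough B</td><td>Neighbourhood B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
my_table = soup.find("table", {"class": "wikitable sortable"})
df = table_to_dataframe(my_table)
print(df)
```

With the real page, you would pass the `My_table` object from the code above instead of the sample `my_table`.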

【Discussion】:

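This was answered here using read_html: stackoverflow.com/questions/55566117/… . I'm not sure it makes this a duplicate, since you only need the table, but the first part of that solution still applies.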

Tags: python pandas beautifulsoup

【Solution 1】:

Have you tried

pd.read_html()
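A minimal sketch of what `pd.read_html` does: it parses every `<table>` in the HTML and returns a *list* of DataFrames, so you index into it to get the one you want. The inline HTML below is made-up sample data so the example runs offline; wrapping literal HTML in `StringIO` avoids the deprecation warning newer pandas versions emit for raw HTML strings. `read_html` needs `lxml` (or BeautifulSoup plus `html5lib`) installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for the downloaded page
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Borough A</td><td>Neighbourhood A</td></tr>
  <tr><td>M2A</td><td>Borough B</td><td>Neighbourhood B</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds;
# a leading row of <th> cells becomes the column header
tables = pd.read_html(StringIO(html))
df = tables[0]
print(len(tables))
print(df)
```

For the real page you can pass the URL directly, or reuse the HTML text fetched with `requests` in the question as `pd.read_html(StringIO(website_url))`.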

Also, since the table is fairly standard, why not just copy it into Excel and import it as a DataFrame?

【Discussion】:

Hey Gen, thanks for your answer. read_html() helped a lot, but for some reason it doesn't include the Neighborhood column.
pd.read_html(r'en.wikipedia.org/wiki/… = 'Neighbourhood')
Strange, the result is the same for me: df = pd.read_html(r'en.wikipedia.org/wiki/…'Neighbourhood') print(df)
Got it, thanks! df = pd.read_html(r'en.wikipedia.org/wiki/…'Neighbourhood') type(df) len(df) df = df[0] df
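The last comment's `type(df)`, `len(df)` and `df = df[0]` steps reflect that `read_html` returns a list. The exact arguments in the truncated calls above can't be recovered, but `read_html` has two documented ways to narrow the list to the table you want: `match=` (a string or regex the table's text must contain) and `attrs=` (HTML attributes of the `<table>` element). This sketch uses a made-up page with two tables, only one of which has a `Neighbourhood` column.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables; only the second has a Neighbourhood column
html = """
<table class="infobox">
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Borough A</td><td>Neighbourhood A</td></tr>
</table>
"""

# match= keeps only tables whose text matches the given string/regex
by_match = pd.read_html(StringIO(html), match="Neighbourhood")
# attrs= keeps only tables whose attributes match (here the class string)
by_attrs = pd.read_html(StringIO(html), attrs={"class": "wikitable sortable"})

df = by_match[0]
print(type(by_match), len(by_match))
print(df)
```

Either filter leaves a one-element list here, so `[0]` picks out the postal-code table.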