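【Posted】:2019-08-01 02:34:02
【Question】:
I am trying to extract data from a table and access it with the Beautiful Soup library. I can get the table as HTML, but I am struggling to pull the data out in a usable form, because the table itself has two columns: the first holds the headings and the second holds the values.
Here is my code: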
from bs4 import BeautifulSoup as bs

html = browser.html  # `browser` is created earlier (not shown here)
soup = bs(html, "html.parser")
table = soup.find("table", {"id": "productDetails_techSpec_section_1"})
table
Printing the table gives:
"<table class="a-keyvalue prodDetTable" id="productDetails_techSpec_section_1" role="presentation">
<tbody><tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Part Number
</th>
<td class="a-size-base">
3885SD
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a-size-base">
1.83 pounds
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Product Dimensions
</th>
<td class="a-size-base">
9 x 6 x 3.5 inches
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item model number
</th>
<td class="a-size-base">
3885SD
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Package Quantity
</th>
<td class="a-size-base">
1
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Number of Handles
</th>
<td class="a-size-base">
1
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Batteries Included?
</th>
<td class="a-size-base">
No
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Batteries Required?
</th>
<td class="a-size-base">
No
</td>
</tr>
</tbody></table>"
I tried this line of code to access each heading and data point:
headings = [table.get_text() for th in table.find("tr").find_all("th")]
print(headings)
This is the output I get:
['\n\n\n \tPart Number\t\n \n\n 3885SD\n \n\n\n\n Item Weight\n \n\n 1.83 pounds\n \n\n\n\n Product Dimensions\n \n\n 9 x 6 x 3.5 inches\n \n\n\n\n Item model number\n \n\n 3885SD\n \n\n\n\n Item Package Quantity\n \n\n 1\n \n\n\n\n Number of Handles\n \n\n 1\n \n\n\n\n Batteries Included?\n \n\n No\n \n\n\n\n Batteries Required?\n \n\n No\n \n\n']
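I have been looking into different ways of getting this data into a pandas dataframe, and this is as far as I have got.
My question is: how can I get this data into a dataframe with my headings and values laid out as in the example below?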
【Comments】:
-
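This should be
headings = [th.get_text() for th in table.find("tr").find_all("th")], right? The table.get_text() inside the list comprehension looks wrong. -
@JustinEzequiel - Hi, thanks for the reply. I tried that as well, and this is the result: ['\n \tPart Number\t\n ']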
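The single heading in that result is expected: table.find("tr") returns only the first row, so only its th is visited. A minimal sketch of one way to collect every heading/value pair follows. It assumes a shortened two-row copy of the table (the `html` string, with the class attributes dropped) and assumes the desired layout is one dataframe row with the headings as column names; the names `specs` and `df` are illustrative and do not come from the question.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Shortened stand-in for the table in the question (two rows only).
html = """
<table id="productDetails_techSpec_section_1">
<tbody><tr>
<th>
    Part Number
</th>
<td>
    3885SD
</td>
</tr>
<tr>
<th>
    Item Weight
</th>
<td>
    1.83 pounds
</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "productDetails_techSpec_section_1"})

# find_all("tr") visits every row; find("tr") would return only the first.
specs = {}
for row in table.find_all("tr"):
    th, td = row.find("th"), row.find("td")
    if th is None or td is None:
        continue  # skip rows that lack a heading/value pair
    # get_text(strip=True) drops the surrounding newlines and tabs.
    specs[th.get_text(strip=True)] = td.get_text(strip=True)

# Assumed layout: a single row, with the headings as column names.
df = pd.DataFrame([specs])
print(df)
```

If a two-column layout is preferred instead, pd.DataFrame(list(specs.items()), columns=["heading", "value"]) builds one row per table row from the same dictionary (the column names there are again only placeholders).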
Tags: python pandas dataframe beautifulsoup