Beautiful Soup: Scrape Table Data (Answers)

【Question Title】: Beautiful Soup: Scrape Table Data
【Posted】: 2018-09-01 15:33:48
【Question】:

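I want to extract table data from the URL below. Specifically, I want the data in the first column. When I run the code below, each value from the first column is printed several times. How can I get each value to appear only once?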

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')
    for cell in data:
        print(data[0].text)
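The repetition comes from the inner `for cell in data` loop: it runs once for every `<td>` in the row, yet its body always prints `data[0]`, so a row with three cells prints its first value three times. The sketch below reproduces this offline on a small hypothetical table (inline HTML standing in for the real page, and the built-in `html.parser` instead of lxml); the function name `question_loop` is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the giftList table (not the real page contents):
# one header row of <th> cells, then two data rows with three <td> cells each.
HTML = """
<table id="giftList">
  <tr><th>Item</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Item A</td><td>First description</td><td>1.00</td></tr>
  <tr><td>Item B</td><td>Second description</td><td>2.00</td></tr>
</table>
"""


def question_loop(html):
    """Reproduce the question's nested loop, collecting what it would print."""
    table = BeautifulSoup(html, "html.parser").find("table", {"id": "giftList"})
    printed = []
    for row in table.find_all("tr"):
        data = row.find_all("td")  # empty for the header row, so nothing is added
        # The inner loop runs once per <td> in the row...
        for cell in data:
            # ...but always reads data[0], never the loop variable `cell`
            printed.append(data[0].text)
    return printed


print(question_loop(HTML))
# ['Item A', 'Item A', 'Item A', 'Item B', 'Item B', 'Item B']
```

Reading `data[0]` once per row, without the inner loop, is what removes the duplicates.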

【Question Discussion】:

Tags: python python-3.x web-scraping beautifulsoup python-requests

【Solution 1】:

You can also try the requests module combined with CSS selectors, as follows:

import requests
from bs4 import BeautifulSoup

link = 'http://www.pythonscraping.com/pages/page3.html'

soup = BeautifulSoup(requests.get(link).text, 'lxml')
# Skip the first row ([1:]), which is the table header
for row in soup.select('table#giftList tr')[1:]:
    # The first <td> in each row is the first column
    cell = row.select_one('td').get_text(strip=True)
    print(cell)

Output:

Vegetable Basket
Russian Nesting Dolls
Fish Painting
Dead Parrot
Mystery Box
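A possible variation on this answer (not part of the original): instead of slicing off the header row with `[1:]`, the CSS pseudo-class `:first-child` can select only the first cell of each row. In recent Beautiful Soup versions `select()` is backed by the soupsieve package, which supports this pseudo-class. The inline HTML is a hypothetical stand-in for the real page, and the built-in `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the giftList table (not the real page contents).
HTML = """
<table id="giftList">
  <tr><th>Item</th><th>Description</th></tr>
  <tr><td> Item A </td><td>First description</td></tr>
  <tr><td> Item B </td><td>Second description</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
# td:first-child matches a <td> only when it is the first element child of its
# <tr>, so the header row (whose first child is a <th>) is skipped without [1:].
first_column = [
    td.get_text(strip=True)
    for td in soup.select("table#giftList td:first-child")
]
print(first_column)  # ['Item A', 'Item B']
```

This keeps working if the header row is ever removed, whereas `[1:]` would then silently drop a real data row.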

【Discussion】:

【Solution 2】:

Try this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table', {'id': 'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')

    # Skip rows without <td> cells (such as the header row)
    if data:
        # Print only the first column, once per row
        cell = data[0]
        print(cell.text)
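A slightly more compact sketch of the same idea, assuming you want the values collected in a list rather than printed: `row.find('td')` returns the first `<td>` in the row, or `None` when there is none, and `get_text(strip=True)` trims any whitespace or newlines around the cell text, which `.text` would keep. The inline HTML and the function name `first_column` are hypothetical, and `html.parser` is used in place of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the giftList table, with newlines inside the cells.
HTML = """
<table id="giftList">
  <tr><th>Item</th><th>Description</th></tr>
  <tr><td>
Item A
</td><td>First description</td></tr>
  <tr><td>
Item B
</td><td>Second description</td></tr>
</table>
"""


def first_column(html):
    """Return the stripped text of the first <td> in each row of #giftList."""
    table = BeautifulSoup(html, "html.parser").find("table", {"id": "giftList"})
    values = []
    for row in table.find_all("tr"):
        cell = row.find("td")  # first <td> in the row, or None for the header row
        if cell is not None:
            values.append(cell.get_text(strip=True))
    return values


print(first_column(HTML))  # ['Item A', 'Item B']
```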

【Discussion】:

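How do I extract data from only the first column?
@Zach Do you want to print the first column of each row?
@奥兹 Yes. I only want the first column of the data.
@Zach I've edited the code. Does this do what you need? If so, please accept the answer.
Perfect! Exactly what I needed.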