Unable to create appropriate selectors to parse some information

【Question title】: Unable to create appropriate selectors to parse some information
【Posted】: 2018-03-12 21:24:38
【Question】:

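I wrote a Python script that uses CSS selectors to parse names and phone numbers from a web page. It does not give me the result I expect: some information I don't want comes through as well. How can I correct my selectors so that they parse only the name and phone number and nothing else? For reference, I have pasted a link to a file containing the HTML elements at the bottom. Thanks in advance.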

Here is what I wrote:

from lxml.html import fromstring
root = fromstring(html)
for tree in root.cssselect(".cbFormTableEvenRow"):
    try:
        name = tree.cssselect(".cbFormDataCell span.cbFormData")[0].text
    except:
        name = ""
    try:
        phone = tree.cssselect(".cbFormLabel:contains('Phone Number')+td.cbFormDataCell .cbFormData")[0].text
    except:
        phone = ""
    print(name,phone)

The result I expect:

JAYMES CARTER (402)499-8846

The result I get:

1840390831 
RESIDENTIAL 
JAYMES CARTER (402)499-8846
  
None 
My valuation jumped by almost $60,000 in one year. There are multiple comparable properties nearby that are much lower than my $194,300 evaluation, and a lot closer to my 2016 year evaluation of $134,400.

Link to the HTML file:

https://www.dropbox.com/s/64apg5cjpssd3hb/html_table.html?dl=0

【Comments】:

Why not use XPath?

Tags: python python-3.x web-scraping css-selectors

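【Solution 1】:

Find the tr element that holds the "Phone Number" label (the code below selects the td containing that text and steps up to its parent, stored as grand_parent). From there, get the td elements for the items you want and follow the hierarchy from those down to their text.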

>>> from lxml.html import fromstring
>>> root = fromstring(open('html_table.html').read())
>>> grand_parent = root.xpath('.//td[contains(text(),"Phone Number")]/..')[0]
>>> grand_parent.xpath('td[1]/span/text()')[0]
'JAYMES CARTER'
>>> grand_parent.xpath('td[5]/span/text()')[0]
'(402)499-8846'
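
If the cell positions differ from table to table, one option is to anchor on the label cell instead of a column index and take the data cell that follows it. The following is only a sketch: SAMPLE_HTML is hypothetical markup built from the class names in the question, and phone_after_label is an illustrative helper, so the expression may need adjusting for the real page.

```python
from lxml.html import fromstring

# Hypothetical markup modelled on the class names in the question;
# the real page may be laid out differently.
SAMPLE_HTML = """
<table>
  <tr class="cbFormTableEvenRow">
    <td class="cbFormDataCell"><span class="cbFormData">JAYMES CARTER</span></td>
    <td class="cbFormDataCell"><span class="cbFormData">&nbsp;</span></td>
    <td class="cbFormLabel">Phone Number</td>
    <td class="cbFormDataCell"><span class="cbFormData">(402)499-8846</span></td>
  </tr>
</table>
"""


def phone_after_label(root, label="Phone Number"):
    """Return the text of the data cell that directly follows the label cell."""
    hits = root.xpath(
        # Locate the label cell by its text, then step to the next td sibling.
        './/td[contains(normalize-space(.), $label)]'
        '/following-sibling::td[1]//span[@class="cbFormData"]/text()',
        label=label,  # lxml passes keyword arguments in as XPath variables
    )
    return hits[0].strip() if hits else ""


root = fromstring(SAMPLE_HTML)
print(phone_after_label(root))
```

Because the lookup is relative to the label, it does not depend on the phone number being in the fifth cell, although it still assumes the value sits in the cell immediately after the label.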

Addendum in response to a comment:

>>> grand_parent.xpath('.//span[@class="cbFormData"]/text()')
['JAYMES CARTER', '\xa0', '(402)499-8846']
>>> items = grand_parent.xpath('.//span[@class="cbFormData"]/text()')
>>> [_.replace('\xa0', '').strip() for _ in items]
['JAYMES CARTER', '', '(402)499-8846']

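【Discussion】:

Thanks for your answer, Bill Bell. +1 for that. The thing is, there are more than 4,000 tables on that site, and a POST request is needed to reach them, which is why I didn't paste a link here. By the way, I can only accept this as the answer if you don't use indexing in your XPath, because the data I need won't always be in the same position. Thanks.
OK, but I think you need to give us more information. You could try the code in the addendum that starts with grand_parent. If the phone numbers all follow the North American pattern, you could take the first item as the name and then look for an item that matches that pattern as the phone number.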
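
The pattern-matching idea from the last comment could be sketched roughly as follows. PHONE_RE and name_and_phone are illustrative names, and the regular expression is an assumed approximation of North American numbers rather than anything taken from the site. The items list is the one shown in the addendum.

```python
import re

# Assumed pattern for North American numbers such as (402)499-8846,
# 402-499-8846 or 402.499.8846. Adjust it to the formats the site really uses.
PHONE_RE = re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")


def name_and_phone(items):
    """Take the first non-empty item as the name and the first phone-like item as the phone."""
    # Turn non-breaking spaces into ordinary ones, trim, and drop empty strings.
    cleaned = [item.replace("\xa0", " ").strip() for item in items]
    cleaned = [item for item in cleaned if item]
    name = cleaned[0] if cleaned else ""
    # Search the remaining items for one that looks like a phone number.
    phone = next((item for item in cleaned[1:] if PHONE_RE.match(item)), "")
    return name, phone


items = ["JAYMES CARTER", "\xa0", "(402)499-8846"]
print(name_and_phone(items))  # ('JAYMES CARTER', '(402)499-8846')
```

This avoids positional indexes for the phone number, but it still relies on the name being the first non-empty item in the row.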