Web scraping python 2, 2 (answers)

【Question title】: Web scraping python 2, 2
【Posted】: 2023-03-10 17:45:01
【Question description】:

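I need to extract the text content from the HTML below with Python web scraping. The problem is the class attribute: all three variables share the same class, so I tried using aria-label instead, which did not work.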

2,

Property_beds = response.css('.b6a29bc0::text').extract()

The result contains both values, "Beds" and "Baths"; I only want one of them, Baths.

'Property_beds': [2,3]

So I want to include aria-label="Baths" in response.css(). I tried the code below, but the output list is empty:

Property_beds = response.css('span.b6a29bc0aria-label[attribute="Beds"]::text').extract()

【Discussion】:

Please add some code and an example/snippet of the HTML you are trying to parse.
? Property_beds = response.css('span.b6a29bc0[aria-label=Beds]::text').extract()
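This is the website I am trying to scrape [bayut.com/to-rent/property/dubai/], where the data fields share the same class.
Hi Harr, thanks for your answer, it helped me solve this. However, some entries do not have aria-label=Beds but aria-label=Studio instead, so I need to provide multiple aria-labels, like this: Property_baths = response.css('span.b6a29bc0[aria-label=[Beds,Studio]:: text').extract() but this does not work. Please let me know how to pass multiple aria-labels.
I handled it correctly with Beautiful Soup as follows: property_beds = soup.findAll('span',{'class':'b6a29bc0','aria-label':['Beds', 'Studio']}) but it does not work when I try the same with Scrapy.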

【Solution 1】:

For a single label:

Property_beds = response.css('span.b6a29bc0[aria-label=Beds]::text').extract()

For multiple nodes, use the CSS "or" syntax (a comma-separated selector list):

response.css('span.b6a29bc0[aria-label=Beds], span.b6a29bc0[aria-label=Studio]').getall()
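The comma-separated form above can be sketched as a small helper. This is only an illustration under stated assumptions: the question's HTML snippet is not shown, so `SAMPLE_HTML` is a made-up stand-in, and the function names `aria_label_selector` and `extract_labelled_text` are hypothetical. It uses `parsel`, the selector library that Scrapy's `response.css()` is built on, so the same selector string should work unchanged inside a spider. Note that each alternative carries its own `::text`; without it, `getall()` returns the serialized `<span>` elements rather than their text.

```python
# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<div>
  <span class="b6a29bc0" aria-label="Beds">2</span>
  <span class="b6a29bc0" aria-label="Baths">3</span>
</div>
<div>
  <span class="b6a29bc0" aria-label="Studio">Studio</span>
  <span class="b6a29bc0" aria-label="Baths">1</span>
</div>
"""


def aria_label_selector(css_class, labels):
    """Build a CSS selector list that matches a span with any of the given aria-labels."""
    return ", ".join(
        f'span.{css_class}[aria-label="{label}"]::text' for label in labels
    )


def extract_labelled_text(html, css_class, labels):
    """Return the text of every span whose aria-label is one of `labels`."""
    # Imported here so the selector builder can be used without parsel installed.
    from parsel import Selector

    selector = aria_label_selector(css_class, labels)
    return Selector(text=html).css(selector).getall()
```

Calling `extract_labelled_text(SAMPLE_HTML, "b6a29bc0", ["Beds", "Studio"])` should pick up the Beds and Studio spans and skip the Baths ones. In a Scrapy callback the equivalent would be `response.css(aria_label_selector("b6a29bc0", ["Beds", "Studio"])).getall()`.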

【Discussion】:

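Thanks for the response, the code works as expected.
Hi Harr, when the input is like 1, your code (response.css('span.b6a29bc0[aria-label=Beds]::text').extract()) works fine, but when the input is like 4,155, your code Property_feet = response.css('span.b6a29bc0[aria-label=Area]').getall() does not work. This is because there is another span after aria-label="Area", which causes the problem, and the area cannot be written to the field.
Not sure I understand. Why not use the first version from the answer and swap Beds for Area?
The problem is with the code: I need to read '5,000 ' and get 5000 into the list. In this particular listing the span is opened twice, but your code only reads the first one. Is there a way to read the span that follows aria-label="Area"> and get the value 5000 into the list? I am trying the code below, Property_feet = response.css('span.b6a29bc0[aria-label=Area]','span.b6a29bc0[aria-label=Area]').extract(), but the value does not end up in the list.
Hi, I am busy with work, so it may be easier to open a new question.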
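For the nested-span case raised in the last comments, one possible approach is a descendant text selector: a space before `::text` selects text nodes anywhere below the matched element, not only its direct children. The sketch below is not from the answer and makes several assumptions: `SAMPLE_AREA_HTML` is a guess at the markup (an inner span inside the `aria-label="Area"` span), and `parse_area` and `extract_areas` are hypothetical helper names. It again relies on `parsel`, which backs Scrapy's `response.css()`.

```python
import re

# Hypothetical markup: the Area span wraps a second span holding the number.
SAMPLE_AREA_HTML = """
<span class="b6a29bc0" aria-label="Area"><span>5,000 </span></span>
"""


def parse_area(raw):
    """Turn text such as '5,000 ' into the integer 5000; return None if no digits are found."""
    match = re.search(r"\d[\d,]*", raw)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


def extract_areas(html, css_class="b6a29bc0"):
    """Collect numeric areas from spans labelled Area, including text in nested spans."""
    # Imported here so parse_area can be used without parsel installed.
    from parsel import Selector

    # The space before ::text means "text of any descendant", so the inner span is included.
    texts = Selector(text=html).css(
        f'span.{css_class}[aria-label="Area"] ::text'
    ).getall()
    values = (parse_area(text) for text in texts)
    return [value for value in values if value is not None]
```

Inside a spider, the same idea would be `response.css('span.b6a29bc0[aria-label="Area"] ::text').getall()` followed by the cleanup step. Whether the class name on the live site is still `b6a29bc0` is not something this sketch can confirm.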