Beautiful Soup parsing an XML file: answers

【Question Title】: Beautiful Soup parsing an XML file
【Posted】: 2018-10-11 09:57:57
【Question】:

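I'm writing a simple Python script that uses Beautiful Soup to parse the data I need out of an XML file. It works the way I need it to, but I have a question for you all, because I've tried googling it and can't seem to find what I'm looking for.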

Sample XML string:

<ProductAttribute MaintenanceType="C" AttributeID="Attachment Type" PADBAttribute="N" RecordNumber="1" LanguageCode="EN">Clamp-On</ProductAttribute>

I need the AttributeID from ProductAttribute. With the code below I can get the value "Clamp-On", but I need the AttributeID to tell me what Clamp-On refers to.

attributes[part.find('PartNumber').get_text()] = [x.get_text() for x in part.find_all('ProductAttribute')]

for key, value in attributes.items():
    for v in value:
        print(v)

Before the downvotes come in: any guidance is appreciated. Thanks!

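【Comments】:

Maybe x.get_text() bypasses the tag-level attributes.
You can select an attribute by key (see the docs).
Thanks, everyone. I stumbled onto this just as you were commenting.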

Tags: python beautifulsoup

【Solution 1】:

A simple solution using only the lxml library:

from lxml import etree

xml_string = """<ProductAttribute MaintenanceType="C" AttributeID="Attachment Type" PADBAttribute="N" RecordNumber="1" LanguageCode="EN">Clamp-On</ProductAttribute>"""

xml = etree.XML(xml_string)
print(xml.get("AttributeID"))

Output:

Attachment Type
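
The same get() call extends to several elements at once. The following is only a minimal sketch: the <Part> wrapper, the part number, and the second attribute are invented for illustration (the question only shows ProductAttribute and mentions PartNumber), so the structure of the real file may differ.

```python
from lxml import etree

# Hypothetical sample: the <Part> wrapper, the part number and the "Color"
# attribute are assumptions made for this sketch, not data from the question.
xml_string = """
<Part>
  <PartNumber>ABC-123</PartNumber>
  <ProductAttribute AttributeID="Attachment Type">Clamp-On</ProductAttribute>
  <ProductAttribute AttributeID="Color">Black</ProductAttribute>
</Part>
"""

part = etree.XML(xml_string)

# Map each AttributeID to the text of the element that carries it.
pairs = {el.get("AttributeID"): el.text for el in part.iter("ProductAttribute")}

print(part.findtext("PartNumber"), pairs)
```

With this sample, the dictionary pairs each AttributeID with its value, so "Clamp-On" can be looked up under "Attachment Type".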


【Solution 2】:

Here is how to get a tag's attribute from XML using BeautifulSoup with the lxml-backed 'xml' parser:

from bs4 import BeautifulSoup

xml_string = '<ProductAttribute MaintenanceType="C" AttributeID="Attachment Type" PADBAttribute="N" RecordNumber="1" LanguageCode="EN">Clamp-On</ProductAttribute>'

soup = BeautifulSoup(xml_string, 'xml')
tag = soup.ProductAttribute
print(tag['AttributeID'])

This code prints the value of the AttributeID attribute, Attachment Type.
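
Applied to the loop from the question, the key lookup can sit next to get_text() in the list comprehension. This is a rough sketch rather than the asker's actual code: the <Parts> and <Part> wrappers and the sample values are hypothetical, chosen only so that part.find('PartNumber') and part.find_all('ProductAttribute') have something to work on.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: the <Parts>/<Part> wrappers, the part number
# and the "Color" attribute are assumptions made for this sketch.
xml_string = """
<Parts>
  <Part>
    <PartNumber>ABC-123</PartNumber>
    <ProductAttribute AttributeID="Attachment Type">Clamp-On</ProductAttribute>
    <ProductAttribute AttributeID="Color">Black</ProductAttribute>
  </Part>
</Parts>
"""

soup = BeautifulSoup(xml_string, "xml")  # the "xml" parser requires lxml

attributes = {}
for part in soup.find_all("Part"):
    # Store (AttributeID, text) pairs instead of the text alone.
    attributes[part.find("PartNumber").get_text()] = [
        (x["AttributeID"], x.get_text())
        for x in part.find_all("ProductAttribute")
    ]

for part_number, pairs in attributes.items():
    for attribute_id, value in pairs:
        print(part_number, attribute_id, value)
```

If some ProductAttribute elements might lack an AttributeID, x.get("AttributeID") returns None where x["AttributeID"] would raise a KeyError.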
