[Posted]: 2020-10-09 08:33:09
[Question]:
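I downloaded an Excel XML file from the web and am trying to parse it. I have tried many approaches, including xlrd, plain XML parsing, ElementTree and BeautifulSoup, but none of them worked. Here is what the XML looks like: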
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="Default">
<ss:Alignment ss:Horizontal="Left"/>
</ss:Style>
<ss:Style ss:ID="wraptext">
<ss:Alignment ss:Horizontal="Left" ss:WrapText="1"/>
<ss:Font ss:Italic="1"/>
</ss:Style>
<ss:Style ss:ID="disclaimer">
<ss:Alignment ss:Vertical="Top" ss:WrapText="1"/>
<ss:Font ss:Italic="1"/>
</ss:Style>
<ss:Style ss:ID="DefaultHyperlink">
<ss:Alignment ss:Vertical="Center" ss:WrapText="1"/>
<ss:Font ss:Color="#0000FF" ss:Underline="Single" />
</ss:Style>
<ss:Style ss:ID="headerstyle">
<ss:Font ss:Bold="1" />
</ss:Style>
<ss:Style ss:ID="Date">
<ss:NumberFormat ss:Format="dd\-mmm\-yyyy"/>
</ss:Style>
<ss:Style ss:ID="Left">
<ss:Alignment ss:Horizontal="Left"/>
<ss:NumberFormat ss:Format="Standard"/>
</ss:Style>
<ss:Style ss:ID="Right">
<ss:Alignment ss:Horizontal="Right"/>
<ss:NumberFormat ss:Format="Standard"/>
</ss:Style>
</ss:Styles>
<ss:Worksheet ss:Name="Holdings">
<ss:Table>
<ss:Row>
<ss:Cell ss:StyleID="Left">
<ss:Data ss:Type="String">06-Oct-2020</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell ss:StyleID="Left">
<ss:Data ss:Type="String">iShares Russell Top 200 Value ETF</ss:Data>
</ss:Cell>
</ss:Row>
.
.
.
Alternatively, you can download the full XML here.
Ultimately I need to convert the file to a DataFrame, but for now I'm open to any solution, for example converting it to CSV first. Can anyone help?
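A minimal sketch using only the standard library, assuming the file follows the SpreadsheetML 2003 layout shown above (`Worksheet > Table > Row > Cell > Data`, all in the `urn:schemas-microsoft-com:office:spreadsheet` namespace). The helpers `spreadsheetml_to_rows` and `rows_to_csv` are hypothetical names, and `sample` contains only the two rows from the excerpt above. The sketch reads cells in document order and ignores `ss:Index`, so rows with skipped columns would shift left.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI declared on the root <ss:Workbook> element in the question's file.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def spreadsheetml_to_rows(xml_text, sheet_name=None):
    """Return the cell texts of one worksheet as a list of rows (lists of str).

    Assumes the SpreadsheetML 2003 layout: Worksheet > Table > Row > Cell > Data.
    """
    # Defensive: drop a leading byte-order mark if the download has one.
    root = ET.fromstring(xml_text.lstrip("\ufeff"))
    sheets = root.findall("ss:Worksheet", NS)
    if sheet_name is not None:
        # Attributes are namespaced too, so ss:Name is stored as "{uri}Name".
        sheets = [s for s in sheets if s.get(f"{{{SS}}}Name") == sheet_name]
    if not sheets:
        return []
    rows = []
    for row in sheets[0].findall("ss:Table/ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            data = cell.find("ss:Data", NS)
            values.append("" if data is None or data.text is None else data.text)
        rows.append(values)
    return rows


def rows_to_csv(rows):
    """Serialize a list of rows to CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


sample = """<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Holdings">
<ss:Table>
<ss:Row>
<ss:Cell ss:StyleID="Left"><ss:Data ss:Type="String">06-Oct-2020</ss:Data></ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell ss:StyleID="Left"><ss:Data ss:Type="String">iShares Russell Top 200 Value ETF</ss:Data></ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>"""

rows = spreadsheetml_to_rows(sample, "Holdings")
print(rows)  # [['06-Oct-2020'], ['iShares Russell Top 200 Value ETF']]
print(rows_to_csv(rows))
```

With the real download you would pass `response.text` instead of `sample`. For a DataFrame, `pandas.DataFrame(rows)` accepts the list of lists directly. Which row holds the column headers has to be checked against the full file, because the excerpt starts with a date and a fund name rather than a header row.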
[Comments]:
-
Show us the code you have tried.
-
import xml.etree.ElementTree as et
response = requests.get(url, headers=headers)
parsed = et.parse(str(response.text))
print(parsed.getroot())
-
And this one:
soup = BeautifulSoup(str(response.text), 'xml')
workbook = []
for sheet in soup.findAll('Worksheet'):
    sheet_as_list = []
    for row in sheet.findAll('Row'):
        row_as_list = []
        for cell in row.findAll('Cell'):
            row_as_list.append(cell.Data.text)
        sheet_as_list.append(row_as_list)
    workbook.append(sheet_as_list)
print(len(workbook))
This prints 0.
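Regarding the ElementTree attempt in the comments: `ElementTree.parse` takes a file name or a file object, so `et.parse(str(response.text))` treats the whole document text as a path to open. A minimal sketch of two working alternatives, where `xml_text` is a hypothetical stand-in for `response.text` (a trimmed excerpt of the question's XML). It also shows that ElementTree stores `ss:`-prefixed tags as `{uri}LocalName`, so lookups by the bare name find nothing.

```python
import io
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Hypothetical stand-in for response.text: a trimmed excerpt of the file in the question.
xml_text = """<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Holdings"><ss:Table><ss:Row>
<ss:Cell><ss:Data ss:Type="String">06-Oct-2020</ss:Data></ss:Cell>
</ss:Row></ss:Table></ss:Worksheet>
</ss:Workbook>"""

# ET.parse() expects a file name or file object; a plain str is treated as a path.
# Option 1: parse the string directly.
root = ET.fromstring(xml_text)
# Option 2: wrap the string in a file-like object and keep using ET.parse().
tree = ET.parse(io.StringIO(xml_text))

# Namespaced tags are stored as "{uri}local", so the bare name matches nothing.
print(len(root.findall(".//Row")))           # 0
print(len(root.findall(f".//{{{SS}}}Row")))  # 1
print(tree.getroot().tag)  # {urn:schemas-microsoft-com:office:spreadsheet}Workbook
```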