How do I keep whitespace in BeautifulSoup.contents: answers

【Question title】: How do I keep whitespace in BeautifulSoup.contents
【Posted】: 2015-11-18 18:52:45
【Question】:

Most of the examples I have found online show how to remove whitespace, but in my case I need to keep it. I have

from bs4 import BeautifulSoup

html = "I can flip this whole thing with one hand\n               <span>D#m</span>\nThe ringleader man\n<span>A#</span>                           <span>Dm</span>                          <span>A#</span>\nI know~~~~ it's a fact that you'd rather just have some of me instead"
bs = BeautifulSoup(html, 'html.parser')
# Serialize each top-level node and join them into one string (on Python 3, str replaces unicode)
content = ''.join(str(node) for node in bs.contents)

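I want to preserve the whitespace (the "html" variable holds the contents of a pre tag), but the parser seems to replace runs of multiple spaces with a single space.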

How can I preserve the whitespace, or get back the raw contents, when using the BeautifulSoup parser?

【Comments】:

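string.replace("double space","single space").strip()?? Or normalize-space(string) if you are using xpath
If you just want the raw contents of the soup-ified html input, you can access them through 'bs'. That will keep the whitespace.
I have also added the "html" variable. @n1c9, I tried accessing it through "bs", but the extra spaces are still removed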

Tags: python beautifulsoup

【Solution 1】:

The html parser seems to preserve whitespace only when the content being parsed is inside a <pre> tag. In my case, the pre tag had been stripped. Adding

html = "<pre>" + html + "</pre>"


preserves the whitespace.
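
To illustrate the approach in this answer, here is a minimal sketch, assuming Python 3 with the bs4 package installed. The `sample` string and the names `soup` and `inner_html` are made up for the example and are not from the original post. It wraps the markup in a pre tag, parses it, and then reads the children of that tag back out, so the wrapper itself does not end up in the result. It appears to be the whitespace-only runs between tags that get collapsed when there is no pre tag around them, which is why the sample places spaces between two spans.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: runs of spaces and newlines around and between tags.
sample = "first line\n          <span>D#m</span>\nsecond line\n<span>A#</span>          <span>Dm</span>"

# Wrap the markup in <pre> before parsing, as the answer suggests.
soup = BeautifulSoup("<pre>" + sample + "</pre>", "html.parser")

# Join the children of <pre> so the wrapper tag is not part of the output.
inner_html = "".join(str(node) for node in soup.pre.contents)

print(repr(inner_html))
```

Reading from `soup.pre.contents` rather than `soup.contents` is a choice made for this sketch: it returns the markup as it was before the wrapper was added.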

【Discussion】: