BeautifulSoup to get 'span' contents next to each other: answers

【Question title】: BeautifulSoup to get 'span' contents next to each other
【Posted】: 2020-03-10 04:07:59
【Question description】:

Part of the HTML looks like the snippet below. I want to extract the contents of the 'span' tags:

from bs4 import BeautifulSoup
data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
... """
soup = BeautifulSoup(data, "html.parser")

classification = soup.find_all('section')[0].find_all('span')

for c in classification:
    print(c.text)

This works and prints:

J36
John
B56
Bratt
K3
Kate

But the desired output is:

J36-John
B56-Bratt
K3-Kate

What is the proper BeautifulSoup way to extract the contents, other than the following? Thank you.

contents = [c.text for c in classification]

# Even positions hold the codes, odd positions hold the names.
codes = contents[0::2]
names = contents[1::2]

for pair in zip(codes, names):
    print('-'.join(pair))

【Question comments】:

If you don't mind using a regular expression: rows = [''.join(x) for x in re.findall('([A-Z0-9]+?)(—)([A-Za-z]+?)', data)]; print('\n'.join(rows))
@alec, thank you. It is an HTML file saved locally, so I tried: HtmlFile = open("C:\\file.html", 'r', encoding='utf-8'); source_code = HtmlFile.read(), and then applied your line to "source_code". It did not work...
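It may be a mismatch in the type of dash, since the dash in data (—) and the dash in the desired output (-) are different characters. See if this works: rows = ['-'.join(x) for x in re.findall('([A-Z0-9]+?).*?([A-Za-z]+?)', source_code)]. Otherwise, I'm not sure whether anything in the file differs from data.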
@alec, the HTML file has many other tags, including other [...], so that is still not the correct output.
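
The dash mismatch raised in the comments above can be illustrated with a short sketch. It assumes the separator in the HTML is an em dash (U+2014) and that the desired separator is the ASCII hyphen-minus; the variable names are illustrative and do not come from the thread.

```python
import unicodedata

# Assumption: the HTML uses an em dash, while the desired output uses a plain hyphen.
em_dash = "\u2014"
hyphen = "-"

# Show that the two characters are distinct code points.
for ch in (em_dash, hyphen):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A row as it would be read from the page, then normalised to the desired separator.
row = "J36" + em_dash + "John"
normalized = row.replace(em_dash, hyphen)
print(normalized)
```

A pattern or a `join` that uses one of these characters will not match or reproduce the other, which is why the separator has to be normalised explicitly.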

Tags: python parsing web-scraping beautifulsoup

【Solution 1】:

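You can look at each span's next sibling. For the first span of a pair the sibling is the dash, so it is printed together with the text and without a trailing newline; the second span has no next sibling, so only its text is printed and the line ends. Note that the separator printed is the dash character taken from the HTML itself.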

for c in classification:
    if c.next_sibling:
        print(c.text + str(c.next_sibling), end='')
    else:
        print(c.text)
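
As an alternative to the sibling check above, here is a minimal sketch that groups the spans by their parent `<li>` and joins them with a plain hyphen. It assumes each row is an `<li>` holding exactly two direct-child `<span>` elements, as in the sample from the question; `rows` and `spans` are illustrative names, not part of the original answer.

```python
from bs4 import BeautifulSoup

data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
"""
soup = BeautifulSoup(data, "html.parser")

rows = []
for li in soup.find("section").find_all("li"):
    # recursive=False keeps only direct children, so the outer <li>
    # (which merely wraps the inner list) yields no spans and is skipped.
    spans = li.find_all("span", recursive=False)
    if len(spans) == 2:
        rows.append("-".join(s.get_text(strip=True) for s in spans))

print("\n".join(rows))
```

Because the separator is supplied by `join`, the result does not depend on which dash character the page uses. On a larger page the `len(spans) == 2` check would likely need to be tightened, for example by also matching a class or an enclosing heading.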

【Comments】: