【Question title】: List of all items from queryset with BeautifulSoup
【Posted】: 2017-03-07 02:56:00
【Question】:

I have a Django project with a field whose content (from a QuerySet) looks like this:

<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 1</b><br />
Work Title1 <br /><span class="text-spacer"></span>
</p>
<p><b>Name and LastName 2</b><br />
Work Title 2<br /><span class="text-spacer"></span>
</p>

But I want the text in this format, joined with a hyphen (-):

Name and LastName - Work Title
Name and LastName 2 - Work Title 2
Name and LastName 3 - Work Title 3

Here is my code. I only get the first item, but I want an array containing all of the items:

text_list = self.texts.filter(code='ON')
for i in text_list:
    soup = BeautifulSoup(i.text_en, "html.parser")
    aa = soup.p.get_text(separator=" - ", strip=True)
return [aa]

【Comments】:

    Tags: python html regex django beautifulsoup


    【Solution 1】:

    You need to iterate over the p tags. Based on the sample you provided, you can try it like this:

    from bs4 import BeautifulSoup

    source = """<p><b>Name and LastName</b><br />
    Work Title<br /><span class="text-spacer"></span>
    </p>
    <p><b>Name and LastName 1</b><br />
    Work Title1 <br /><span class="text-spacer"></span>
    </p>
    <p><b>Name and LastName 2</b><br />
    Work Title 2<br /><span class="text-spacer"></span>
    </p>
    """
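    # The 'lxml' parser needs the lxml package; 'html.parser' also works here.
    soup = BeautifulSoup(source, 'lxml')
    # Build one "Name - Title" string per <p> tag instead of only the first.
    ary = [p.get_text(separator=' - ', strip=True) for p in soup.find_all('p')]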
    

    ary will be:

    [u'Name and LastName - Work Title',
     u'Name and LastName 1 - Work Title1',
     u'Name and LastName 2 - Work Title 2']
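
In the question's code, `soup.p` returns only the first `<p>` of each item, and `aa` is overwritten on every pass through the loop, so `return [aa]` ends up holding a single entry. A minimal sketch of the same `find_all('p')` idea applied across several items might look like the following. The function name `extract_entries` and the `fragments` list are hypothetical; the list stands in for the `text_en` values of the filtered QuerySet, which are not available here.

```python
from bs4 import BeautifulSoup


def extract_entries(html_fragments):
    """Return one 'Name - Title' string per <p> tag across all fragments."""
    entries = []
    for html in html_fragments:
        soup = BeautifulSoup(html, "html.parser")
        # Collect every <p> in this fragment, not just the first one.
        entries.extend(
            p.get_text(separator=" - ", strip=True)
            for p in soup.find_all("p")
        )
    return entries


# Hypothetical stand-in for the text_en values of the QuerySet items.
fragments = [
    '<p><b>Name and LastName</b><br />\nWork Title<br />'
    '<span class="text-spacer"></span>\n</p>'
    '<p><b>Name and LastName 1</b><br />\nWork Title1 <br />'
    '<span class="text-spacer"></span>\n</p>',
    '<p><b>Name and LastName 2</b><br />\nWork Title 2<br />'
    '<span class="text-spacer"></span>\n</p>',
]
result = extract_entries(fragments)
print(result)
```

Inside the Django method, the same function could presumably be called as `extract_entries(i.text_en for i in text_list)`, assuming `text_en` always holds an HTML string.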
    

    【Discussion】:
