Scrape data from a website and store it in an array: answers

【Question title】: Scrape the data from website and store it in an array
【Posted】: 2018-05-11 11:19:39
【Question description】:

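This is XML/HTML data that I extracted from a web source using Python. It is laid out as a table, and I want to put only the data marked with ** ** into an array such as [][]. How can I do that? A flat array that stores the items one by one would also be fine.

My idea is to get the symbol BHEL and its value 80.50 as separate elements so that I can use them in my code.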

<table width="100%"><tr><td>
<div class="tphead"><h2>Option Chain (Equity Derivatives)</h2></div>
</td><td align="right">
<div style="float:right; font-size:1.2em;">
<span>**Underlying Stock:** <b style="font-size:1.2em;">**BHEL** **80.50**</b> </span>
<span>**As on May 11, 2018 15:30:30 IST**<a> <img onclick="refresh();" src="/live_market/resources/images/refressbtn.gif" style="cursor: pointer" title="refresh"/></a></span></div>
</td></tr></table>

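I want to filter out only this data and store it item by item in an array.

The array should look like the following. Any help with Python code would be appreciated.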

Option Chain (Equity Derivatives)
Underlying Stock: BHEL 80.50
As on
May 11, 2018
15:30:30 IST

【Question comments】:

Tags: python python-3.x web-scraping beautifulsoup

【Solution 1】:

It is not clear exactly what you need, but it looks like you want to get the text inside the HTML tags using BeautifulSoup4.

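from bs4 import BeautifulSoup

extracted_text = []
# your_string holds the HTML source that was fetched earlier
soup = BeautifulSoup(your_string, 'html.parser')
# Visit only the top-level tags so nested text is not collected repeatedly
for tag in soup.find_all(recursive=False):
    text = tag.text.strip()
    # Skip tags that contain no visible text
    if text:
        extracted_text.append(text)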

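your_string is the HTML code you fetched.

recursive=False makes find_all go only one level down, so it visits just the top-level tags. Without it, the text of nested HTML tags would be extracted twice (or more), once for each enclosing tag.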
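To make the effect of recursive=False concrete, here is a small sketch on a made-up fragment (the `sample` string is hypothetical and not taken from the question). It compares the default behaviour with the top-level-only search:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the difference.
sample = "<div><p>outer <b>inner</b></p></div>"
soup = BeautifulSoup(sample, "html.parser")

# Default (recursive=True): every nested tag is visited (div, p, b),
# so the same text is collected once per enclosing tag.
all_levels = [tag.text.strip() for tag in soup.find_all()]

# recursive=False: only the direct children of the soup object are visited.
top_level = [tag.text.strip() for tag in soup.find_all(recursive=False)]

print(all_levels)
print(top_level)
```

With the default search the word "inner" appears three times in the collected text, while the top-level search yields a single entry.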

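【Comments】:

No, this does not work. The output contains a lot of other text that I do not want in the array. Only the extract given above should go into the array.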
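One possible way to address this comment is to target the specific tags instead of collecting all text. The following is only a sketch under several assumptions: the ** ** markers are the asker's highlighting and are not present in the real page, the page contains exactly one div with class tphead, and the enclosing table holds the two span elements shown in the question. The variable names (`html`, `rows`, `symbol`, `price`) are illustrative.

```python
import re
from bs4 import BeautifulSoup

# The snippet from the question, with the ** ** highlighting removed (assumption).
html = """<table width="100%"><tr><td>
<div class="tphead"><h2>Option Chain (Equity Derivatives)</h2></div>
</td><td align="right">
<div style="float:right; font-size:1.2em;">
<span>Underlying Stock: <b style="font-size:1.2em;">BHEL 80.50</b> </span>
<span>As on May 11, 2018 15:30:30 IST<a> <img onclick="refresh();" src="/live_market/resources/images/refressbtn.gif" style="cursor: pointer" title="refresh"/></a></span></div>
</td></tr></table>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the heading, then restrict the search to the table that contains it,
# so unrelated text elsewhere on the page is ignored.
head = soup.find("div", class_="tphead")
table = head.find_parent("table")
title = head.h2.get_text(strip=True)

# Assumes the table holds exactly two spans: stock info and timestamp.
stock_span, time_span = table.find_all("span")

# The <b> tag holds "BHEL 80.50"; split it into symbol and value.
symbol, price = stock_span.b.get_text(strip=True).split()

# Split "As on May 11, 2018 15:30:30 IST" into its date and time parts.
stamp = time_span.get_text(" ", strip=True)
match = re.match(r"As on (\w+ \d{1,2}, \d{4}) (\d{2}:\d{2}:\d{2} \w+)", stamp)
date_part, time_part = match.groups()

rows = [
    title,
    stock_span.get_text(" ", strip=True),
    "As on",
    date_part,
    time_part,
]
print(rows)
print(symbol, float(price))
```

Here `symbol` and `price` are available as separate elements, and `rows` follows the layout requested in the question. If the live page is structured differently from the snippet, the tag lookups would need to be adjusted.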