【Question title】: Can I get all the text between every <p> and </p> in a certain ID
【Posted】: 2017-06-27 13:15:09
【Question】:

I used the code below, but it only gets me the text between one <p></p> pair, when it should get me the text between all 5 <p></p> pairs.

            var myHTMLString = try String(contentsOf: myURL, encoding: .ascii)
            while let idRange = myHTMLString.range(of: "post-51"){
                myHTMLString=myHTMLString.substring(from: idRange.upperBound)
                if let paraRange = myHTMLString.range(of: "<p>"){
                    myHTMLString=myHTMLString.substring(from: paraRange.upperBound)
                    if let paraCloseRange = myHTMLString.range(of: "</p>"){
                        HTMLData = myHTMLString.substring(to: paraCloseRange.lowerBound)
                        textViewer.text = HTMLData
                        myHTMLString = myHTMLString.substring(from: paraCloseRange.upperBound)

                    }else{
                        //Handle paragraph close tag not found
                        print("Handle paragraph close tag not found")
                    }
                }else{
                    //Handle paragraph start tag not found
                    print("Handle paragraph start tag not found")
                }
            }

The full HTML string is:`

<!-- main content -->
<div id="content" class="main-content-inner col-sm-12 col-md-9">
        <header>
        <h1 class="page-title">Community</h1>
    </header>
<article id="post-51" class="post-51 page type-page status-publish hentry">
    <!-- .entry-header -->
    <div class="entry-content">
        <h1>Your Experience, Your Programs</h1>
<p>The Purdue Honors College is dedicated to providing meaningful opportunities to enhance the honors student experience. We are building an interdisciplinary community of scholars by adding value through specialized programming and events that are connected to our pillars. The Honors College strives to create an environment in which every student can feel connected, learn, and grow as they each pursue greatness. To reach your full potential in the Honors College, students should attend at least three honors programs per semester outside of the regular curriculum requirements. We invite you to be a part of one of our many upcoming events as we ignite the imagination of our community and forge the future of our college.</p>
<hr />
<h3>Events Calendar</h3>
<p>The Honors College hosts events to keep students engaged with their peers and the Honors College faculty.</p>
<p><a href="https://honors.purdue.edu/community/calendar/">Click here to learn more about upcoming events in the Honors College.</a></p>
<hr />
<h3>Honors College and Residences</h3>
<p>The new 324,000-square-foot Honors College and Residences is the first of its kind in the state of Indiana. It encourages scholarship and connects students with faculty while being emblematic of the Mission of the Purdue Honors College: from the locally sourced building materials to LEED certification and interactive learning spaces.</p>
<p><a href="https://honors.purdue.edu/community/honors-college-and-residences/">Click here to learn more about the new Honors College and Residences buildings.</a></p>
<hr />
<h3>Honors Network News</h3>
<p><a href="https://honors.purdue.edu/community/honors-network-news/">Click here to view the Honors Network News archive.</a></p>
<hr />
<h3>News</h3>
<p>Stay up to date with news about the Honors College. Learn about the awesome things our students are doing and follow the Honors College on social media.</p>
<p><a href="https://honors.purdue.edu/community/news/">Click here to view more news about the Honors College.</a></p>
<hr />
<h3>Photo Gallery</h3>
<p><a href="https://honors.purdue.edu/community/photo-gallery/">Click here to view photos of Honors College events.</a></p>
<hr />
<h3>Published Works</h3>
<p><a href="https://honors.purdue.edu/community/published-works/">Click here to view the published works of the Honors College.</a></p>
<hr />
<h3>Signature Programs</h3>
<p><a href="https://honors.purdue.edu/community/signature-programs/">Click here to learn more about Signature Programs from the Honors College.</a></p>
<hr />
            </div><!-- .entry-content -->
    </article><!-- #post-## -->
`

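【Comments】:

  • Yes, you can. Try $("#p id ").text() using jQuery.
  • That was fast. I don't think jQuery works with Swift. Am I right?
  • Oops, I didn't notice.
  • Do you want to parse the HTML page?
  • @SergeyDi I'm trying to get the text from a web page under the HTML ID post-51. There are multiple <p></p> pairs, and I want to get the text from each of them and print it out.

Tags: html swift parsing text textview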


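【Solution 1】:

Change your code to the following, which loops through every <p> after the id has been found. See my comment in the code: it is very important to break out of the while loop once your condition is met.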
var myHTMLString = try String(contentsOf: myURL, encoding: .ascii)
if let idRange = myHTMLString.range(of: "post-51"){
    myHTMLString=myHTMLString.substring(from: idRange.upperBound)
    while let paraRange = myHTMLString.range(of: "<p>"){
        myHTMLString=myHTMLString.substring(from: paraRange.upperBound)
        if let paraCloseRange = myHTMLString.range(of: "</p>"){
            HTMLData = myHTMLString.substring(to: paraCloseRange.lowerBound)
            textViewer.text = HTMLData
            //AFTER YOU GET THE NEEDED INFORMATION, DO A break HERE to get out of while loop or you will loop through all <p>
            myHTMLString = myHTMLString.substring(from: paraCloseRange.upperBound)

        }else{
            //Handle paragraph close tag not found
            print("Handle paragraph close tag not found")
        }
    }
}else{
    print("Handle id not found")
}
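
The loop above assigns each paragraph to textViewer.text in turn, so only the most recent one stays visible. If the goal is to keep the text of every paragraph, one option is to collect the matches in an array first. The following is only a minimal sketch of that idea, not the answerer's code: `paragraphs(in:after:)` is a hypothetical helper name, and it assumes the tags are written exactly as `<p>` and `</p>` (no attributes, no nesting). It also keeps scanning to the end of the string rather than stopping at the closing </article>.

```swift
import Foundation

/// Hypothetical helper: returns the text between every <p> and </p>
/// that appears after the first occurrence of `marker`.
func paragraphs(in html: String, after marker: String) -> [String] {
    // Start scanning just past the marker; give up if it is missing.
    guard let markerRange = html.range(of: marker) else { return [] }
    var remainder = html[markerRange.upperBound...]
    var result: [String] = []

    while let open = remainder.range(of: "<p>") {
        let afterOpen = remainder[open.upperBound...]
        // Stop if an opening tag has no matching close tag.
        guard let close = afterOpen.range(of: "</p>") else { break }
        result.append(String(afterOpen[..<close.lowerBound]))
        // Continue searching after the close tag.
        remainder = afterOpen[close.upperBound...]
    }
    return result
}

// Made-up sample input standing in for the downloaded page.
let sample = "<article id=\"post-51\"><p>First</p><hr /><p>Second</p></article>"
let texts = paragraphs(in: sample, after: "post-51")
print(texts.joined(separator: "\n\n"))
```

In the question's setup, the joined string could then be assigned to textViewer.text once, after the loop has finished.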

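【Comments】:

  • It only goes through the while loop once. The error branch isn't executed either.
  • Update: it goes through the while loop 5 times, but the error says the paragraph start tag was not found. I've looked through the HTML and it's there.
  • @JunaidJaved Can you post the full HTML string in your question?
  • I've added most of the HTML where the text I want is located. Stack Overflow has a character limit, so I can't put in all of the HTML. I noticed the ID is content. I've updated the code, but the same thing still happens.
  • @JunaidJaved I only see the ID post-51 once in your code. I thought you told me there were several post-51 ids in the HTML and that you wanted the <p> after each one. Do you really mean that there are multiple <p> after the single post-51?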

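【Solution 2】:

I think it would not be out of the question to use an off-screen web view to temporarily load the HTML and retrieve the content you are after. Here is an example of how to do that: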

class ViewController: UIViewController {

    // Declared as a property of the class to ensure it is not freed
    // from memory (because we're not adding it to the view hierarchy).
    let webView = UIWebView()

    override func viewDidLoad() {
        super.viewDidLoad()

        webView.delegate = self
        webView.loadHTMLString("<html><head></head><body><div id=\"hello\"><p>First</p><p>Second</p><p>Third</p></div></body></html>", baseURL: nil)
    }
}

extension ViewController: UIWebViewDelegate {
    func webViewDidFinishLoad(_ webView: UIWebView) {

        let result = webView.stringByEvaluatingJavaScript(from: "Array.prototype.slice.call(document.getElementById('hello').getElementsByTagName('p')).map(function(p) { return p.innerHTML }).join('|')")
        print(result)
    }
}

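Note that stringByEvaluatingJavaScript cannot handle an array response, so we join the contents of the p tags with the pipe | character to return them to Swift. You can then split the string on the pipe to get an array. You can change the delimiter to anything you are sure will not naturally occur within the p tags.

Also, Array.prototype.slice.call simply converts the HTMLCollection returned by getElementsByTagName into an array.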
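
Splitting the joined string back into an array on the Swift side might look like the sketch below. The `result` value here is a made-up stand-in for whatever stringByEvaluatingJavaScript(from:) returns, and `parts` is just an illustrative name.

```swift
// Made-up stand-in for the optional string returned by the web view.
let result: String? = "First|Second|Third"

// Unwrap the optional, split on the pipe, and convert each piece back to String.
// split(separator:) drops empty pieces, so a nil or empty result gives [].
let parts = (result ?? "").split(separator: "|").map(String.init)
print(parts)
```

If empty paragraphs need to be preserved, `split(separator:omittingEmptySubsequences:)` with `false` would be the variant to look at.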

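【Comments】:

  • I thought about making a webView, but the university I'm at doesn't want it to feel like a web browser. Is there a way I can use the code without a webView that is visible to the user?
  • It is not visible to the user. Look at the code: I'm not adding the web view to the screen, it is only loaded into memory and never displayed.
  • This seems to work in a test project I quickly created. How do I get rid of the "|" that appears after the end of each <p> tag?
  • Do you want all the text joined together? If so, remove .join('|')
  • I got rid of them, but it still shows some text in the console that I don't want. For example: <a> some link to a website </a>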
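
The leftover <a> markup in the last comment appears because innerHTML returns the inner tags along with the text. One possible way to drop them on the Swift side is a regular-expression replacement, sketched below. `strippingTags` is a hypothetical helper, the pattern is a rough approximation rather than a real HTML parser, and it does not decode entities such as &amp;amp;. Returning p.textContent instead of p.innerHTML in the JavaScript may be a simpler alternative.

```swift
import Foundation

/// Hypothetical helper: removes anything that looks like an HTML tag.
func strippingTags(_ fragment: String) -> String {
    // "<[^>]+>" matches "<", then one or more non-">" characters, then ">".
    return fragment.replacingOccurrences(of: "<[^>]+>",
                                         with: "",
                                         options: .regularExpression)
}

// One of the paragraphs from the page in the question.
let inner = "<a href=\"https://honors.purdue.edu/community/news/\">Click here to view more news about the Honors College.</a>"
print(strippingTags(inner))
```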