【Question title】: How do I decode HTML entities in Swift?
【Posted】: 2014-10-25 18:22:48
【Question】:

I am pulling a JSON file from a site, and one of the strings received is:

The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi

How can I convert things like &#8216; into the correct characters?

I've made an Xcode Playground to demonstrate it:

import UIKit

var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)

let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary

var a = dataDictionary["posts"] as NSArray

println(a[0]["title"])

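【Comments】:

    Tags: json swift html-entities


    【Solution 1】:

    This answer was last revised for Swift 5.2 and iOS 13.4 SDK.


    There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).

    Remember to initialize NSAttributedString from the main thread only. It uses WebKit to parse HTML underneath, hence the requirement.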

    // This is a[0]["title"] in your case
    let htmlEncodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"
    
    guard let data = htmlEncodedString.data(using: .utf8) else {
        return
    }
    
    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        .characterEncoding: String.Encoding.utf8.rawValue
    ]
    
    guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
        return
    }
    
    // The Weeknd ‘King Of The Fall’
    let decodedString = attributedString.string
    
    // The same logic wrapped in a failable initializer:
    extension String {
    
        init?(htmlEncodedString: String) {
    
            guard let data = htmlEncodedString.data(using: .utf8) else {
                return nil
            }
    
            let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ]
    
            guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
                return nil
            }
    
            self.init(attributedString.string)
    
        }
    
    }
    
    let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"
    let decodedString = String(htmlEncodedString: encodedString)
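
The main-thread requirement noted above can be met from background code by hopping to the main queue before building the attributed string. The following is only a minimal sketch: `onMainThread` is a hypothetical helper name, not part of the answer or of any Apple API, and it assumes a synchronous hop is acceptable for your call site.

```swift
import Foundation

/// Runs `work` on the main thread and returns its result.
/// Hypothetical helper for illustration only.
func onMainThread<T>(_ work: () -> T) -> T {
    if Thread.isMainThread {
        // Already on the main thread: calling `DispatchQueue.main.sync`
        // from here would deadlock, so run the closure directly.
        return work()
    }
    // On a background thread: block until the main queue has run `work`.
    return DispatchQueue.main.sync(execute: work)
}
```

With the extension above in scope, a call could look like `let decoded = onMainThread { String(htmlEncodedString: encodedString) }`. The caller is blocked until the main queue is free, so this must not be used where the main thread is itself waiting on the calling thread.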
    

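    【Comments】:

    • What? Extensions are meant to extend existing types to provide new functionality.
    • I understand what you're trying to say, but negating extensions isn't the way to go.
    • @akashivskyy: To make this work correctly with non-ASCII characters you have to add an NSCharacterEncodingDocumentAttribute, compare stackoverflow.com/a/27898167/1187415
    • This method is extremely heavy and is not recommended in table views or grid views
    • This is great! It blocks the main thread, though. Is there any way to run it on a background thread?
    【Solution 2】:

    @akashivskyy's answer is great and demonstrates how to utilize NSAttributedString to decode HTML entities. One possible disadvantage (as he stated) is that all HTML markup is removed as well, so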

    <strong> 4 &lt; 5 &amp; 3 &gt; 2</strong>
    

    becomes

    4 < 5 & 3 > 2


    On OS X there is CFXMLCreateStringByUnescapingEntities() which does the job:

    let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
    let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
    println(decoded)
    // <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €.  @ 
    

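    but this is not available on iOS.

    Here is a pure Swift implementation. It decodes character entity references like &lt; using a dictionary, and all numeric character entities like &#64; or &#x20ac;. (Note that I did not list all 252 HTML entities explicitly.)

    Swift 4: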

    // Mapping from XML/HTML character entity reference to character
    // From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
    private let characterEntities : [ Substring : Character ] = [
        // XML predefined entities:
        "&quot;"    : "\"",
        "&amp;"     : "&",
        "&apos;"    : "'",
        "&lt;"      : "<",
        "&gt;"      : ">",
    
        // HTML character entity references:
        "&nbsp;"    : "\u{00a0}",
        // ...
        "&diams;"   : "♦",
    ]
    
    extension String {
    
        /// Returns a new string made by replacing in the `String`
        /// all HTML character entity references with the corresponding
        /// character.
        var stringByDecodingHTMLEntities : String {
    
            // ===== Utility functions =====
    
            // Convert the number in the string to the corresponding
            // Unicode character, e.g.
            //    decodeNumeric("64", 10)   --> "@"
            //    decodeNumeric("20ac", 16) --> "€"
            func decodeNumeric(_ string : Substring, base : Int) -> Character? {
                guard let code = UInt32(string, radix: base),
                    let uniScalar = UnicodeScalar(code) else { return nil }
                return Character(uniScalar)
            }
    
            // Decode the HTML character entity to the corresponding
            // Unicode character, return `nil` for invalid input.
            //     decode("&#64;")    --> "@"
            //     decode("&#x20ac;") --> "€"
            //     decode("&lt;")     --> "<"
            //     decode("&foo;")    --> nil
            func decode(_ entity : Substring) -> Character? {
    
                if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
                    return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
                } else if entity.hasPrefix("&#") {
                    return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
                } else {
                    return characterEntities[entity]
                }
            }
    
            // ===== Method starts here =====
    
            var result = ""
            var position = startIndex
    
            // Find the next '&' and copy the characters preceding it to `result`:
            while let ampRange = self[position...].range(of: "&") {
                result.append(contentsOf: self[position ..< ampRange.lowerBound])
                position = ampRange.lowerBound
    
                // Find the next ';' and copy everything from '&' to ';' into `entity`
                guard let semiRange = self[position...].range(of: ";") else {
                    // No matching ';'.
                    break
                }
                let entity = self[position ..< semiRange.upperBound]
                position = semiRange.upperBound
    
                if let decoded = decode(entity) {
                    // Replace by decoded character:
                    result.append(decoded)
                } else {
                    // Invalid entity, copy verbatim:
                    result.append(contentsOf: entity)
                }
            }
            // Copy remaining characters to `result`:
            result.append(contentsOf: self[position...])
            return result
        }
    }
    

    Example:

    let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
    let decoded = encoded.stringByDecodingHTMLEntities
    print(decoded)
    // <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €.  @
    

    Swift 3:

    // Mapping from XML/HTML character entity reference to character
    // From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
    private let characterEntities : [ String : Character ] = [
        // XML predefined entities:
        "&quot;"    : "\"",
        "&amp;"     : "&",
        "&apos;"    : "'",
        "&lt;"      : "<",
        "&gt;"      : ">",
    
        // HTML character entity references:
        "&nbsp;"    : "\u{00a0}",
        // ...
        "&diams;"   : "♦",
    ]
    
    extension String {
    
        /// Returns a new string made by replacing in the `String`
        /// all HTML character entity references with the corresponding
        /// character.
        var stringByDecodingHTMLEntities : String {
    
            // ===== Utility functions =====
    
            // Convert the number in the string to the corresponding
            // Unicode character, e.g.
            //    decodeNumeric("64", 10)   --> "@"
            //    decodeNumeric("20ac", 16) --> "€"
            func decodeNumeric(_ string : String, base : Int) -> Character? {
                guard let code = UInt32(string, radix: base),
                    let uniScalar = UnicodeScalar(code) else { return nil }
                return Character(uniScalar)
            }
    
            // Decode the HTML character entity to the corresponding
            // Unicode character, return `nil` for invalid input.
            //     decode("&#64;")    --> "@"
            //     decode("&#x20ac;") --> "€"
            //     decode("&lt;")     --> "<"
            //     decode("&foo;")    --> nil
            func decode(_ entity : String) -> Character? {
    
                if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                    return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
                } else if entity.hasPrefix("&#") {
                    return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
                } else {
                    return characterEntities[entity]
                }
            }
    
            // ===== Method starts here =====
    
            var result = ""
            var position = startIndex
    
            // Find the next '&' and copy the characters preceding it to `result`:
            while let ampRange = self.range(of: "&", range: position ..< endIndex) {
                result.append(self[position ..< ampRange.lowerBound])
                position = ampRange.lowerBound
    
                // Find the next ';' and copy everything from '&' to ';' into `entity`
                if let semiRange = self.range(of: ";", range: position ..< endIndex) {
                    let entity = self[position ..< semiRange.upperBound]
                    position = semiRange.upperBound
    
                    if let decoded = decode(entity) {
                        // Replace by decoded character:
                        result.append(decoded)
                    } else {
                        // Invalid entity, copy verbatim:
                        result.append(entity)
                    }
                } else {
                    // No matching ';'.
                    break
                }
            }
            // Copy remaining characters to `result`:
            result.append(self[position ..< endIndex])
            return result
        }
    }
    

    Swift 2:

    // Mapping from XML/HTML character entity reference to character
    // From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
    private let characterEntities : [ String : Character ] = [
        // XML predefined entities:
        "&quot;"    : "\"",
        "&amp;"     : "&",
        "&apos;"    : "'",
        "&lt;"      : "<",
        "&gt;"      : ">",
    
        // HTML character entity references:
        "&nbsp;"    : "\u{00a0}",
        // ...
        "&diams;"   : "♦",
    ]
    
    extension String {
    
        /// Returns a new string made by replacing in the `String`
        /// all HTML character entity references with the corresponding
        /// character.
        var stringByDecodingHTMLEntities : String {
    
            // ===== Utility functions =====
    
            // Convert the number in the string to the corresponding
            // Unicode character, e.g.
            //    decodeNumeric("64", 10)   --> "@"
            //    decodeNumeric("20ac", 16) --> "€"
            func decodeNumeric(string : String, base : Int32) -> Character? {
                let code = UInt32(strtoul(string, nil, base))
                return Character(UnicodeScalar(code))
            }
    
            // Decode the HTML character entity to the corresponding
            // Unicode character, return `nil` for invalid input.
            //     decode("&#64;")    --> "@"
            //     decode("&#x20ac;") --> "€"
            //     decode("&lt;")     --> "<"
            //     decode("&foo;")    --> nil
            func decode(entity : String) -> Character? {
    
                if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                    return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
                } else if entity.hasPrefix("&#") {
                    return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
                } else {
                    return characterEntities[entity]
                }
            }
    
            // ===== Method starts here =====
    
            var result = ""
            var position = startIndex
    
            // Find the next '&' and copy the characters preceding it to `result`:
            while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
                result.appendContentsOf(self[position ..< ampRange.startIndex])
                position = ampRange.startIndex
    
                // Find the next ';' and copy everything from '&' to ';' into `entity`
                if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
                    let entity = self[position ..< semiRange.endIndex]
                    position = semiRange.endIndex
    
                    if let decoded = decode(entity) {
                        // Replace by decoded character:
                        result.append(decoded)
                    } else {
                        // Invalid entity, copy verbatim:
                        result.appendContentsOf(entity)
                    }
                } else {
                    // No matching ';'.
                    break
                }
            }
            // Copy remaining characters to `result`:
            result.appendContentsOf(self[position ..< endIndex])
            return result
        }
    }
    

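    【Comments】:

    • This is brilliant, thanks Martin! Here's the extension with the full list of HTML entities: gist.github.com/mwaterfall/25b4a6a06dc3309d9555 I've also slightly adapted it to provide the distance offsets made by the replacements. This allows the correct adjustment of any string attributes or entities that might be affected by these replacements (Twitter entity indices, for example).
    • @MichaelWaterfall and Martin, this is magnificent! Works like a charm! I updated the extension for Swift 2: pastebin.com/juHRJ6au Thanks!
    • I converted this answer to be compatible with Swift 2 and dumped it in a CocoaPod called StringExtensionHTML for ease of use. Note that Santiago's Swift 2 version fixes the compile-time errors, but taking out strtoul(string, nil, base) entirely will cause the code not to work with numeric character entities and to crash when it comes to an entity it doesn't recognize (instead of failing gracefully).
    • @AdelaChang: Actually I had converted my answer to Swift 2 already in September 2015. It still compiles without warnings with Swift 2.2/Xcode 7.3. Or are you referring to Michael's version?
    • Thanks, with this answer I solved my issue: I had serious performance problems using NSAttributedString.
    【Solution 3】:

    Swift 4


    • String extension computed variable
    • Without extra guard, do, catch, etc.
    • Returns the original string if decoding fails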

    extension String {
        var htmlDecoded: String {
            let decoded = try? NSAttributedString(data: Data(utf8), options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil).string
    
            return decoded ?? self
        }
    }
    

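    【Comments】:

    • Wow! Works right out of the box for Swift 4! Usage // let encoded = "The Weeknd &#8216;King Of The Fall&#8217;" let finalString = encoded.htmlDecoded
    • I love the simplicity of this answer. However, it will cause crashes when run in the background because it tries to run on the main thread.
    【Solution 4】: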

    Swift 3 版本的@akashivskyy's extension

    extension String {
        init(htmlEncodedString: String) {
            self.init()
            guard let encodedData = htmlEncodedString.data(using: .utf8) else {
                self = htmlEncodedString
                return
            }
    
            let attributedOptions: [String : Any] = [
                NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
            ]
    
            do {
                let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
                self = attributedString.string
            } catch {
                print("Error: \(error)")
                self = htmlEncodedString
            }
        }
    }
    

    【Comments】:

    • Works great. The original answer was causing a weird crash. Thanks for the update!
    • For French characters I had to use utf16
    【Solution 5】:

    Swift 2 版本的@akashivskyy's extension,

     extension String {
         init(htmlEncodedString: String) {
             if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
                 let attributedOptions : [String: AnyObject] = [
                NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
            ]
    
                 do{
                     if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
                         self.init(attributedString.string)
                     }else{
                         print("error")
                         self.init(htmlEncodedString)     //Returning actual string if there is an error
                     }
                 }catch{
                     print("error: \(error)")
                     self.init(htmlEncodedString)     //Returning actual string if there is an error
                 }
    
             }else{
                 self.init(htmlEncodedString)     //Returning actual string if there is an error
             }
         }
     }
    

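    【Comments】:

    • This code is incomplete and should be avoided by all means. The error is not being handled properly. When there is in fact an error code, it would crash. You should update your code to at least return nil when there is an error. Or you could just init with the original string. In the end you should handle the error. Which is not the case. Wow!
    【Solution 6】:

    Swift 4 version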

    extension String {
    
        init(htmlEncodedString: String) {
            self.init()
            guard let encodedData = htmlEncodedString.data(using: .utf8) else {
                self = htmlEncodedString
                return
            }
    
            let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ]
    
            do {
                let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
                self = attributedString.string
            } 
            catch {
                print("Error: \(error)")
                self = htmlEncodedString
            }
        }
    }
    

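    【Comments】:

    • When I try to use this, I get "Error Domain=NSCocoaErrorDomain Code=259 "The file couldn't be opened because it isn't in the correct format."". This goes away if I run the full do catch on the main thread. I found this from checking the NSAttributedString documentation: "The HTML importer should not be called from a background thread (that is, the options dictionary includes documentType with a value of html). It will try to synchronize with the main thread, fail, and time out."
    • Please, the rawValue syntax NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue) and NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue) is horrible. Replace it with .documentType and .characterEncoding
    • @MickeDG - Can you please explain what exactly you did to resolve this error? I am getting it sporadically.
    • @RossBarbish - Sorry Ross, this was too long ago and I can't remember the details. Have you tried what I suggest in the comment above, i.e. to run the full do catch on the main thread?
    【Solution 7】:

    I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities

    The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3 HTML5 spec (i.e. &#x80; should be unescaped as the Euro sign (Unicode U+20AC) and NOT as the Unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping).

    Usage example: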

    import HTMLEntities
    
    // encode example
    let html = "<script>alert(\"abc\")</script>"
    
    print(html.htmlEscape())
    // Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
    
    // decode example
    let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
    
    print(htmlencoded.htmlUnescape())
    // Prints "<script>alert(\"abc\")</script>"
    

    For the OP's example:

    print("The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
    // prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi "
    

    Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.
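
The special treatment of numeric character references described in this answer can be sketched roughly as follows. This is not the HTMLEntities implementation: the names `c1Replacements` and `scalar(forNumericReference:)` are hypothetical, and the table lists only a few of the replacements that the HTML5 specification defines for the range 0x80...0x9F.

```swift
/// Partial table of replacements for numeric references in 0x80...0x9F.
/// Only a few entries are shown; the HTML5 specification defines more.
let c1Replacements: [UInt32: UInt32] = [
    0x80: 0x20AC, // EURO SIGN
    0x85: 0x2026, // HORIZONTAL ELLIPSIS
    0x91: 0x2018, // LEFT SINGLE QUOTATION MARK
    0x92: 0x2019, // RIGHT SINGLE QUOTATION MARK
    0x99: 0x2122, // TRADE MARK SIGN
]

/// Maps the numeric value of a reference such as `&#x80;` to a Unicode scalar.
func scalar(forNumericReference code: UInt32) -> Unicode.Scalar {
    let replacementCharacter: Unicode.Scalar = "\u{FFFD}"

    // NUL and values beyond the Unicode range become U+FFFD.
    if code == 0 || code > 0x10FFFF {
        return replacementCharacter
    }
    // Values with a listed replacement are remapped.
    if let mapped = c1Replacements[code], let remapped = Unicode.Scalar(mapped) {
        return remapped
    }
    // `Unicode.Scalar(_: UInt32)` returns nil for surrogate code points,
    // which therefore also end up as U+FFFD.
    return Unicode.Scalar(code) ?? replacementCharacter
}
```

For example, `scalar(forNumericReference: 0x80)` yields the Euro sign rather than the control character U+0080, and a surrogate value such as `0xD800` yields U+FFFD.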

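    【Comments】:

    • This is the most generic answer that works all the time and does not need to run on the main thread. It works even with the most complex HTML-escaped Unicode strings (such as (&nbsp;͡&deg;&nbsp;͜ʖ&nbsp;͡&deg;&nbsp;)), whereas none of the other answers manage that.
    • Yeah, this should be way more up! :)
    • The fact that the original answer is not thread-safe is a very big issue for something so intrinsically low level as a string manipulation
    【Solution 8】: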
    extension String{
        func decodeEnt() -> String{
            let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
            let attributedOptions : [String: AnyObject] = [
                NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
            ]
            let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
    
            return attributedString.string
        }
    }
    
    let encodedString = "The Weeknd &#8216;King Of The Fall&#8217;"
    
    let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */
    

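    【Comments】:

    • Re "The Weeknd": Not "The Weekend"?
    • The syntax highlighting looks weird, especially the comment part of the last line. Can you fix it?
    • "The Weeknd" is a singer, and yes, that's the way his name is spelled.
    【Solution 9】:

    Swift 4:

    The overall solution that finally worked for me with HTML code, newline characters, and single quotes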

    extension String {
        var htmlDecoded: String {
            let decoded = try? NSAttributedString(data: Data(utf8), options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
                ], documentAttributes: nil).string
    
            return decoded ?? self
        }
    }
    

    Usage:

    let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded


    I then had to apply some more filters to get rid of single quotes (for example, don't, hasn't, etc.) and newline characters like \n:

    var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
    yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
    

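    【Comments】:

    • This is essentially a copy of this other answer. All you did is add some usage which is obvious enough.
    • Someone has upvoted this answer and found it really useful, what does that tell you?
    • @Naishta It tells you that everyone has different opinions and that's OK
    【Solution 10】:

    This would be my approach. You could add the entities dictionary from https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 that Michael Waterfall mentions.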

    extension String {
        func htmlDecoded()->String {
    
            guard (self != "") else { return self }
    
            var newStr = self
    
            let entities = [
                "&quot;"    : "\"",
                "&amp;"     : "&",
                "&apos;"    : "'",
                "&lt;"      : "<",
                "&gt;"      : ">",
            ]
    
            for (name,value) in entities {
                newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
            }
            return newStr
        }
    }
    

    Usage example:

    let encoded = "this is so &quot;good&quot;"
    let decoded = encoded.htmlDecoded() // "this is so "good""
    

    let encoded = "this is so &quot;good&quot;".htmlDecoded() // "this is so "good""
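
One caveat with chained replacements like the ones above: a dictionary is iterated in an unspecified order, so `&amp;` may be replaced before `&lt;`, and input such as `&amp;lt;` would then be decoded twice. A minimal sketch that avoids this by using an ordered array with `&amp;` handled last is shown here. The method name `decodingBasicEntities()` is hypothetical, and only the five XML predefined entities are covered.

```swift
import Foundation

extension String {
    /// Decodes the five XML predefined entities in a fixed order.
    /// Hypothetical name, for illustration only.
    func decodingBasicEntities() -> String {
        // An array keeps the order stable. "&amp;" comes last so that
        // "&amp;lt;" becomes "&lt;" instead of "<".
        let replacements: [(String, String)] = [
            ("&quot;", "\""),
            ("&apos;", "'"),
            ("&lt;", "<"),
            ("&gt;", ">"),
            ("&amp;", "&"),
        ]

        var result = self
        for (entity, character) in replacements {
            result = result.replacingOccurrences(of: entity, with: character)
        }
        return result
    }
}
```

This still makes one pass over the string per entity, so for long inputs or a full entity table a single-pass scanner like the one in Solution 2 is likely the better fit.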
    

    【Comments】:

    【Solution 11】:

    Elegant Swift 4 solution

    If you want a string,

    myString = String(htmlString: encodedString)


    add this extension to your project:

    extension String {
    
        init(htmlString: String) {
            self.init()
            guard let encodedData = htmlString.data(using: .utf8) else {
                self = htmlString
                return
            }
    
            let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
               .documentType: NSAttributedString.DocumentType.html,
               .characterEncoding: String.Encoding.utf8.rawValue
            ]
    
            do {
                let attributedString = try NSAttributedString(data: encodedData,
                                                              options: attributedOptions,
                                                              documentAttributes: nil)
                self = attributedString.string
            } catch {
                print("Error: \(error.localizedDescription)")
                self = htmlString
            }
        }
    }
    

    If you want an NSAttributedString with bold, italic, links, etc.,

    textField.attributedText = try? NSAttributedString(htmlString: encodedString)


    add this extension to your project:

    extension NSAttributedString {
    
        convenience init(htmlString html: String) throws {
            try self.init(data: Data(html.utf8), options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
                ], documentAttributes: nil)
        }
    
    }
    

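    【Comments】:

      【Solution 12】:

      Swift 4

      I really like the solution using documentAttributes. However, it may be too slow for parsing files and/or for use in table view cells. I can't believe that Apple does not provide a decent solution for this.

      As a workaround, I found this string extension on GitHub which works perfectly and is fast for decoding.

      So for situations in which the given answer is too slow, see the solution suggested in this link: https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555

      Note: it does not parse HTML tags.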
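
If decoding is still a bottleneck in table view cells, one option is to decode each distinct string only once and reuse the result. The sketch below is an assumption-laden illustration rather than part of the linked gist: `DecodedStringCache` is a hypothetical type, the decoder is passed in as a closure, and the cache is neither thread-safe nor size-limited.

```swift
/// Memoizes the result of an arbitrary string decoder.
/// Hypothetical helper; not thread-safe and never evicts entries.
final class DecodedStringCache {
    private var storage: [String: String] = [:]
    private let decode: (String) -> String

    /// Number of times the underlying decoder actually ran.
    private(set) var missCount = 0

    init(decode: @escaping (String) -> String) {
        self.decode = decode
    }

    /// Returns the cached result, running the decoder only on a miss.
    func decoded(_ encoded: String) -> String {
        if let cached = storage[encoded] {
            return cached
        }
        missCount += 1
        let value = decode(encoded)
        storage[encoded] = value
        return value
    }
}
```

Any of the decoders on this page can be plugged in, for example `DecodedStringCache { $0.stringByDecodingHTMLEntities }` with the extension from Solution 2.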

      【Comments】:

        【Solution 13】:

        Computed var version of @yishus' answer

        public extension String {
            /// Decodes string with HTML encoding.
            var htmlDecoded: String {
                guard let encodedData = self.data(using: .utf8) else { return self }
        
                let attributedOptions: [String : Any] = [
                    NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                    NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
        
                do {
                    let attributedString = try NSAttributedString(data: encodedData,
                                                                  options: attributedOptions,
                                                                  documentAttributes: nil)
                    return attributedString.string
                } catch {
                    print("Error: \(error)")
                    return self
                }
            }
        }
        

        【Comments】:

          【Solution 14】:

          Swift 4

          func decodeHTML(string: String) -> String? {
          
              var decodedString: String?
          
              if let encodedData = string.data(using: .utf8) {
                  let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
                      .documentType: NSAttributedString.DocumentType.html,
                      .characterEncoding: String.Encoding.utf8.rawValue
                  ]
          
                  do {
                      decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
                  } catch {
                      print("\(error.localizedDescription)")
                  }
              }
          
              return decodedString
          }
          

          【Comments】:

          • An explanation would be in order. For example, how is it different from previous Swift 4 answers?
          【Solution 15】:

          Swift 4.1 +

          extension String {
              // Decodes HTML entities; falls back to the original string on failure.
              var htmlDecoded: String {
                  let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
                      .documentType: NSAttributedString.DocumentType.html,
                      .characterEncoding: String.Encoding.utf8.rawValue
                  ]

                  let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions,
                                                        documentAttributes: nil).string

                  return decoded ?? self
              }
          }
          

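          【Comments】:

          • An explanation would be in order. For example, how is it different from previous answers? What Swift 4.1 features are used? Does it only work in Swift 4.1 and not in previous versions? Or would it work prior to Swift 4.1, say in Swift 4.0?
          【Solution 16】:

          Swift 4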

          extension String {
              var replacingHTMLEntities: String? {
                  do {
                      return try NSAttributedString(data: Data(utf8), options: [
                          .documentType: NSAttributedString.DocumentType.html,
                          .characterEncoding: String.Encoding.utf8.rawValue
                      ], documentAttributes: nil).string
                  } catch {
                      return nil
                  }
              }
          }
          

          Simple usage

          let clean = "Weeknd &#8216;King Of The Fall&#8217".replacingHTMLEntities ?? "default value"
          

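          【Comments】:

          • I'm already hearing people complaining about my force unwrapped optional. If you are researching HTML string encoding and you do not know how to deal with Swift optionals, you're too far ahead of yourself.
          • Yes, there was (edited Nov 1 at 22:37 and made the "Simple usage" much harder to comprehend)
          【Solution 17】:

          Updated answer working on Swift 3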

          extension String {
              init?(htmlEncodedString: String) {
                  let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
                  let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
          
                  guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
                      return nil
                  }
                  self.init(attributedString.string)
              }
          }
          

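          【Comments】:

            【Solution 18】:

            Have a look at HTMLString - a library written in Swift that allows your program to add and remove HTML entities in Strings

            For completeness, I copied the main features from the site:

            • Adds entities for ASCII and UTF-8/UTF-16 encodings
            • Removes more than 2100 named entities (like &amp;)
            • Supports removing decimal and hexadecimal entities
            • Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
            • Fully unit tested
            • Fast
            • Documented
            • Compatible with Objective-C

            【Comments】:

            • Also very interesting, thanks! Should be way more up
            【Solution 19】:

            Swift 5.1 version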

            import UIKit
            
            extension String {
            
                init(htmlEncodedString: String) {
                    self.init()
                    guard let encodedData = htmlEncodedString.data(using: .utf8) else {
                        self = htmlEncodedString
                        return
                    }
            
                    let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
                        .documentType: NSAttributedString.DocumentType.html,
                        .characterEncoding: String.Encoding.utf8.rawValue
                    ]
            
                    do {
                        let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
                        self = attributedString.string
                    } 
                    catch {
                        print("Error: \(error)")
                        self = htmlEncodedString
                    }
                }
            }
            

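            Also, if you want to extract date, images, metadata, title, and description, you can use my pod named Readability kit.

            【Comments】:

            • What prevents it from working in some earlier versions (Swift 5.0, Swift 4.1, Swift 4.0, etc.)?
            • I found an error when decoding a string using collectionViews
            【Solution 20】: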

            Objective-C

            +(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
                if (!htmlEncodedString) {
                    return nil;
                }
            
                NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
                NSDictionary *attributes = @{NSDocumentTypeDocumentAttribute:     NSHTMLTextDocumentType,
                                         NSCharacterEncodingDocumentAttribute:     @(NSUTF8StringEncoding)};
                NSAttributedString *attributedString = [[NSAttributedString alloc]     initWithData:data options:attributes documentAttributes:nil error:nil];
                return [attributedString string];
            }
            

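            【Comments】:

              【Solution 21】:

              Swift 3.0 version with actual font size conversion

              Normally, if you directly convert HTML content to an attributed string, the font size is increased. You can try to convert an HTML string to an attributed string and back again to see the difference.

              Instead, here is the actual size conversion, which makes sure the font size does not change by applying a 0.75 ratio to all fonts: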

              extension String {
                  func htmlAttributedString() -> NSAttributedString? {
                      guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
                      guard let attriStr = try? NSMutableAttributedString(
                          data: data,
                          options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                          documentAttributes: nil) else { return nil }
                      attriStr.beginEditing()
                      attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
                          (value, range, stop) in
                          if let font = value as? UIFont {
                              let resizedFont = font.withSize(font.pointSize * 0.75)
                              attriStr.addAttribute(NSFontAttributeName,
                                                       value: resizedFont,
                                                       range: range)
                          }
                      }
                      attriStr.endEditing()
                      return attriStr
                  }
              }
              

              【Comments】:

                【Solution 22】:

                Swift 4

                extension String {
                
                    mutating func toHtmlEncodedString() {
                        guard let encodedData = self.data(using: .utf8) else {
                            return
                        }
                
                        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
                            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
                            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
                        ]
                
                        do {
                            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
                            self = attributedString.string
                        }
                        catch {
                            print("Error: \(error)")
                        }
                    }
                }
                

                【Comments】:

                • Please, the rawValue syntax NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue) and NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue) is horrible. Replace it with .documentType and .characterEncoding
                • The performance of this solution is horrible. It may be OK for isolated cases, but it is not recommended for parsing files.
                【Solution 23】:

                Use:

                NSData dataRes = (nsdata value )
                
                var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)
                

                【Comments】:
