【Title】: Strange string range behavior using NSRegularExpression matches
【Posted】: 2016-10-26 07:52:46
【Question】:

I am trying to parse a raw HTTP response, but when I convert an NSRange to a Range I get incorrect ranges. Here is the relevant code from a playground:

public extension NSRange {
    public func toStringRange(_ str: String) -> Range<String.Index>? {
        guard str.characters.count >= length - location  && location < str.characters.count else { return nil }
        let fromIdx = str.characters.index(str.startIndex, offsetBy: self.location)
        print("from: \(self.location) = \(fromIdx)")
        let toIdx = str.characters.index(fromIdx, offsetBy: self.length)
        return fromIdx..<toIdx
    }
}

let responseString = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n"
let responseRange = NSRange(location: 0, length: responseString.characters.count)
let responseRegex = try! NSRegularExpression(pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)", options: [.anchorsMatchLines])
guard let matchResult = responseRegex.firstMatch(in: responseString, options: [], range: responseRange),
    matchResult.numberOfRanges == 5,
    let versionRange = matchResult.rangeAt(1).toStringRange(responseString),
    let statusRange = matchResult.rangeAt(2).toStringRange(responseString),
    let headersRange = matchResult.rangeAt(4).toStringRange(responseString)
    else { fatalError() }

The output printed inside toStringRange() is:

from: 0 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 0), _countUTF16: 1)
from: 9 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 9), _countUTF16: 1)
from: 17 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 18), _countUTF16: 1)

Why does the third toStringRange() call return a string range that starts at 18 instead of 17?

【Comments】:

    Tags: regex swift nsregularexpression


    【Solution 1】:

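    Your method for converting from NSRange to Range<String.Index> does not work correctly for extended grapheme clusters or for characters outside the "basic multilingual plane" (emoji, flags, etc.).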

    NSRange counts UTF-16 code units (corresponding to the unichar representation in NSString). Range<String.Index> counts Swift Characters, which represent extended grapheme clusters.

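    In your specific case, "\r\n" counts as two UTF-16 code units but as a single Character, and that causes the unwanted "shift": every offset after the first line break is off by one.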
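The mismatch between the two ways of counting can be checked directly. The following is a minimal sketch, assuming Swift 4.2 or later (where `String` is itself a collection of `Character`s and `firstIndex(of:)` is available); the variable names are only illustrative.

```swift
// CR LF is one extended grapheme cluster, but two UTF-16 code units.
let lineBreak = "\r\n"
let lineBreakCharacters = lineBreak.count        // counted as Characters
let lineBreakUnits = lineBreak.utf16.count       // counted as UTF-16 code units

// Characters outside the basic multilingual plane show the same effect.
let emojiUnits = "😀".utf16.count                // one Character, stored as a surrogate pair
let flagUnits = "🇩🇪".utf16.count                 // one Character, two regional indicators

// The offset of "C" depends on which view is used to measure it.
let text = "OK\r\nContent-Length"
let cIndex = text.firstIndex(of: "C")!
let characterOffset = text.distance(from: text.startIndex, to: cIndex)
let utf16Offset = text.utf16.distance(from: text.utf16.startIndex, to: cIndex)

print(lineBreakCharacters, lineBreakUnits)       // 1 2
print(emojiUnits, flagUnits)                     // 2 4
print(characterOffset, utf16Offset)              // 3 4
```

An NSRange location of 4 therefore has to be applied to the `utf16` view; applying it as a Character offset lands one Character too far to the right.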

    Here is a simplified example:

    let responseString = "OK\r\nContent-Length"
    
    let nsRange = (responseString as NSString).range(of: "Content")
    print(nsRange.location, nsRange.length) // 4 7
    
    if let sRange1 = nsRange.toStringRange(responseString) {
        print(responseString.substring(with: sRange1)) // "ontent-"
    }
    

    Use the following method instead (taken from the related question "NSRange to Range<String.Index>"):

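    extension String {
        // Converts an NSRange (UTF-16 offsets) into a Range<String.Index>.
        // Returns nil if the range is out of bounds or does not fall on
        // valid String.Index positions.
        func range(from nsRange: NSRange) -> Range<String.Index>? {
            guard
                // Walk the UTF-16 view, because NSRange counts UTF-16 code units.
                let from16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location, limitedBy: utf16.endIndex),
                let to16 = utf16.index(from16, offsetBy: nsRange.length, limitedBy: utf16.endIndex),
                // Map the UTF-16 positions back to String.Index.
                let from = String.Index(from16, within: self),
                let to = String.Index(to16, within: self)
                else { return nil }
            return from ..< to
        }
    }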
    

    With this conversion you get the expected result:

    if let sRange2 = responseString.range(from: nsRange) {
        print(responseString.substring(with: sRange2)) // "Content"
    }
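On newer toolchains the same conversion is available from Foundation. The sketch below assumes Swift 4 or later, where `Range(_:in:)` accepts an NSRange and `NSTextCheckingResult` exposes `range(at:)`; it is not part of the original answer. It also builds the search range from `utf16.count`, since the regular expression works in UTF-16 code units. The shortened response string and the name `firstHeader` are only illustrative.

```swift
import Foundation

let response = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n"

// NSRegularExpression measures in UTF-16 code units, so the search range
// is built from utf16.count rather than from the Character count.
let fullRange = NSRange(location: 0, length: response.utf16.count)
let regex = try! NSRegularExpression(
    pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)",
    options: [.anchorsMatchLines])

var firstHeader: String? = nil
if let match = regex.firstMatch(in: response, options: [], range: fullRange),
   // Range(_:in:) converts the UTF-16 based NSRange into Range<String.Index>.
   let headerRange = Range(match.range(at: 4), in: response) {
    firstHeader = String(response[headerRange])
}

print(firstHeader ?? "no match")
```

Because "." does not match line terminators unless `.dotMatchesLineSeparators` is set, the fourth capture group only covers the first header line here.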
    

    【Comments】:
