【Question title】: How to percent-encode only some characters
【Posted】: 2022-10-21 02:45:30
【Question】:

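The SPARQL function ENCODE_FOR_URI escapes every character in its input except the unreserved URI characters (letters, digits, and - . _ ~). How can I change it to leave certain characters alone, for example the non-ASCII characters that are allowed in IRIs?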
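To pin down the intended result outside SPARQL, here is a minimal Python sketch. It assumes the goal is "encode exactly what ENCODE_FOR_URI would, but leave every character above U+007F untouched". The helper name encode_except_non_ascii is hypothetical, and urllib.parse.quote(..., safe="") stands in for ENCODE_FOR_URI (both leave letters, digits and - . _ ~ unencoded and use UTF-8 percent-encoding for the rest).

```python
from urllib.parse import quote


def encode_except_non_ascii(text: str) -> str:
    """Hypothetical reference: ENCODE_FOR_URI semantics, except non-ASCII is kept as-is."""
    return "".join(
        # quote(..., safe="") leaves A-Z a-z 0-9 - . _ ~ alone and encodes everything else
        ch if ord(ch) > 0x7F else quote(ch, safe="")
        for ch in text
    )


print(encode_except_non_ascii("Zürich/Straße 1"))  # Zürich%2FStraße%201
```

This per-character version is only a specification of the desired output; inside a SPARQL query there is no per-character loop, which is why the answer below resorts to regular expressions.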

【Discussion】:

    Tags: sparql


    【Solution 1】:

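    This is a non-standard solution, because it relies on regular-expression features (lookahead) beyond what the SPARQL specification requires, but it works on some endpoints (e.g. Wikidata). Here is the complete solution. It also requires choosing one character that should not (and cannot) be replaced, here _ (it is unreserved, so ENCODE_FOR_URI never encodes it), and one character that does not occur in the input; U+0000 cannot be stored in RDF, so it is a good choice.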

      BIND("0/1&2]3%4@5_" AS ?text)
      BIND(REPLACE(?text, "[^\u0001-\u005E\u0060-\u007F]+", "") AS ?filtered) # keeps only the characters that may be encoded (ASCII except _)
      BIND(REPLACE(?filtered, "(.)(?=.*\\1)", "", "s") AS ?shortened) # leaves only one of each character
      BIND(REPLACE(?shortened, "(.)", "_$1", "s") AS ?separated) # separates the characters via _
      BIND(CONCAT(?separated, ENCODE_FOR_URI(?separated)) AS ?encoded) # appends the encoded variant after it
      BIND(CONCAT("_([^_]*)(?=(?:_[^_]*){", STR(STRLEN(?shortened) - 1), "}_([^_]*))?") AS ?regex) # pairs each group with the group STRLEN(?shortened) positions later
      BIND(REPLACE(?encoded, ?regex, "$1$2\u0000", "s") AS ?replaced) # groups each character with its replacement, each pair terminated by U+0000
      BIND(REPLACE(?shortened, "([-\\\\\\]\\[^])", "\\\\$1") AS ?class) # escapes -, ], [, \ and ^ so the characters form a valid regex class
      BIND(CONCAT(?text, "\u0000", ?replaced) AS ?prepared) # appends the replacement groups after the original text
      BIND(CONCAT("([", ?class, "])(?=.*?\u0000\\1([^\u0000]*))|\u0000.*") AS ?regex2)
      BIND(REPLACE(?prepared, ?regex2, "$2", "s") AS ?result) # replaces each occurrence of the character by its replacement in the group at the end
    

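    If you know the exact replacements in advance, you only need the last three lines (with ?class and ?replaced written out as literals) to build the string.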
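To see what each intermediate variable holds, the pipeline can be replayed outside the endpoint. The sketch below is a hypothetical Python port using the standard re module (which supports lookahead and backreferences); urllib.parse.quote(s, safe="") stands in for ENCODE_FOR_URI, and the names build_table, apply_table and encode_some are made up for illustration. build_table mirrors the BINDs up to ?class, and apply_table mirrors the last three, so it can also be called with a hand-written table. One deliberate difference: the optional ? is placed inside the lookahead instead of quantifying it, which has the same effect.

```python
import re
from urllib.parse import quote

NUL = "\x00"  # delimiter that cannot occur in the input (U+0000 cannot be stored in RDF)


def encode_for_uri(s):
    # Stand-in for SPARQL ENCODE_FOR_URI: keeps A-Z a-z 0-9 - . _ ~, encodes the rest.
    return quote(s, safe="")


def build_table(text):
    """Mirror the BINDs from ?filtered to ?class; returns (?class, ?replaced)."""
    # ?filtered: ASCII except "_" (the separator); non-ASCII is dropped, so never encoded
    filtered = re.sub(r"[^\u0001-\u005E\u0060-\u007F]+", "", text)
    # ?shortened: drop a character if it occurs again later, leaving one of each
    shortened = re.sub(r"(.)(?=.*\1)", "", filtered, flags=re.S)
    if not shortened:
        return "", ""  # nothing to encode; the {n} quantifier below would be {-1}
    # ?separated / ?encoded: "_a_b..." followed by its encoded form "_a_%XX..."
    separated = re.sub(r"(.)", r"_\1", shortened, flags=re.S)
    encoded = separated + encode_for_uri(separated)
    # ?regex: skip n more groups, then capture the encoded counterpart (if any)
    n = len(shortened) - 1
    regex = "_([^_]*)(?=(?:(?:_[^_]*){" + str(n) + "}_([^_]*))?)"
    # ?replaced: "<char><replacement>\0" for each pair; second-half groups get an empty \2
    replaced = re.sub(regex, r"\1\2" + NUL, encoded, flags=re.S)
    # ?class: escape characters that are special inside [...]
    cls = re.sub(r"([-\\\]\[^])", r"\\\1", shortened)
    return cls, replaced


def apply_table(text, cls, replaced):
    """Mirror ?prepared, ?regex2 and ?result: look up each class character in the table."""
    if not cls:
        return text
    prepared = text + NUL + replaced
    # Lazy .*? finds the first "\0<char>" entry; the second alternative deletes the table.
    regex2 = "([" + cls + r"])(?=.*?\x00\1([^\x00]*))|\x00.*"
    return re.sub(regex2, r"\2", prepared, flags=re.S)


def encode_some(text):
    return apply_table(text, *build_table(text))


print(encode_some("0/1&2]3%4@5_é"))  # 0%2F1%262%5D3%254%405_é
# Hand-written table: encode only "/" and "&", leave everything else alone
print(apply_table("a/b&c", "/&", "/%2F" + NUL + "&%26" + NUL))  # a%2Fb%26c
```

Each table entry is the character, its replacement and a terminating U+0000, the same layout that ?replaced has in the query.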
