【问题标题】:How do I make toLowerCase() and toUpperCase() consistent across browsers如何使 toLowerCase() 和 toUpperCase() 在浏览器之间保持一致
【发布时间】:2018-11-26 19:48:29
【问题描述】:

是否存在 String.toLowerCase() 和 String.toUpperCase() 的 JavaScript polyfill 实现,或 JavaScript 中可以处理 Unicode 字符并在浏览器之间保持一致的其他方法?

背景资料

执行以下操作将在浏览器中产生不同的结果,甚至在浏览器版本之间(例如 FireFox 54 与 55):

document.write(String.fromCodePoint(223).normalize("NFKC").toLowerCase().toUpperCase().toLowerCase())

在 Firefox 55 中,它为您提供 ss,在 Firefox 54 中,它为您提供 ß

通常这很好,并且诸如 Locales 之类的机制可以处理很多您想要的情况;但是,当您需要跨平台的一致行为(例如与 等 BaaS 系统通信时)可以大大简化交互,您实际上是在客户端上处理内部数据。

【问题讨论】:

标签: google-cloud-firestore javascript unicode polyfills unicode-normalization case-folding


【解决方案1】:

请注意,这个问题似乎只影响过时的 Firefox 版本,因此除非您明确需要支持这些旧版本,否则您可以选择根本不打扰。您的示例的行为在所有现代浏览器中都是相同的(因为 Firefox 发生了变化)。这个可以验证using jsvu + eshost

$ jsvu # Update installed JavaScript engine binaries to the latest version.

$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss

#### V8 --harmony
ss

#### JavaScriptCore
ss

#### V8
ss

#### SpiderMonkey
ss

#### xs
ss

但是你问如何解决这个问题,所以让我们继续。

https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase 的第 4 步状态:

根据 Unicode 默认大小写转换算法,让 cuList 成为一个列表,其中元素是 toLowercase(cpList) 的结果。

Unicode 默认大小写转换算法section 3.13 Default Case Algorithms of the Unicode standard 中指定。

Unicode 字符的完整大小写映射是通过使用来自SpecialCasing.txt 的映射加上来自UnicodeData.txt 的映射获得的,不包括后面任何会冲突的映射。在这些文件中没有映射的任何字符都被认为映射到自身。

[…]

以下规则指定 Unicode 字符串的默认大小写转换操作。这些规则使用完整的大小写转换操作Uppercase_Mapping(C)Lowercase_Mapping(C)Titlecase_Mapping(C),以及基于大小写上下文的上下文相关映射,如表 3-17 中所述。

对于字符串X

  • R1 toUppercase(X):将X中的每个字符C映射到Uppercase_Mapping(C)
  • R2 toLowercase(X):将X中的每个字符C映射到Lowercase_Mapping(C)

这是来自SpecialCasing.txt 的示例,下面添加了我的注释:

00DF  ; 00DF   ; 0053 0073; 0053 0053;                      # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title>  ; <upper>  ; (<condition_list>;)? # <comment>

这一行表示 U+00DF ('ß') 小写为 U+00DF (ß),大写为 U+0053 U+0053 (SS)。

这是来自UnicodeData.txt 的示例,下面添加了我的注释:

0041  ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;; 0061   ;
<code>; <name>                ; <ignore>       ; <lower>; <upper>

这一行表示 U+0041 ('A') 小写为 U+0061 ('a')。它没有显式的大写映射,这意味着它自己大写。

这是UnicodeData.txt的另一个例子:

0061  ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;; ;0041;        ; 0041
<code>; <name>              ; <ignore>            ; <lower>; <upper>

这一行表示 U+0061 ('a') 大写为 U+0041 ('A')。它没有明确的小写映射,这意味着它自己小写。

您可以编写一个脚本来解析这两个文件,读取这些示例之后的每一行,并构建小写/大写映射。然后,您可以将这些映射转换为提供符合规范的 toLowerCase/toUpperCase 功能的小型 JavaScript 库。

这似乎需要做很多工作。根据 Firefox 中的旧行为以及具体更改的内容 (?),您可能会将工作限制为 SpecialCasing.txt 中的特殊映射。 (根据您提供的示例,我假设 Firefox 55 中只有特殊的大小写发生了变化。)

// Instead of…
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const lowercased = normalized.toLowerCase();
  return lowercased;
}

// …one could do something like:
function lowerCaseSpecialCases(string) {
  // TODO: replace all SpecialCasing.txt characters with their lowercase
  // mapping.
  return string.replace(/TODO/g, fn);
}
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
  const lowercased = fixed.toLowerCase();
  return lowercased;
}

我编写了一个脚本来解析SpecialCasing.txt 并生成一个实现上述lowerCaseSpecialCases 功能(如toLower)以及toUpper 的JS 库。它是:https://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c 根据您的具体用例,您可能根本不需要toUpper 及其相应的正则表达式和映射。这是完整的生成库:

const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
  ['\u0130', 'i\u0307'],
  ['\u1F88', '\u1F80'],
  ['\u1F89', '\u1F81'],
  ['\u1F8A', '\u1F82'],
  ['\u1F8B', '\u1F83'],
  ['\u1F8C', '\u1F84'],
  ['\u1F8D', '\u1F85'],
  ['\u1F8E', '\u1F86'],
  ['\u1F8F', '\u1F87'],
  ['\u1F98', '\u1F90'],
  ['\u1F99', '\u1F91'],
  ['\u1F9A', '\u1F92'],
  ['\u1F9B', '\u1F93'],
  ['\u1F9C', '\u1F94'],
  ['\u1F9D', '\u1F95'],
  ['\u1F9E', '\u1F96'],
  ['\u1F9F', '\u1F97'],
  ['\u1FA8', '\u1FA0'],
  ['\u1FA9', '\u1FA1'],
  ['\u1FAA', '\u1FA2'],
  ['\u1FAB', '\u1FA3'],
  ['\u1FAC', '\u1FA4'],
  ['\u1FAD', '\u1FA5'],
  ['\u1FAE', '\u1FA6'],
  ['\u1FAF', '\u1FA7'],
  ['\u1FBC', '\u1FB3'],
  ['\u1FCC', '\u1FC3'],
  ['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));

const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
  ['\xDF', 'SS'],
  ['\uFB00', 'FF'],
  ['\uFB01', 'FI'],
  ['\uFB02', 'FL'],
  ['\uFB03', 'FFI'],
  ['\uFB04', 'FFL'],
  ['\uFB05', 'ST'],
  ['\uFB06', 'ST'],
  ['\u0587', '\u0535\u0552'],
  ['\uFB13', '\u0544\u0546'],
  ['\uFB14', '\u0544\u0535'],
  ['\uFB15', '\u0544\u053B'],
  ['\uFB16', '\u054E\u0546'],
  ['\uFB17', '\u0544\u053D'],
  ['\u0149', '\u02BCN'],
  ['\u0390', '\u0399\u0308\u0301'],
  ['\u03B0', '\u03A5\u0308\u0301'],
  ['\u01F0', 'J\u030C'],
  ['\u1E96', 'H\u0331'],
  ['\u1E97', 'T\u0308'],
  ['\u1E98', 'W\u030A'],
  ['\u1E99', 'Y\u030A'],
  ['\u1E9A', 'A\u02BE'],
  ['\u1F50', '\u03A5\u0313'],
  ['\u1F52', '\u03A5\u0313\u0300'],
  ['\u1F54', '\u03A5\u0313\u0301'],
  ['\u1F56', '\u03A5\u0313\u0342'],
  ['\u1FB6', '\u0391\u0342'],
  ['\u1FC6', '\u0397\u0342'],
  ['\u1FD2', '\u0399\u0308\u0300'],
  ['\u1FD3', '\u0399\u0308\u0301'],
  ['\u1FD6', '\u0399\u0342'],
  ['\u1FD7', '\u0399\u0308\u0342'],
  ['\u1FE2', '\u03A5\u0308\u0300'],
  ['\u1FE3', '\u03A5\u0308\u0301'],
  ['\u1FE4', '\u03A1\u0313'],
  ['\u1FE6', '\u03A5\u0342'],
  ['\u1FE7', '\u03A5\u0308\u0342'],
  ['\u1FF6', '\u03A9\u0342'],
  ['\u1F80', '\u1F08\u0399'],
  ['\u1F81', '\u1F09\u0399'],
  ['\u1F82', '\u1F0A\u0399'],
  ['\u1F83', '\u1F0B\u0399'],
  ['\u1F84', '\u1F0C\u0399'],
  ['\u1F85', '\u1F0D\u0399'],
  ['\u1F86', '\u1F0E\u0399'],
  ['\u1F87', '\u1F0F\u0399'],
  ['\u1F88', '\u1F08\u0399'],
  ['\u1F89', '\u1F09\u0399'],
  ['\u1F8A', '\u1F0A\u0399'],
  ['\u1F8B', '\u1F0B\u0399'],
  ['\u1F8C', '\u1F0C\u0399'],
  ['\u1F8D', '\u1F0D\u0399'],
  ['\u1F8E', '\u1F0E\u0399'],
  ['\u1F8F', '\u1F0F\u0399'],
  ['\u1F90', '\u1F28\u0399'],
  ['\u1F91', '\u1F29\u0399'],
  ['\u1F92', '\u1F2A\u0399'],
  ['\u1F93', '\u1F2B\u0399'],
  ['\u1F94', '\u1F2C\u0399'],
  ['\u1F95', '\u1F2D\u0399'],
  ['\u1F96', '\u1F2E\u0399'],
  ['\u1F97', '\u1F2F\u0399'],
  ['\u1F98', '\u1F28\u0399'],
  ['\u1F99', '\u1F29\u0399'],
  ['\u1F9A', '\u1F2A\u0399'],
  ['\u1F9B', '\u1F2B\u0399'],
  ['\u1F9C', '\u1F2C\u0399'],
  ['\u1F9D', '\u1F2D\u0399'],
  ['\u1F9E', '\u1F2E\u0399'],
  ['\u1F9F', '\u1F2F\u0399'],
  ['\u1FA0', '\u1F68\u0399'],
  ['\u1FA1', '\u1F69\u0399'],
  ['\u1FA2', '\u1F6A\u0399'],
  ['\u1FA3', '\u1F6B\u0399'],
  ['\u1FA4', '\u1F6C\u0399'],
  ['\u1FA5', '\u1F6D\u0399'],
  ['\u1FA6', '\u1F6E\u0399'],
  ['\u1FA7', '\u1F6F\u0399'],
  ['\u1FA8', '\u1F68\u0399'],
  ['\u1FA9', '\u1F69\u0399'],
  ['\u1FAA', '\u1F6A\u0399'],
  ['\u1FAB', '\u1F6B\u0399'],
  ['\u1FAC', '\u1F6C\u0399'],
  ['\u1FAD', '\u1F6D\u0399'],
  ['\u1FAE', '\u1F6E\u0399'],
  ['\u1FAF', '\u1F6F\u0399'],
  ['\u1FB3', '\u0391\u0399'],
  ['\u1FBC', '\u0391\u0399'],
  ['\u1FC3', '\u0397\u0399'],
  ['\u1FCC', '\u0397\u0399'],
  ['\u1FF3', '\u03A9\u0399'],
  ['\u1FFC', '\u03A9\u0399'],
  ['\u1FB2', '\u1FBA\u0399'],
  ['\u1FB4', '\u0386\u0399'],
  ['\u1FC2', '\u1FCA\u0399'],
  ['\u1FC4', '\u0389\u0399'],
  ['\u1FF2', '\u1FFA\u0399'],
  ['\u1FF4', '\u038F\u0399'],
  ['\u1FB7', '\u0391\u0342\u0399'],
  ['\u1FC7', '\u0397\u0342\u0399'],
  ['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2015-02-03
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2010-10-02
    • 1970-01-01
    相关资源
    最近更新 更多