Split string with accent and query without accent [duplicate]

【Question title】: Split string with accent and query without accent [duplicate]
【Posted】: 2021-12-12 12:31:15
【Question】:

I want to split a string that contains accents using a query that does not contain them.

Here is my code so far:

const sanitizer = (text: string): string => {
  return text
    .normalize("NFD")
    .replace(/\p{Diacritic}/gu, "")
    .toLowerCase();
};

const splitter = (text: string, query: string): string[] => {
  const regexWithQuery = new RegExp(`(${query})|(${sanitizer(query)})`, "gi");

  return text.split(regexWithQuery).filter((value) => value);
};

Here is the test file:

import { splitter } from "@/utils/arrayHelpers";

describe("arrayHelpers", () => {
  describe("splitter", () => {
    const cases = [
      {
        text: "pepe dominguez",
        query: "pepe",
        expectedArray: ["pepe", " dominguez"],
      },
      {
        text: "pépé dominguez",
        query: "pepe",
        expectedArray: ["pépé", " dominguez"],
      },
      {
        text: "pepe dominguez",
        query: "pépé",
        expectedArray: ["pepe", " dominguez"],
      },
      {
        text: "pepe dominguez",
        query: "pe",
        expectedArray: ["pe", "pe", " dominguez"],
      },
      {
        text: "pepe DOMINGUEZ",
        query: "DOMINGUEZ",
        expectedArray: ["pepe ", "DOMINGUEZ"],
      },
    ];

    it.each(cases)(
      "should split the text around the query and keep the original characters",
      ({ text, query, expectedArray }) => {
        // When I call the splitter function
        const textSplitted = splitter(text, query);

        // Then I must get the expected array of parts
        expect(textSplitted).toStrictEqual(expectedArray);
      }
    );
  });
});

The problem is the second case:

{
  text: "pépé dominguez",
  query: "pepe",
  expectedArray: ["pépé", " dominguez"],
}

Because the sanitized query "pepe" is still "pepe", it does not occur in "pépé dominguez". I don't know how to make the splitter function return ['pépé', ' dominguez'] in this case.
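To make the failure concrete, here is a small reproduction. It is only a sketch: it reuses the sanitizer from the question with the type annotations removed so it runs as plain JavaScript, and the names text, query, pattern and parts are chosen here for illustration.

```javascript
// Same sanitizer as in the question, without the TypeScript annotations.
const sanitizer = (text) =>
  text
    .normalize("NFD")
    .replace(/\p{Diacritic}/gu, "")
    .toLowerCase();

const text = "pépé dominguez";
const query = "pepe";

// Both alternatives of the pattern end up being the plain string "pepe".
const pattern = new RegExp(`(${query})|(${sanitizer(query)})`, "gi");
console.log(pattern.source); // (pepe)|(pepe)

// "e" and "é" are different characters, so nothing matches and
// split() returns the whole text as a single element.
const parts = text.split(pattern).filter((value) => value);
console.log(parts); // [ 'pépé dominguez' ]
```

Sanitizing the query alone cannot help here, because the accents are in the text, not in the query.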

I am looking for a result taken from the original text, not from the sanitized text.

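【Comments】:

Usually you don't remove diacritics, you replace them with other letters, e.g. .replace('é', 'e'). stackoverflow.com/questions/286921/…
I think the sanitizer function already does that job. But I don't want the result to be sanitized.
What would you do with text: "ééé"?

Tags: javascript regex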

【Solution 1】:

The only option I can think of is to keep a map of the possible variants of each letter and then build the query dynamically:

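// Build a pattern in which each letter may also be one of its accented variants
const sanitizeQuery = (query) => {
  // Accented variants per letter; only 'e' is listed here, extend as needed
  const sanitizerMap = {
    'e': ['é']
  }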

  return query
    .split('')
    .map(l => 
      sanitizerMap[l] !== undefined 
        ? `(?:${l}|${sanitizerMap[l].join('|')})` 
        : l
    )
    .join('');
}

// Split text by a sanitized query
const splitter = (text, query) => {
  const regexWithQuery = new RegExp(`(${sanitizeQuery(query)})`, "gi");

  return text.split(regexWithQuery).filter((value) => value);
};

// Test
const query = 'pepe';
console.log('Query Regex:', sanitizeQuery(query));
console.log('Output:', splitter('pépé dominguez', query));

You could optimize this by storing the options for each letter in a string instead of an array.
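A minimal sketch of that string-based variant follows. It is not the code from the answer: the variants map and the helpers stripAccents, escapeRegExp and buildPattern are names made up for this example, and the list of accented letters is deliberately incomplete. It also assumes the text uses precomposed characters (NFC), which is what you normally get from keyboard input.

```javascript
// Hypothetical map: base letter -> string of accented variants (extend as needed).
const variants = {
  a: "àáâãä",
  e: "èéêë",
  i: "ìíîï",
  o: "òóôõö",
  u: "ùúûü",
};

// Remove combining marks so an accented query ("pépé") is handled like "pepe".
const stripAccents = (s) => s.normalize("NFD").replace(/\p{M}/gu, "");

// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Turn each letter of the query into a character class such as [eèéêë].
const buildPattern = (query) =>
  Array.from(stripAccents(query))
    .map((letter) => {
      const extra = variants[letter.toLowerCase()];
      return extra !== undefined ? `[${letter}${extra}]` : escapeRegExp(letter);
    })
    .join("");

// Split on the pattern; the single capturing group keeps the match in the output.
const splitter = (text, query) =>
  text
    .split(new RegExp(`(${buildPattern(query)})`, "gi"))
    .filter((value) => value);

console.log(buildPattern("pepe")); // p[eèéêë]p[eèéêë]
console.log(splitter("pépé dominguez", "pepe")); // [ 'pépé', ' dominguez' ]
console.log(splitter("pepe dominguez", "pépé")); // [ 'pepe', ' dominguez' ]
```

A character class needs no non-capturing group, and escaping the remaining characters keeps queries such as "a.b" or "(" from breaking the pattern.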

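Tip: ?: at the start of a group in a regular expression makes the group non-capturing. Without it, every matched letter would also appear in the output array. Read more here: What is a non-capturing group in regular expressions?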
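The effect on split() is easy to see with three hand-written patterns. The variable names below are illustrative only:

```javascript
const text = "pépé dominguez";

// Only the outer group captures: the matched separator is kept once.
const keptOnce = text.split(/(p(?:e|é)p(?:e|é))/);
console.log(keptOnce); // [ '', 'pépé', ' dominguez' ]

// The inner groups capture too: each captured letter is added as well.
const withLetters = text.split(/(p(e|é)p(e|é))/);
console.log(withLetters); // [ '', 'pépé', 'é', 'é', ' dominguez' ]

// No capturing group at all: the separator is dropped from the result.
const dropped = text.split(/p(?:e|é)p(?:e|é)/);
console.log(dropped); // [ '', ' dominguez' ]
```

The leading empty string appears because the match starts at index 0. The .filter((value) => value) call in splitter removes it.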
