【Posted】:2022-07-06 15:36:45
【Problem description】:
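I'm trying to extract every term from a text correctly. When a term sits inside a sentence and contains (), the text is not split and the regex does not find the term.
I want matches that contain () to be split out properly. So instead of this:
["What is API(Application Programming Interface) and how to use it?"]
I'm trying to get this:
["What is", "API(Application Programming Interface)", "and how to use it?"]
The JSON term is extracted correctly, and I get this:
["JSON", "is a Javascript Object Notation"]
That is exactly what I want. But for API I don't get this:
["What is", "API(Application Programming Interface)", "and how to use it?"]
Instead I get this, which is not what I want:
["What is API(Application Programming Interface) and how to use it?"]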
// Collect every text node below the given element.
function getAllTextNodes(element) {
  let node;
  let nodes = [];
  let walk = document.createTreeWalker(element, NodeFilter.SHOW_TEXT, null, false);
  while (node = walk.nextNode()) nodes.push(node);
  return nodes;
}

const allNodes = getAllTextNodes(document.getElementById("body"))

const terms = [
  {id: 1, definition: 'API stands for Application programming Interface', expression: 'API(Application Programming Interface)'},
  {id: 2, definition: 'JSON stands for JavaScript Object Notation.', expression: 'JSON'}
]

// Map lowercased expressions to their terms, longest expression first.
const termMap = new Map(
  [...terms].sort((a, b) => b.expression.length - a.expression.length)
    .map(term => [term.expression.toLowerCase(), term])
);

// The expressions are joined unescaped and wrapped in \b word boundaries.
const regex = RegExp("\\b(" + Array.from(termMap.keys()).join("|") + ")\\b", "ig");

// Split each text node on the terms and drop empty pieces.
for (const node of allNodes) {
  const pieces = node.textContent.split(regex).filter(Boolean);
  console.log(pieces)
}
<div id="body">
  <p>API(Application Programming Interface)</p>
  <p>What is API(Application Programming Interface) and how to use it?</p>
  <p>JSON is a Javascript Object Notation</p>
</div>
【Discussion】:
-
What is the question/problem? What have you tried so far to solve it yourself? -> How do I ask a good question?
-
How do I ask a good question?: "Write a title that summarizes the specific problem"
-
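@Andreas Sorry about that. I created a regex to match all the terms in #body and split each text node into an array. My only problem is how to split the sentence correctly when a term contains ().
-
Escape the terms before putting them into the regex. And if a term can have special characters at its start or end, you can't use the \b word boundary.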
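A minimal sketch of what that last comment suggests, not a definitive fix. It escapes each term before joining, so ( and ) are matched literally instead of opening a regex group, and it swaps \b for lookarounds that only require that no word character touches the match. The helper names escapeRegExp, buildTermRegex and splitByTerms are hypothetical, and the lookbehind (?<!\w) assumes an engine with ES2018 lookbehind support.

```javascript
// Hypothetical helper: escape regex metacharacters so a term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one splitting regex from a list of term expressions.
function buildTermRegex(expressions) {
  const alternatives = [...expressions]
    .sort((a, b) => b.length - a.length) // longest first, so longer terms win
    .map(escapeRegExp)
    .join("|");
  // \b fails after ")" when a space follows, because neither side is a word
  // character. These lookarounds only demand that no word character is
  // adjacent to the match. Assumption: the engine supports lookbehind.
  return new RegExp("(?<!\\w)(" + alternatives + ")(?!\\w)", "gi");
}

// Hypothetical helper: split text on the terms, keeping the terms themselves
// (the capturing group makes split() include them) and dropping empty pieces.
function splitByTerms(text, regex) {
  return text.split(regex).filter(Boolean);
}

const regex = buildTermRegex(["API(Application Programming Interface)", "JSON"]);
const pieces = splitByTerms(
  "What is API(Application Programming Interface) and how to use it?",
  regex
);
console.log(pieces);
```

The pieces keep their surrounding whitespace ("What is " rather than "What is"), so trim each piece if you need the exact arrays shown in the question.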
Tags: javascript regex