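I would suggest a method that accepts the string to parse, the string that precedes the opening balanced symbol, the delimiter characters, and a flag to include or exclude the delimiters (markers).
See the Java demo on IDEONE: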
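    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static List<String> getBalancedStr(String s, String strBefore, Character markStart,
                                              Character markEnd, Boolean includeMarkers) {
        // The lookahead makes every match zero-width, so overlapping occurrences are all found;
        // group 1 holds the text from strBefore + markStart up to the end of the line.
        Matcher m = Pattern.compile("(?=(\\b\\Q" + strBefore + markStart.toString() + "\\E.*))").matcher(s);
        List<String> subTreeList = new ArrayList<String>();
        while (m.find()) {
            String tail = m.group(1);
            int level = 0;             // current nesting depth
            int lastOpenBracket = -1;  // start index of the substring to extract
            for (int i = 0; i < tail.length(); i++) {
                char c = tail.charAt(i);
                if (c == markStart) {
                    level++;
                    if (level == 1) {
                        lastOpenBracket = (includeMarkers ? i : i + 1);
                    }
                } else if (c == markEnd) {
                    if (level == 1) {
                        // This delimiter closes the outermost opening delimiter.
                        if (includeMarkers) {
                            subTreeList.add(strBefore + tail.substring(lastOpenBracket, i + 1));
                        } else {
                            subTreeList.add(tail.substring(lastOpenBracket, i));
                        }
                        break;
                    }
                    level--;
                }
            }
        }
        return subTreeList;
    }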
Example usage:
    String s = "2*-5+ sin(1.5*4)+(28- 3^4-(cos(3+(19*3)+1+(6/2))/2+tan(1+cos(1+9))-6/3+2.3*3.3345)+1)+1)-(4/2)";
    System.out.println("cos: " + getBalancedStr(s, "cos", '(', ')', true));
    // cos: [cos(3+(19*3)+1+(6/2)), cos(1+9)]
    System.out.println("sin: " + getBalancedStr(s, "sin", '(', ')', true));
    // sin: [sin(1.5*4)]
    System.out.println("tan: " + getBalancedStr(s, "tan", '(', ')', true));
    // tan: [tan(1+cos(1+9))]
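Note that the method compiles a regular expression, "(?=(\\b\\Q" + strBefore + markStart.toString() + "\\E.*))", that matches cos or sin only as whole words (since \b is a word boundary, cos is not matched inside acos, for example), and .* matches up to the end of the line. If you want to support multiline input, prepend (?s) so that . also matches line breaks: "(?s)(?=(\\b\\Q" + strBefore + markStart.toString() + "\\E.*))". Since the pattern is a capturing group inside an unanchored positive lookahead, we collect all overlapping matches, and each match yields only one balanced substring (because we break out of the for loop as soon as the matching closing delimiter is found).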
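To make the regex behavior described above easier to check in isolation, here is a minimal sketch. The class LookaheadDemo and its helpers matchStarts and tails are hypothetical names introduced only for this illustration; they are not part of the method above. The sketch assumes the opening delimiter is always '(' and only shows where the lookahead succeeds and what group 1 captures, without doing any bracket balancing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: illustrates the lookahead pattern only.
class LookaheadDemo {

    // Builds the same kind of pattern as in the answer, assuming '(' as the opening delimiter.
    static Pattern build(String name, boolean dotAll) {
        String regex = (dotAll ? "(?s)" : "") + "(?=(\\b\\Q" + name + "(\\E.*))";
        return Pattern.compile(regex);
    }

    // Start index of every position where the lookahead succeeds.
    static List<Integer> matchStarts(String input, String name, boolean dotAll) {
        Matcher m = build(name, dotAll).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Text captured by group 1 at every match (these captures overlap).
    static List<String> tails(String input, String name, boolean dotAll) {
        Matcher m = build(name, dotAll).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "cos(1)+acos(2)+cos(cos(3))";
        // "acos(" is skipped because there is no word boundary between 'a' and 'c'.
        System.out.println(matchStarts(s, "cos", false)); // [0, 15, 19]
        // Each capture runs to the end of the line, so the captures overlap.
        System.out.println(tails(s, "cos", false));
        // [cos(1)+acos(2)+cos(cos(3)), cos(cos(3)), cos(3))]

        String multiline = "cos(1+\n2)";
        // Without (?s) the capture stops at the line break; with (?s) it continues past it.
        System.out.println(tails(multiline, "cos", false).get(0).length()); // 6
        System.out.println(tails(multiline, "cos", true).get(0).length());  // 9
    }
}
```

The nested cos(cos(3)) case shows why the zero-width lookahead matters: a consuming pattern ending in .* would swallow the rest of the line on the first match and never report the later occurrences.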