[Question title]: Return a substring using a regExp [Java]
[Posted]: 2019-10-10 13:59:32
[Question]:

I need to implement a function that takes a file name as input and returns a substring specified by a regular expression.

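The file names are built like the ones below, and I need to extract the part that was shown in bold (the descriptive text between the leading Doc+digits block and the trailing numeric groups, e.g. fotocontargasx in the first name):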

Doc20191001119049_fotocontargasx_3962122_943000.jpg

Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg

Doc201910011214020_fotoesterna_ant_396024_947112.jpg

Doc201710071149010_foto_TargaMid_4007396_95010.jpg

This is what I have implemented so far:

Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");

but it doesn't work as expected.
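A quick way to see what goes wrong is to print what `_[a-zA-Z0-9]+_` actually finds with `Matcher.find()` on the four sample names. In this sketch (the class and method names are made up for illustration), every match includes the surrounding underscores, and because the character class excludes `_`, names such as fotoesterna_ant and foto_TargaMid are cut off at their first inner underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionPatternDemo {
    // The pattern from the question: an underscore, 1+ letters/digits, an underscore.
    private static final Pattern REX_EXP = Pattern.compile("_[a-zA-Z0-9]+_");

    // Returns the first match, or null if the pattern finds nothing.
    static String firstMatch(String fileName) {
        Matcher m = REX_EXP.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(firstMatch(name));
        }
        // Prints _fotocontargasx_, _fotoAssicurazioneCartaceo_, _fotoesterna_, _foto_
    }
}
```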

[Question comments]:

Try Pattern rexExp = Pattern.compile("_\\w+_(?=\\d{7}_)");
Not sure whether the varying number of digits is a typo, but for all the examples you posted, this works: ^Doc[\d+]{14,15}([^\d]+)[\d]{6,7}_[\d]{5,6}\.jpg$...
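Both comment suggestions can be checked against the four sample names. In this sketch (class and helper names are hypothetical), the first pattern finds nothing in the third name, whose middle number 396024 has only six digits instead of seven, and both patterns keep the surrounding underscores in what they return.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentPatternsDemo {
    // First comment: an underscore-delimited \w+ run followed by exactly 7 digits and '_'.
    static final Pattern SEVEN_DIGITS = Pattern.compile("_\\w+_(?=\\d{7}_)");
    // Second comment, as quoted: group 1 is everything between the two digit runs.
    static final Pattern ANCHORED =
        Pattern.compile("^Doc[\\d+]{14,15}([^\\d]+)[\\d]{6,7}_[\\d]{5,6}\\.jpg$");

    // Whole text of the first match found anywhere in s, or null.
    static String findFirst(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    // Group 1 when the whole string matches, or null.
    static String group1(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            // "null" in the output means "no match"
            System.out.println(findFirst(SEVEN_DIGITS, name) + " | " + group1(ANCHORED, name));
        }
    }
}
```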

Tags: java android regex regexp-substr

[Answer 1]:

Solution 1: match/extract

You can capture a \w+ run between underscores that is followed by [digits][_][digits][.][extension]:

Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

See the regex demo.

Details

_ - an underscore
(\w+) - Group 1: 1+ letters/digits/_
_ - an underscore
\d+ - 1+ digits
_\d+ - a _ and 1+ digits
\. - a dot
[^.]* - 0+ chars other than .
$ - end of string.

Online Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
        Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
        Matcher matcher = rexExp.matcher(s);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        } // => fotoAssicurazioneCartaceo
    }
}
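Since the question asks for a function, the same pattern can be compiled once and wrapped in a helper. This is a minimal sketch (the names ExtractDescription and extract are hypothetical) that returns null for names without the expected shape. Because \w+ is greedy and then backtracks, inner underscores such as the one in fotoesterna_ant stay in the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractDescription {
    // Solution 1's pattern: group 1 is the \w+ run between an underscore
    // and the trailing _digits_digits.extension part.
    private static final Pattern REX_EXP = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    // Returns the captured part, or null if the file name does not have that shape.
    static String extract(String fileName) {
        Matcher matcher = REX_EXP.matcher(fileName);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(extract(name));
        }
        // Prints fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid
    }
}
```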

Solution 2: remove the unneeded prefix/suffix

You can remove everything from the start of the string up to and including the first _, as well as the [digits][_][digits][.][extension] at the end:

.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "")

See this regex demo.

Details

^[^_]*_ - start of string, 0+ chars other than _, and then a _
| - or
_\d+_\d+\.[^.]*$ - a _, 1+ digits, a _, 1+ digits, a ., and then 0+ chars other than . up to the end of the string.
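In Java this is a single String.replaceAll call on the file name. A minimal sketch follows (strip is a hypothetical helper name). Unlike Solution 1 it does not validate the shape of the name: a name that matches neither alternative, such as photo.jpg, comes back unchanged.

```java
public class StripPrefixSuffix {
    // Solution 2's regex: "^[^_]*_" drops the "Doc<digits>_" prefix,
    // "_\d+_\d+\.[^.]*$" drops the "_<digits>_<digits>.<ext>" suffix.
    static final String PREFIX_OR_SUFFIX = "^[^_]*_|_\\d+_\\d+\\.[^.]*$";

    static String strip(String fileName) {
        return fileName.replaceAll(PREFIX_OR_SUFFIX, "");
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(strip(name));
        }
        // Prints fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid
    }
}
```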

[Comments]:

You need to use replaceAll, not replaceFirst; otherwise, for Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg you will get fotoAssicurazioneCartaceo_3962128_943000.jpg instead of fotoAssicurazioneCartaceo.
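The difference is easy to reproduce: replaceFirst stops after the first hit, which is always the prefix alternative because it is anchored at the start, so the suffix survives. A small sketch (the class name is hypothetical):

```java
public class ReplaceFirstVsAll {
    // Solution 2's regex: prefix alternative OR suffix alternative.
    static final String REGEX = "^[^_]*_|_\\d+_\\d+\\.[^.]*$";

    public static void main(String[] args) {
        String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
        // Only the "Doc201810011052053_" prefix is removed.
        System.out.println(s.replaceFirst(REGEX, ""));
        // Prefix and "_3962128_943000.jpg" suffix are both removed.
        System.out.println(s.replaceAll(REGEX, ""));
    }
}
```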

[Answer 2]:

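To complement Wiktor's precise answer, here is a "quick and dirty" approach that makes the following hacky assumptions about your input: "the required string consists only of non-digits, is surrounded by digits, and the input is always a valid file path".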

import java.util.regex.Pattern;

public class Main {
  public static void main(String[] args) {
    String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
    // [\D_]+ : one or more non-digits (\D already covers '_'); 'var' needs Java 10+
    var p = Pattern.compile("_([\\D_]+)_");
    for (var str : strs) {
      var m = p.matcher(str);
      if (m.find()) {
        System.out.println("found: " + m.group(1));
      }
    }
  }
}

Output:

found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid
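The first assumption (the wanted part contains no digits) is the one most likely to break. The sketch below feeds a made-up file name, not one from the question, whose descriptive part contains a digit to both this pattern and Solution 1's pattern. `_([\D_]+)_` skips past foto2 and returns only TargaMid, while the suffix-anchored pattern returns foto2_TargaMid. Class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssumptionCheck {
    // Answer 2: a non-digit run between underscores.
    static final Pattern NON_DIGITS = Pattern.compile("_([\\D_]+)_");
    // Answer 1, Solution 1: anchored on the _digits_digits.ext suffix.
    static final Pattern SUFFIX_ANCHORED = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    static String group1(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical file name (not from the question) with a digit in the description.
        String name = "Doc201710071149010_foto2_TargaMid_4007396_95010.jpg";
        System.out.println(group1(NON_DIGITS, name));      // TargaMid
        System.out.println(group1(SUFFIX_ANCHORED, name)); // foto2_TargaMid
    }
}
```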

[Comments]:

[Answer 3]:

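Pattern: (?<=_).+(?=(_\d+){2}\.), i.e. a greedy run of characters that is preceded by an underscore and followed by two _digits groups and a dot: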

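    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
                + "\n"
                + "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
                + "\n"
                + "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
                + "\n"
                + "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
            // '.' does not match '\n', so every match stays within one file name
            Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
            Matcher matcher = pattern.matcher(s);
            List<String> allMatches = new ArrayList<>();

            while (matcher.find()) {
                allMatches.add(matcher.group());
            }
            System.out.println(allMatches);
        }
    }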

Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]
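On Java 9 and later, the while loop that collects the matches can also be written with Matcher.results(), which streams every MatchResult. This is a minimal sketch with the same pattern. The class and method names are hypothetical, and String.join only builds an equivalent multi-line input without the blank lines, which contain no matches anyway.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AllMatchesStream {
    // Same pattern as above: after an underscore, up to the "_digits_digits." tail.
    static final Pattern PATTERN = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");

    // Collects the text of every match (Matcher.results() requires Java 9+).
    static List<String> allMatches(String text) {
        return PATTERN.matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = String.join("\n",
                "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
                "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
                "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
                "Doc201710071149010_foto_TargaMid_4007396_95010.jpg");
        System.out.println(allMatches(s));
        // Prints [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]
    }
}
```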

[Comments]: