Split Method vs Substring and IndexOf: Answers

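【Question title】: Split Method vs Substring and IndexOf
【Posted】: 2016-05-25 02:42:11
【Question】:

I'm writing a program that parses a CSV file. I'm using the split method to separate the values into a String array, but I've read in some articles that using substring and indexOf is faster. I wrote out roughly what I would do with those two methods, and it looks like split would be the better choice. Can someone explain how substring and indexOf are supposed to be better, or whether I'm not using these methods correctly? Here is what I wrote: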

int indexOne = 0, indexTwo;
for (int i = 0; i < 4; i++) // there are four different values in one line
{
   indexTwo = line.indexOf(",", indexOne); // look up the next comma only once
   if (indexTwo != -1)
   {
       lineArr[i] = line.substring(indexOne, indexTwo);
       indexOne = indexTwo + 1;
   }
   else
   {
       lineArr[i] = line.substring(indexOne); // last value has no comma after it
       break;
   }
}

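【Comments】:

Can you link some of those articles?
Consider using lodash or underscore or something similar to handle this kind of thing.
@AustinD Here is a link: demeranville.com/… Someone posted it in a comment on Stack Exchange; this is that thread: programmers.stackexchange.com/questions/221997/…
@Michael Do you mean in the CSV file?
@AustinD I'm not surprised that split() would be faster. It is a common operation that gets repeated over and over, and when Java finally added it, I tend to believe they used an optimized algorithm. That isn't always the case, but I prefer not to reinvent the wheel when one has already been built for me, unless there is concrete evidence that it is inefficient.

Tags: java parsing csv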

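【Solution 1】:

The code below is taken from the source that ships with Oracle's JDK 8 update 73. As you can see in the "fastpath" scenario, when you pass in a one-character string, it drops into a loop that uses indexOf, much like your logic.

The short answer is yes, your code is a little faster, but I'll let you decide whether that is enough to justify avoiding split in your use case.

Personally, I tend to agree with @pczeus's comment: use split unless you actually have evidence that it is causing a problem.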

 public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
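
To make the comparison concrete, here is a minimal sketch that puts a hand-written indexOf/substring loop next to String.split. The class name SplitComparison and the helper manualSplit are invented for this illustration and are not part of the JDK. The sketch assumes a single-character delimiter and no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitComparison {

    // Hypothetical helper: splits on one character using indexOf/substring,
    // keeping every field, including empty trailing ones.
    static String[] manualSplit(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int next;
        while ((next = line.indexOf(delimiter, start)) != -1) {
            fields.add(line.substring(start, next));
            start = next + 1;
        }
        // Remaining segment after the last delimiter (may be empty).
        fields.add(line.substring(start));
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a,b,,";

        // Manual loop keeps the two empty trailing fields.
        System.out.println(Arrays.toString(manualSplit(line, ',')));  // [a, b, , ]

        // split(",") uses limit == 0, which drops trailing empty strings.
        System.out.println(Arrays.toString(line.split(",")));         // [a, b]

        // A negative limit keeps them, matching the manual loop.
        System.out.println(Arrays.toString(line.split(",", -1)));     // [a, b, , ]

        // "|" is a regex metacharacter; the escaped two-char form "\\|"
        // matches case (2) of the fastpath comment in the source above.
        System.out.println(Arrays.toString("a|b".split("\\|")));      // [a, b]
    }
}
```

For CSV rows that can end in empty values, the limit argument is the main behavioral difference between the two approaches. Neither one handles quoted fields that contain commas.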

【Comments】: