Java Collections: the number of words without repetition (answers)

【Question title】: Java Collections - the number of words without repetition
【Posted】: 2022-01-22 16:06:25
【Question description】:

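I want to create a method that takes a text (String) as its input parameter and returns the number of distinct words, ignoring case. For example, "cat, dog, Cat, Bird, monkey" should return 4.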

How can I compare each item of the collection with the others? This is what I have so far:

import java.util.Arrays;
import java.util.List;

public class WordsCounter {
    public static void main(String[] args) {
        uniqueWordsCounter("cat, dog, Cat, Bird, monkey");
    }

    public static void uniqueWordsCounter(String text) {
        // Lowercase the text and strip the commas, then split on whitespace.
        String processedText = text.toLowerCase().replaceAll(",", "");
        String[] words = processedText.split("\\s");
        List<String> wordsList = Arrays.asList(words);
    }
}

【Question discussion】:

Tags: java string collections equals compareto

【Solution 1】:

One way is to use the distinct() operation from the Stream API:

import java.util.*;

public class WordsCounter {
    public static void main(String[] args) {
        uniqueWordsCounter("cat, dog, Cat, Bird, monkey");
    }

    public static void uniqueWordsCounter(String text) {
        String[] words = text.toLowerCase().split(",\\s*");
        List<String> wordsList = Arrays.asList(words);
        System.out.println(wordsList);
        System.out.println("Count of distinct elements: "
                           + wordsList.stream().distinct().count());
    }
}

Sample run:

$ java Demo.java
[cat, dog, cat, bird, monkey]
Count of distinct elements: 4

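Note that the text is split on a comma followed by optional whitespace, instead of removing the commas and then splitting on whitespace. This keeps things simpler.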
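To illustrate the difference between the two splitting strategies, here is a minimal sketch. The class name SplitDemo, the helper names, and the second input string are assumptions made up for this illustration; they do not come from the answer.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on a comma followed by any amount of whitespace (including none).
    static String[] splitWords(String text) {
        return text.toLowerCase().split(",\\s*");
    }

    // The approach from the question: drop the commas, then split on whitespace.
    static String[] replaceThenSplit(String text) {
        return text.toLowerCase().replaceAll(",", "").split("\\s");
    }

    public static void main(String[] args) {
        // prints [cat, dog, cat, bird, monkey]
        System.out.println(Arrays.toString(splitWords("cat, dog, Cat, Bird, monkey")));
        // prints [cat, dog]
        System.out.println(Arrays.toString(splitWords("cat,dog")));
        // prints [catdog], because nothing separates the words once the comma is gone
        System.out.println(Arrays.toString(replaceThenSplit("cat,dog")));
    }
}
```

With the single regex, the result does not depend on whether the input happens to put a space after each comma.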

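【Discussion】:

You could also use Arrays.stream(words).distinct().count(); the list is not needed unless you also need it for something else.
Thank you very much :)
Or even Pattern.compile(",\\s*").splitAsStream(text).map(String::toLowerCase).distinct().count(), since not even the array is needed, and calling toLowerCase on the individual words may also be more efficient.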
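The two variants suggested in the comments above could be written out roughly as follows. This is only a sketch: the class name StreamVariants and the method names are hypothetical, and the methods return the count instead of printing it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StreamVariants {
    // Same separator as in the answer: a comma plus optional whitespace.
    private static final Pattern SEPARATOR = Pattern.compile(",\\s*");

    // Streams the array directly, so no intermediate List is created.
    static long countViaArray(String text) {
        String[] words = text.toLowerCase().split(",\\s*");
        return Arrays.stream(words).distinct().count();
    }

    // Streams the pieces straight from the pattern and lowercases each word.
    static long countViaPattern(String text) {
        return SEPARATOR.splitAsStream(text)
                .map(String::toLowerCase)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        String text = "cat, dog, Cat, Bird, monkey";
        System.out.println(countViaArray(text));   // prints 4
        System.out.println(countViaPattern(text)); // prints 4
    }
}
```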

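【Solution 2】:

After splitting the string on the "," delimiter, you can use a Set to keep track of all the unique elements in the string. In your example, cat and Cat are treated as the same word (case is ignored), so you can use this logic.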

import java.util.HashSet;
import java.util.Set;

public class WordsCounter {
    public static void main(String[] args) {
        int count = uniqueWordsCounter("cat,dog,Cat,Bird,monkey");
        System.out.println(count);
    }

    public static int uniqueWordsCounter(String text) {
        String[] str = text.split(",");
        Set<String> set = new HashSet<>();
        for (String temp : str) {
            // Add the word only if its lowercase form is not in the set yet.
            if (!set.contains(temp.toLowerCase())) {
                set.add(temp);
            }
        }
        return set.size();
    }
}

The output is:

4

【Discussion】:

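Instead of checking whether the word already exists in the Set, add the word to the Set directly (after converting it to lower or upper case) and return the size of the set, since a Set only ever keeps unique entries. That saves the time complexity (TC) of the contains call.
The TC of contains is only O(1), so essentially it becomes TC of O(1) * TC of add -> TC of add only.
There is a logic error in checking set.contains(temp.toLowerCase()) and then calling set.add(temp); without toLowerCase(). It happens to work for the example data because the lowercase strings appear first, but it fails for "Cat,cat". This illustrates why you should avoid redundant operations: not because of performance, but because of the bugs that can arise when the two operations get out of sync. Even if you need to perform an action for absent values, there is no need for a pre-test, because add returns whether the set was modified; if(set.add(temp.toLowerCase())) { … } would work.
@Holger Yes, I understand. One way to correct it is to store the lowercase values in the set. But as you mentioned, I feel this is the better approach.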
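Following the suggestions in this discussion, a corrected Set-based version might look like the sketch below. It is not the answerer's code: the class name SetWordsCounter and the helper firstOccurrences are hypothetical, and splitting on ",\\s*" (borrowed from Solution 1) is an assumption so that inputs with spaces after the commas also work.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetWordsCounter {
    // Normalizes each word once and lets Set.add discard the duplicates.
    static int uniqueWordsCounter(String text) {
        Set<String> seen = new HashSet<>();
        for (String word : text.split(",\\s*")) {
            seen.add(word.toLowerCase());
        }
        return seen.size();
    }

    // Uses the boolean returned by add to act only on words not seen before.
    static List<String> firstOccurrences(String text) {
        Set<String> seen = new HashSet<>();
        List<String> firsts = new ArrayList<>();
        for (String word : text.split(",\\s*")) {
            if (seen.add(word.toLowerCase())) {
                firsts.add(word);
            }
        }
        return firsts;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWordsCounter("cat,dog,Cat,Bird,monkey")); // prints 4
        System.out.println(uniqueWordsCounter("Cat,cat"));                 // prints 1
        System.out.println(firstOccurrences("Cat,cat,dog"));               // prints [Cat, dog]
    }
}
```

Because the value that is tested and the value that is stored are the same expression, the two cannot drift apart the way contains and add did in the original loop.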