Alternative to streams for sorting frequently occurring words: answer

[Question title]: Alternative to streams for sorting frequently occurring words
[Posted]: 2021-10-10 05:29:09
[Question]:

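So, I have a method that takes a list of strings as a parameter and reads it. It then sorts the words by frequency and, if words have the same frequency, prints them in alphabetical order. (There are actually Russian words too, and they always come after the English words.)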

Here is an example of correct output:

лицами-18
Apex-15
azet-15
xder-15
анатолю-15
андреевич-15
батальона-15
hello-13
zello-13
полноте-13

Here is my code:

import java.util.*;
import java.util.stream.Collectors;

public class Words {

public String countWords(List<String> lines) {

    StringBuilder input = new StringBuilder();
    StringBuilder answer = new StringBuilder();

    for (String line : lines){
        if(line.length() > 3){
            if(line.substring(line.length() - 1).matches("[.?!,]+")){
                input.append(line.substring(0,line.length()-1)).append(" ");
            }else{
                input.append(line).append(" ");
            }
        }
    }

    String[] strings = input.toString().split("\\s");

    List<String> list = new ArrayList<>(Arrays.asList(strings));

    Map<String, Integer> unsortMap = new HashMap<>();
    while (list.size() != 0){
        String word = list.get(0);
        int freq = Collections.frequency(list, word);
        if (word.length() >= 4 && freq >= 10){
            unsortMap.put(word.toLowerCase(), freq);
        }

        list.removeAll(Collections.singleton(word));
    }
    //The Stream logic is here
    List<String> sortedEntries = unsortMap.entrySet().stream()
            .sorted(Comparator.comparingLong(Map.Entry<String, Integer>::getValue)
                    .reversed()
                    .thenComparing(Map.Entry::getKey)
            )
            .map(it -> it.getKey() + " - " + it.getValue())
            .collect(Collectors.toList());
    
    //Logic ends here

    for (int i = 0; i < sortedEntries.size(); i++) {
        if(i<sortedEntries.size()-1) {
            answer.append(sortedEntries.get(i)).append("\n");
        }
        else{
            answer.append(sortedEntries.get(i));
        }
    }

    return answer.toString();

 }
}

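My question: the code currently works well and gives the correct result, but, as you can see, I'm using a stream to sort the strings. I'm simply curious whether there is another way to write my code without streams. More precisely: is there another way to sort strings by frequency and then alphabetically (when they have the same frequency) without using streams?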

[Question discussion]:

Tags: java sorting dictionary collections stream

[Solution 1]:

Anything you can do with streams can be done in conventional Java. But using streams often makes code shorter, simpler, and easier to read!

By the way, the first half of your code (the word counting) can simply be replaced with:

Map < String, AtomicInteger > map = new HashMap <>();
for ( String word : words ) {
    map.putIfAbsent( word , new AtomicInteger( 0 ) );
    map.get( word ).incrementAndGet();
}
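If plain Integer counts are enough, the same counting can be sketched with Map.merge (available since Java 8) instead of AtomicInteger. This is an alternative technique, not the answer's own code, and the names MergeCount and count are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, for illustration only.
public class MergeCount {

    // Counts occurrences with Map.merge: the first time a word is seen it is stored
    // with the value 1; on later sightings Integer::sum adds 1 to the existing count.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(List.of("azet", "hello", "azet", "лицами", "azet"));
        System.out.println(counts.get("azet"));  // 3
        System.out.println(counts.get("hello")); // 1
    }
}
```

The result is a Map<String, Integer>, the same shape as the question's unsortMap.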

The second half of your code reports on the map, sorting first by value (descending) and then by key.
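Sorting that map without streams does not strictly need a custom class either. A minimal sketch, assuming the counts are already in a Map<String, Integer> like the question's unsortMap: copy the entries into a list and sort it in place with List.sort and a hand-written Comparator. The names SortWithoutStreams and sortedLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, for illustration only.
public class SortWithoutStreams {

    // Sorts word counts by count (descending), then by word (ascending), using only
    // List.sort and a Comparator; no streams are involved.
    static List<String> sortedLines(Map<String, Integer> counts) {
        // Copy the entries into a mutable list so they can be sorted in place.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());

        Comparator<Map.Entry<String, Integer>> byCountDescThenWord = (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // tie: natural String order
        };
        entries.sort(byCountDescThenWord);

        // Format each entry the way the question does: "word - count".
        List<String> lines = new ArrayList<>(entries.size());
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " - " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 13);
        counts.put("лицами", 18);
        counts.put("azet", 15);
        counts.put("zello", 13);
        counts.put("xder", 15);
        // Prints, one per line: лицами - 18, azet - 15, xder - 15, hello - 13, zello - 13
        System.out.println(String.join("\n", sortedLines(counts)));
    }
}
```

Swapping the arguments in Integer.compare(b, a) gives descending order without calling reversed().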

This challenge is discussed in the Questions Sorting a HashMap based on Value then Key? and Sort a Map<Key, Value> by values. Some of those answers contain clever solutions, such as this one by Sean.

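But I prefer to keep it simple. I would convert our map of words and word counts into objects of our own custom class, each object holding the word and its count as fields.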

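Java 16+ brings the records feature, which makes defining such a custom class much easier. A record is a briefer way to write a class whose main purpose is to communicate data transparently and immutably. The compiler implicitly creates the constructor, getters, equals & hashCode, and toString.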

record WordAndCount (String word , int count ) {}
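To see what the compiler generates, here is a small sketch exercising that same record. RecordDemo is a hypothetical wrapper class, and the code needs Java 16 or later.

```java
// Hypothetical wrapper class, for illustration only.
public class RecordDemo {

    // Same one-line record as above; the compiler generates the canonical constructor,
    // the accessors word() and count(), equals/hashCode, and toString.
    record WordAndCount(String word, int count) { }

    public static void main(String[] args) {
        WordAndCount a = new WordAndCount("azet", 15);
        WordAndCount b = new WordAndCount("azet", 15);

        System.out.println(a.word() + " " + a.count()); // azet 15
        System.out.println(a.equals(b));                // true: compares components, not identity
        System.out.println(a);                          // WordAndCount[word=azet, count=15]
    }
}
```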

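Before Java 16, use a conventional class instead of a record. Here is the roughly 30-line source-code equivalent of that one-line record. (Its equals method uses var, so it still needs Java 10 or later.)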

import java.util.Objects;

final class WordAndCount {
    private final String word;
    private final int count;

    WordAndCount ( String word , int count ) {
        this.word = word;
        this.count = count;
    }

    public String word () { return word; }

    public int count () { return count; }

    @Override
    public boolean equals ( Object obj ) {
        if ( obj == this ) return true;
        if ( obj == null || obj.getClass() != this.getClass() ) return false;
        var that = ( WordAndCount ) obj;
        return Objects.equals( this.word , that.word ) && this.count == that.count;
    }

    @Override
    public int hashCode () {
        return Objects.hash( word , count );
    }

    @Override
    public String toString () {
        return "WordAndCount[" + "word=" + word + ", " + "count=" + count + ']';
    }
}

We create a list of objects of that record type, then populate it.

List<WordAndCount> wordAndCounts = new ArrayList <>(map.size()) ;
for ( String word : map.keySet() ) {
    wordAndCounts.add( new WordAndCount( word, map.get( word ).get() ) );
}
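A variation on that loop, sketched under the assumption that the map is the Map<String, AtomicInteger> built earlier: iterating entrySet() yields each word and its counter together, which avoids a second map.get(word) lookup per key. FromEntries and toList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name, for illustration only.
public class FromEntries {

    record WordAndCount(String word, int count) { }

    // Builds the list by iterating entrySet(), which hands over key and value together,
    // instead of iterating keySet() and calling map.get(word) for each key.
    static List<WordAndCount> toList(Map<String, AtomicInteger> map) {
        List<WordAndCount> result = new ArrayList<>(map.size());
        for (Map.Entry<String, AtomicInteger> entry : map.entrySet()) {
            result.add(new WordAndCount(entry.getKey(), entry.getValue().get()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, AtomicInteger> map = new HashMap<>();
        map.put("azet", new AtomicInteger(15));
        map.put("hello", new AtomicInteger(13));
        // Element order follows HashMap iteration, so it is unspecified until sorted.
        System.out.println(toList(map));
    }
}
```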

Now sort. The Comparator interface has some convenient factory methods to which we can pass method references.

wordAndCounts.sort(
        Comparator
                .comparingInt( WordAndCount ::count )
                .reversed()
                .thenComparing( WordAndCount ::word )
);
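The tie-break by word uses String's natural ordering, which compares UTF-16 code units. Uppercase Latin letters (U+0041 to U+005A) sort before lowercase Latin letters (U+0061 to U+007A), and Cyrillic letters (in the U+0400 block) sort after both. That is why "Apex" precedes "azet" and why the Russian words land below the English ones in the question's output. The sketch below, using the hypothetical names OrderingDemo and sorted, applies the same comparator to a few words from the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class name, for illustration only.
public class OrderingDemo {

    record WordAndCount(String word, int count) { }

    static List<WordAndCount> sorted(List<WordAndCount> input) {
        List<WordAndCount> list = new ArrayList<>(input); // copy so the input is untouched
        // Count descending, then word in natural String order (UTF-16 code-unit comparison).
        list.sort(Comparator.comparingInt(WordAndCount::count)
                .reversed()
                .thenComparing(WordAndCount::word));
        return list;
    }

    public static void main(String[] args) {
        List<WordAndCount> words = List.of(
                new WordAndCount("батальона", 15),
                new WordAndCount("xder", 15),
                new WordAndCount("Apex", 15),
                new WordAndCount("анатолю", 15),
                new WordAndCount("azet", 15));
        // Prints, one per line: Apex, azet, xder, анатолю, батальона
        for (WordAndCount w : sorted(words)) {
            System.out.println(w.word());
        }
    }
}
```

If a case-insensitive or locale-aware order were wanted instead, thenComparing also accepts a second argument such as String.CASE_INSENSITIVE_ORDER or a java.text.Collator, but that would no longer match the output the question expects.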

Let's put all the code together.

package work.basil.text;

import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

public class EngRus {
    public static void main ( String[] args ) {
        // Populate input data.
        List < String > words = EngRus.generateText(); // Recreate the original data seen in the Question.
        System.out.println( "words = " + words );

        // Count words in the input list.
        Map < String, AtomicInteger > map = new HashMap <>();
        for ( String word : words ) {
            map.putIfAbsent( word , new AtomicInteger( 0 ) );
            map.get( word ).incrementAndGet();
        }
        System.out.println( "map = " + map );

        // Report on word count, sorting first by word-count numerically and then by word alphabetically.
        record WordAndCount( String word , int count ) { }
        List < WordAndCount > wordAndCounts = new ArrayList <>( map.size() );
        for ( String word : map.keySet() ) {
            wordAndCounts.add( new WordAndCount( word , map.get( word ).get() ) );
        }
        wordAndCounts.sort( Comparator.comparingInt( WordAndCount :: count ).reversed().thenComparing( WordAndCount :: word ) );
        System.out.println( "wordAndCounts = " + wordAndCounts );
    }

    public static List < String > generateText () {
        String input = """
                лицами-18
                Apex-15
                azet-15
                xder-15
                анатолю-15
                андреевич-15
                батальона-15
                hello-13
                zello-13
                полноте-13
                """;

        List < String > words = new ArrayList <>();
        input.lines().forEach( line -> {
            String[] parts = line.split( "-" );
            for ( int i = 0 ; i < Integer.parseInt( parts[ 1 ] ) ; i++ ) {
                words.add( parts[ 0 ] );
            }
        } );
        Collections.shuffle( words );
        return words;
    }
}

When run:

words = [андреевич, hello, xder, батальона, лицами, полноте, анатолю, лицами, полноте, полноте, анатолю, анатолю, zello, hello, лицами, xder, батальона, Apex, xder, андреевич, анатолю, hello, xder, Apex, xder, андреевич, лицами, zello, полноте, лицами, Apex, батальона, zello, полноте, xder, hello, azet, батальона, zello, hello, полноте, Apex, полноте, полноте, azet, андреевич, полноте, Apex, анатолю, hello, zello, анатолю, …]

map = {андреевич=15, xder=15, zello=13, батальона=15, azet=15, лицами=18, анатолю=15, hello=13, Apex=15, полноте=13}

wordAndCounts = [WordAndCount[word=лицами, count=18], WordAndCount[word=Apex, count=15], WordAndCount[word=azet, count=15], WordAndCount[word=xder, count=15], WordAndCount[word=анатолю, count=15], WordAndCount[word=андреевич, count=15], WordAndCount[word=батальона, count=15], WordAndCount[word=hello, count=13], WordAndCount[word=zello, count=13], WordAndCount[word=полноте, count=13]]
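To turn the sorted list into the same text the question's countWords method returns (one "word - count" per line, no trailing newline), still without streams, a java.util.StringJoiner can stand in for the index-checking loop. A sketch; FormatLines and format are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical class name, for illustration only.
public class FormatLines {

    record WordAndCount(String word, int count) { }

    // Joins the already-sorted list into "word - count" lines separated by "\n".
    // StringJoiner only inserts the delimiter between elements, so there is no trailing newline.
    static String format(List<WordAndCount> sorted) {
        StringJoiner joiner = new StringJoiner("\n");
        for (WordAndCount w : sorted) {
            joiner.add(w.word() + " - " + w.count());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<WordAndCount> sorted = List.of(
                new WordAndCount("лицами", 18),
                new WordAndCount("Apex", 15),
                new WordAndCount("hello", 13));
        // Prints three lines: лицами - 18, Apex - 15, hello - 13
        System.out.println(format(sorted));
    }
}
```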

[Discussion]:

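IntelliJ IDEA can't see the record; it says "Cannot resolve symbol record".
@Baron As I said, and as the Java JEP 395 I linked says: Java 16 and later.
Yes, but I have Java 16 installed.
@Baron Records are a final feature in Java 16. Presumably you have not configured your project to compile for Java 16. If you use Maven, check <maven.compiler.release> in your POM (or the older <maven.compiler.source> & <maven.compiler.target>). Then check your IntelliJ settings. Unfortunately, IntelliJ has many such settings, and you must track down each one in a maze of dialog boxes. Search Stack Overflow for help with these settings.
Is there another way to implement the same logic without records? I'm not used to records; they feel strange to me.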