【Question title】: Implementing Soundex in Java
【Posted】: 2017-04-07 10:37:35
【Question】:

Please help me implement string similarity comparison in Java! I'm using the org.apache.commons.codec.language.Soundex library:

import org.apache.commons.codec.language.Soundex;

Soundex soundex = new Soundex();
String phoneticValue = soundex.encode("YourString");
String phoneticValue2 = soundex.encode("YourStrink");

if (phoneticValue.equals(phoneticValue2)) {
    // the two strings sound alike
}

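It doesn't seem to work: for similar strings, the encode function gives different results. How can I compare two similar strings with this library?

Looking forward to hearing from you soon! ;)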

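【Discussion】:

  • YourString != YourStrink
  • That's exactly it. I need to compare similar strings that are not actually identical.
  • That's strange. As you may have gathered from my first comment, I'm completely new to soundex, but when I ran a simple test case with your code, both values were equal. I used commons-codec v1.5.
  • Both values are encoded as Y623.
  • I used the same code as you; the only difference might be the version (I used 1.5). The Maven dependency is here: mvnrepository.com/artifact/commons-codec/commons-codec/1.5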
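Comparing the two codes with `equals` is the right check for "these strings sound the same", and the comments above confirm that both strings encode to Y623. For a graded score instead of yes/no, commons-codec's `Soundex` also provides a `difference(String, String)` method that returns how many of the code characters match (0 to 4 for standard four-character codes). The sketch below reproduces that idea on codes you have already obtained from `soundex.encode(...)`; the names `SoundexDifference` and `codeDifference` are hypothetical and not part of commons-codec.

```java
public class SoundexDifference {
    // Counts positions where two Soundex codes agree:
    // 4 means identical codes, 0 means nothing in common.
    // Hypothetical helper; it takes already-encoded codes, not raw strings.
    static int codeDifference(String code1, String code2) {
        int n = Math.min(code1.length(), code2.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(codeDifference("Y623", "Y623")); // 4
        System.out.println(codeDifference("R163", "R150")); // 2
    }
}
```

A threshold such as "at least 3 matching characters" could then stand in for "similar enough"; where to set it depends on your data.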

Tags: java search soundex


【Solution 1】:
class Soundex {
    // Letter groups for the Soundex digits 1 to 6
    private static final String[] CODE_LIST = { "BFPV", "CGJKQSXZ", "DT", "L", "MN", "R" };

    // Returns 1..6 for a coded consonant, 0 for vowels, H, W, Y and non-letters
    private static int getConsonantCode(char ch) {
        for (int i = 0; i < CODE_LIST.length; i++) {
            if (CODE_LIST[i].indexOf(Character.toUpperCase(ch)) >= 0) {
                return i + 1;
            }
        }
        return 0;
    }

    // Vowels (Y included) separate letters with the same code; H and W do not
    private static boolean isVowel(char ch) {
        return "AEIOUYaeiouy".indexOf(ch) >= 0;
    }

    public static String getSoundexCode(String str) {
        str = str.toUpperCase().replaceAll("[^A-Z]", "");
        if (str.isEmpty()) {
            return "";
        }
        StringBuilder soundexCode = new StringBuilder().append(str.charAt(0));
        int lastCode = getConsonantCode(str.charAt(0));
        for (int i = 1; i < str.length() && soundexCode.length() < 4; i++) {
            char curr = str.charAt(i);
            int code = getConsonantCode(curr);
            // Append a digit unless it repeats the code of the previous coded letter
            if (code != 0 && code != lastCode) {
                soundexCode.append(code);
            }
            if (isVowel(curr)) {
                lastCode = 0;       // a vowel resets, so the next same-code letter is coded again
            } else if (code != 0) {
                lastCode = code;    // H and W (code 0) leave lastCode unchanged
            }
        }
        while (soundexCode.length() < 4) {
            soundexCode.append('0'); // pad short codes, e.g. "Lee" -> "L000"
        }
        return soundexCode.toString();
    }
}
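The `getConsonantCode` lookup above scans six group strings for every letter. A minimal alternative, assuming plain ASCII letters, is a 26-character table indexed by `letter - 'A'`, giving the same digits in constant time. `SoundexLookup`, `MAPPING` and `letterCode` are illustrative names, not an existing API. Note that H, W and Y also map to 0 here, so the caller still has to treat H and W differently from vowels when applying the separator rule.

```java
public class SoundexLookup {
    // Soundex digit for each letter A..Z; '0' marks vowels, H, W and Y
    private static final String MAPPING = "01230120022455012623010202";

    // Returns 1..6 for coded consonants, 0 for vowels, H, W, Y and non-letters
    static int letterCode(char ch) {
        char upper = Character.toUpperCase(ch);
        if (upper < 'A' || upper > 'Z') {
            return 0;
        }
        return MAPPING.charAt(upper - 'A') - '0';
    }

    public static void main(String[] args) {
        System.out.println(letterCode('b')); // 1
        System.out.println(letterCode('R')); // 6
        System.out.println(letterCode('h')); // 0
    }
}
```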

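【Discussion】:

  • As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.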
【Solution 2】:
public class Soundex {
    // Builds the 4-character American Soundex code, e.g. "Robert" -> "R163"
    public static String soundexOut(String word) {
        String drop = dropedWord(word);
        if (drop.isEmpty()) {
            return "";
        }
        String soundex = "" + Character.toUpperCase(drop.charAt(0));
        int prevCode = soundexCode(drop.charAt(0));
        for (int i = 1; i < drop.length() && soundex.length() < 4; i++) {
            int code = soundexCode(drop.charAt(i));
            // Skip vowels (code 0) and letters that repeat the previous code
            if (code != 0 && code != prevCode) {
                soundex += code;
            }
            prevCode = code; // a vowel sets this to 0, so it separates equal codes
        }
        while (soundex.length() < 4) {
            soundex += "0";
        }
        return soundex;
    }

    // Maps a lowercase letter to its Soundex digit; vowels and other characters give 0
    public static int soundexCode(char c) {
        String[] code = {"bfpv", "cgjkqsxz", "dt", "l", "mn", "r"};
        for (int i = 0; i < code.length; i++) {
            if (code[i].indexOf(c) >= 0) {
                return i + 1;
            }
        }
        return 0;
    }

    // Lowercases the word and keeps only letters. 'h' and 'w' after the first
    // letter are dropped so that consonants separated by them are coded once;
    // vowels are kept because they separate letters with the same code.
    public static String dropedWord(String word) {
        StringBuilder drop = new StringBuilder();
        for (char ch : word.toLowerCase().toCharArray()) {
            if (ch < 'a' || ch > 'z') {
                continue;
            }
            if (drop.length() > 0 && (ch == 'h' || ch == 'w')) {
                continue;
            }
            drop.append(ch);
        }
        return drop.toString();
    }
}
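Hand-rolled Soundex implementations are easy to get subtly wrong around the H/W and vowel-separator rules, so it helps to check one against well-known reference codes. A minimal sketch, assuming the encoder under test can be passed in as a `Function<String, String>`; `SoundexCheck`, `REFERENCE` and `mismatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SoundexCheck {
    // Well-known American Soundex reference codes
    static final Map<String, String> REFERENCE = new LinkedHashMap<>();
    static {
        REFERENCE.put("Robert", "R163");
        REFERENCE.put("Rupert", "R163");
        REFERENCE.put("Rubin", "R150");
        REFERENCE.put("Ashcraft", "A261"); // H between S and C: coded once
        REFERENCE.put("Tymczak", "T522");  // vowel A between Z and K: K coded again
        REFERENCE.put("Pfister", "P236");  // F has the same code as P: skipped
        REFERENCE.put("Honeyman", "H555");
    }

    // Describes every name the encoder gets wrong (an empty list means all pass)
    static List<String> mismatches(Function<String, String> encoder) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> e : REFERENCE.entrySet()) {
            String actual = encoder.apply(e.getKey());
            if (!e.getValue().equals(actual)) {
                failures.add(e.getKey() + ": expected " + e.getValue() + ", got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // A stub encoder that always answers "X000" fails every case
        mismatches(name -> "X000").forEach(System.out::println);
    }
}
```

For example, `SoundexCheck.mismatches(Soundex::soundexOut)` should return an empty list if the encoder follows the American Soundex rules.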
