【问题标题】:Comparing anagrams using prime numbers使用素数比较字谜
【发布时间】:2012-10-24 07:48:52
【问题描述】:

存在试图查看两个唯一字符串是否是彼此的字谜的问题。我考虑的第一个解决方案是对两个字符串进行排序,看看它们是否相等。

我一直在考虑另一种解决方案,我想讨论一下是否可行。

我们的想法是为每个字符分配一个数值并将其汇总,这样一组唯一的字符将产生一个唯一的值。当我们测试字谜时,我们不介意“asdf”和“adsf”的校验和是否相同——事实上,我们要求它是这样的。但是字符串“aa”和“b”的校验和不应该相等。

我正在考虑将前 52 个素数分配给字母“a”到“z”,然后是“A”到“Z”(假设我们只有字母)。

如果 52 个素数的集合中的任何两个或多个素数之和可能导致集合中存在另一个素数,则上述方案将失效。

我的疑问是:-

  1. 是否有任何编号方案可以满足我的要求?
  2. 我不确定所涉及的数学;是否有可能证明/是否有任何证据表明前 52 个素数的集合中的两个或多个素数之和至少有一个值存在于同一集合中?

谢谢。

【问题讨论】:

    标签: algorithm math numbers primes


    【解决方案1】:

    使用乘法而不是加法。素数是“乘法唯一”,但不是“加法唯一”。

    【讨论】:

    • 小心,你的数字很快就会变得非常大......第26个素数大约是100,所以zzzzzzzzzz应该是100^10
    • 很好的答案,完全正确。你可以做的就是在每个字符串上使用一个缓冲区来避免数字变大的担忧。将一个字符串左侧的数字放入一个缓冲区,将另一个字符串右侧的数字放入另一个缓冲区。在 th 中将 n 数字输入到缓冲区中,以使 101^n(其中 ^ 表示指数)小于您正在使用的整数类型可表示的最大数字。
    • 我认为我们可以用像 10^9+7 这样的素数模乘以大数。这样可以避免缓冲区溢出。
    【解决方案2】:

    一种稍微笨拙的方法需要最长字符串 max_len 的长度(或任何特定字符的最大数量,以获得更好的性能)。鉴于此,您的哈希可能看起来像

    number_of_a*max_len^51 + number_of_b*max_len^50 + ... + number_of_Z*max_len^0
    

    如前所述,如果您更喜欢使用素数,乘法会更好。

    当然,你可以通过一个包含 52 个值的数组来实现相同的效果。

    【讨论】:

      【解决方案3】:

      您试图通过比较两个 n 位数字的相等性来比较两个排序的字符串是否相等。只要你的字符串足够长,有超过 2^n 个可能的排序字符串,你肯定会有两个不同的排序字符串产生相同的 n 位数。通过http://en.wikipedia.org/wiki/Birthday_problem,您可能会在此之前遇到问题,除非(与素数相乘)有一些定理说您不能有两个不同的字符串来自同一个数字。

      在某些情况下,您可以通过将此想法用作快速的第一次相等性检查来节省时间,这样您只需要比较已排序的字符串(如果它们的数字匹配)。

      【讨论】:

        【解决方案4】:

        不要使用素数 - 素数属性与除法有关,而不是和。 但是,这个想法很好,您可以使用位集,但您会遇到另一个问题 - 重复字母(素数也有同样的问题,1+1+1=3)。 所以,你可以使用一个整数集,一个字母频率为 1...26 的数组。

        【讨论】:

          【解决方案5】:
          def primes(n):
              array = [i for i in range(2,n+1)]
              p = 2
          
              while p <= n:
                  i = 2*p
                  while i <= n:
                      array[i-2] = 0
                      i += p
                  p += 1
          
              return [num for num in array if num > 0]
          
          def anagram(a):
              # initialize a list
              anagram_list = []
              for i in a: 
                  for j in a: 
                      if i != j and (sorted(str(i))==sorted(str(j))):
                          anagram_list.append(i)
              return anagram_list
          
          if __name__ == '__main__':
              print("The Prime Numbers are:\n",primes(1000),"\n")
              a = primes(1000)
              print("Prime Numbers between 0 to 100:")
              T100 = a[:25]
              print(T100,"\n")
              print("The Anagram elements from 0 to 100 are listed :", anagram(T100),"\n")
              print("Prime Numbers between 101 to 200:")
              T200 = a[25:46]
              print(T200,"\n")
              print("The Anagram elements from 101 to 200 are listed :",anagram(T200),"\n")
              print("Prime Numbers between 201 to 300:")
              T300 = a[46:62]
              print(T300,"\n")
              print("The Anagram elements from 201 to 300 are listed :",anagram(T300),"\n")
              print("Prime Numbers between 301 to 400:")
              T400 = a[62:78]
              print(T400,"\n")
              print("The Anagram elements from 301 to 400 are listed :",anagram(T400),"\n")
              print("Prime Numbers between 401 to 500:")
              T500 = a[78:95]
              print(T500,"\n")
              print("The Anagram elements from 401 to 500 are listed :",anagram(T500),"\n")
              print()
              print("Prime Numbers between 501 to 600:")
              T600 = a[95:109]
              print(T600,"\n")
              print("The Anagram elements from 501 to 600 are listed :",anagram(T600),"\n")
              print("Prime Numbers between 601 to 700:")
              T700 = a[109:125]
              print(T700,"\n")
              print("The Anagram elements from 601 to 700 are listed :",anagram(T700),"\n")
              print("Prime Numbers between 701 to 800:")
              T800 = a[125:139]
              print(T800,"\n")
              print("The Anagram elements from 701 to 800 are listed :",anagram(T800),"\n")
              print()
              print("Prime Numbers between 801 to 900:")
              T900 = a[139:154]
              print(T900,"\n")
              print("The Anagram elements from 801 to 900 are listed :",anagram(T900),"\n")
              print("Prime Numbers between 901 to 1000:")
              T1000 = a[154:168]
              print(T1000,"\n")
              print("The Anagram elements from 901 to 1000 are listed :",anagram(T1000),"\n")
          

          输出:

          The Prime Numbers are:
           [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997] 
          
          Prime Numbers between 0 to 100:
          [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] 
          
          The Anagram elements from 0 to 100 are listed : [13, 17, 31, 37, 71, 73, 79, 97] 
          
          Prime Numbers between 101 to 200:
          [101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199] 
          
          The Anagram elements from 101 to 200 are listed : [113, 131, 137, 139, 173, 179, 193, 197] 
          
          Prime Numbers between 201 to 300:
          [211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293] 
          
          The Anagram elements from 201 to 300 are listed : [239, 293] 
          
          Prime Numbers between 301 to 400:
          [307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397] 
          
          The Anagram elements from 301 to 400 are listed : [313, 331, 337, 373, 379, 397] 
          
          Prime Numbers between 401 to 500:
          [401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499] 
          
          The Anagram elements from 401 to 500 are listed : [419, 491] 
          
          
          Prime Numbers between 501 to 600:
          [503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599] 
          
          The Anagram elements from 501 to 600 are listed : [] 
          
          Prime Numbers between 601 to 700:
          [601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691] 
          
          The Anagram elements from 601 to 700 are listed : [613, 619, 631, 691] 
          
          Prime Numbers between 701 to 800:
          [701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797] 
          
          The Anagram elements from 701 to 800 are listed : [] 
          
          
          Prime Numbers between 801 to 900:
          [809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887] 
          
          The Anagram elements from 801 to 900 are listed : [] 
          
          Prime Numbers between 901 to 1000:
          [907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997] 
          
          The Anagram elements from 901 to 1000 are listed : [919, 991] 
          

          如果您是 Python 开发人员,您还可以按照自己的意愿重新构建它。 如果是 Python 新手,请学习 List 的概念。

          【讨论】:

          • 您错过了 OP 问题的重点。他们想通过用素数代替字符(例如 [37, 23, 67, 71, 11, 43] 和 [67, 23, 37, 11, 43] 来测试两个字符串(例如“listen”和“silent”)是否是字谜, 71]) 并比较两个列表上的数学运算的结果。您的代码正在搜索属于字谜的素数(例如 419、491)。
          【解决方案6】:

          到目前为止,我还没有理解尝试示例的复杂性,所以我写了一个简单的 Python 3 示例:

          from operator import mul
          from functools import reduce
          
          TO_PRIME = dict( \
              a=2, b=3, c=5, d=7, e=11, \
              f=13, g=17, h=19, i=23, j=29, \
              k=31, l=37, m=41, n=43, o=47, \
              p=53, q=59, r=61, s=67, t=71, \
              u=73, v=79, w=83, x=89, y=97, z=101 \
              )
          
          def anagram_product(string):
              return reduce(mul, (TO_PRIME[char.lower()] for char in string if char.isalpha()), 1)
          
          def anagram_check(string_1, string_2):
              return anagram_product(string_1) == anagram_product(string_2)
          
          # True examples
          
          print(repr('Astronomer'), repr('Moon starer'), anagram_check('Astronomer', 'Moon starer'))
          
          print(repr('The Morse code'), repr('Here come dots'), anagram_check('The Morse code', 'Here come dots'))
          
          # False examples (near misses)
          
          print(repr('considerate'), repr('cure is noted'), anagram_check('considerate', 'cure is noted'))
          
          print(repr('debit card'), repr('bed credit'), anagram_check('debit card', 'bed credit'))
          

          输出

          > python3 test.py
          'Astronomer' 'Moon starer' True
          'The Morse code' 'Here come dots' True
          'considerate' 'cure is noted' False
          'debit card' 'bed credit' False
          > 
          

          下一步是将这个从乘积变为总和。我想的一种方法是将字母映射到无理数而不是素数。这些无理数必须是一种不会通过任何加法变得有理的类型。这是一个粗略的例子:

          from math import pi
          
          ROUND = 4
          
          TO_IRRATIONAL = {letter: pi ** n for n, letter in enumerate('abcdefghijklmnopqrstuvwxyz')}
          
          def anagram_sum(string):
              return round(sum(TO_IRRATIONAL[char.lower()] for char in string if char.isalpha()), ROUND)
          
          def anagram_check(string_1, string_2):
              return anagram_sum(string_1) == anagram_sum(string_2)
          
          # True examples
          
          print(repr('Astronomer'), repr('Moon starer'), anagram_check('Astronomer', 'Moon starer'))
          
          print(repr('The Morse code'), repr('Here come dots'), anagram_check('The Morse code', 'Here come dots'))
          
          # False examples (near misses)
          
          print(repr('considerate'), repr('cure is noted'), anagram_check('considerate', 'cure is noted'))
          
          print(repr('debit card'), repr('bed credit'), anagram_check('debit card', 'bed credit'))
          

          输出

          > python3 test2.py
          'Astronomer' 'Moon starer' True
          'The Morse code' 'Here come dots' True
          'considerate' 'cure is noted' False
          'debit card' 'bed credit' False
          > 
          

          我并不是说这是一组最优无理数,只是对这个概念的粗略演示。 (注意我需要使用round() 来完成这项工作,它指向一个设计缺陷——无理数的有限表示。)

          【讨论】:

            【解决方案7】:

            这是一个使用素数方式的c#实现:

                using System;
                using System.Collections.Generic;
                using System.Linq;
                using System.Text;
            
            namespace Anag
            {
                class Program
                {
                    private static int[] primes100 = new int[]
                                                        {
                                                            3, 7, 11, 17, 23, 29, 37,
                                                            47, 59, 71, 89, 107, 131,
                                                            163, 197, 239, 293, 353,
                                                            431, 521, 631, 761, 919,
                                                            1103, 1327, 1597, 1931,
                                                            2333, 2801, 3371, 4049,
                                                            4861, 5839, 7013, 8419,
                                                            10103, 12143, 14591, 17519,
                                                            21023, 25229, 30293, 36353,
                                                            43627, 52361, 62851, 75431,
                                                            90523, 108631, 130363,
                                                            156437, 187751, 225307,
                                                            270371, 324449, 389357,
                                                            467237, 560689, 672827,
                                                            807403, 968897, 1162687,
                                                            1395263, 1674319, 2009191,
                                                            2411033, 2893249, 3471899,
                                                            4166287, 4999559, 5999471,
                                                            7199369
                                                        };
            
                    private static int[] getNPrimes(int _n)
                    {
                        int[] _primes;
            
                        if (_n <= 100)
                            _primes = primes100.Take(_n).ToArray();
                        else
                        {
                            _primes = new int[_n];
            
                            int number = 0;
                            int i = 2;
            
                            while (number < _n)
                            {
            
                                var isPrime = true;
                                for (int j = 2; j <= Math.Sqrt(i); j++)
                                {
                                    if (i % j == 0 && i != 2)
                                        isPrime = false;
                                }
                                if (isPrime)
                                {
                                    _primes[number] = i;
                                    number++;
                                }
                                i++;
                            }
            
                        }
            
                        return _primes;
                    }
            
                    private static bool anaStrStr(string needle, string haystack)
                    {
                        bool _output = false;
            
                        var needleDistinct = needle.ToCharArray().Distinct();
            
                        int[] arrayOfPrimes = getNPrimes(needleDistinct.Count());
            
                        Dictionary<char, int> primeByChar = new Dictionary<char, int>();
                        int i = 0;
                        int needlePrimeSignature = 1;
            
                        foreach (var c in needleDistinct)
                        {
                            if (!primeByChar.ContainsKey(c))
                            {
                                primeByChar.Add(c, arrayOfPrimes[i]);
            
                                i++;
                            }
                        }
            
                        foreach (var c in needle)
                        {
                            needlePrimeSignature *= primeByChar[c];
                        }
            
                        for (int j = 0; j <= (haystack.Length - needle.Length); j++)
                        {
                            var result = 1;
                            for (int k = j; k < needle.Length + j; k++)
                            {
                                var letter = haystack[k];
                                result *= primeByChar.ContainsKey(letter) ? primeByChar[haystack[k]] : 0;
                            }
            
                            _output = (result == needlePrimeSignature);
                            if (_output)
                                break;
                        }
            
                        return _output;
                    }
            
            
                    static void Main(string[] args)
                    {
                        Console.WriteLine("Enter needle");
                        var _needle = Console.ReadLine(); ;
                        Console.WriteLine("Enter haystack");
                        var _haystack = Console.ReadLine(); 
            
                        Console.WriteLine(anaStrStr(_needle, _haystack));
                        Console.ReadLine();
            
                    }
                }
                }
            

            【讨论】:

              猜你喜欢
              • 2019-08-22
              • 1970-01-01
              • 1970-01-01
              • 1970-01-01
              • 2015-10-25
              • 2020-11-30
              • 2015-01-17
              • 2019-02-02
              • 2021-06-18
              相关资源
              最近更新 更多