使用 gmp 有效地分解大数答案

【问题标题】：Factor a large number efficiently with gmp使用 gmp 有效地分解大数
【发布时间】：2023-03-08 11:01:02
【问题描述】：

我需要获取可以轻松达到 1k 位的所有大数的质因数。这些数字实际上是随机的，所以应该不难。我如何有效地做到这一点？我使用 C++ 和 GMP 库。

编辑：我想你们都误解了我的意思。
我所说的素数是指得到该数字的所有素因数。
对不起我的英语，在我的语言中，prime 和 factor 是相同的:)

澄清（来自 OP 的其他帖子）：

我需要的是一种使用 C++ 和 GMP(Gnu Multiple Precession lib) 或更少优选任何其他方式来有效分解（找到一个数字的素数）大数（可能达到 2048 位）的方法。这些数字实际上是随机的，因此很难考虑的可能性很小，即使数字很难考虑，我也可以重新滚动数字（虽然不能选择）。

【问题讨论】：

在这种情况下，素数是什么意思？您是否正在尝试生成大素数？还是您的意思是提前为某些特定用途准备数字？
你按下数字侧面的小软按钮，将其提高几个档位，这样它就可以启动了，引擎开始平稳运行。
我同意，这方面的英文太糟糕了，我可以猜到 OP 的意思，但肯定不清楚。
具有两个质因数的复合数的困难情况的概率不可忽略恕我直言（我的快速估计在 100000 个 1024 位的随机数中大约有 1 个）。
@starblue：有趣！你能更详细地解释一下你的数学吗？

标签： c++ math primes gmp factorization

【解决方案1】：

一个好的开始是对小素数进行一些预过滤，比如所有低于 100 000 左右的素数。只需尝试除以它们中的每一个（创建一个表，然后在运行时加载该表或将其作为代码中的静态数据）。它可能看起来很慢而且很愚蠢，但是如果这个数字是完全随机的，那么这会给你一些非常快的因素，而且概率很大。然后查看剩余的数字并决定下一步该做什么。如果它非常小（“小”的含义取决于您），您可以尝试质数测试（我认为 GMP 中有一些东西），如果它给出的是质数，那么在大多数情况下您可以信任它。否则，您必须进一步考虑它。

如果您的数字确实很大并且您关心性能，那么您肯定需要实现一些更复杂的东西，而不仅仅是一个愚蠢的除法。查看二次筛（尝试*）。它非常简单但非常强大。如果您愿意接受挑战，请尝试 MPQS，它是二次筛算法的一种变体。 This forum 是一个很好的信息来源。甚至还有您需要的工具的现有实现 - see for example this。

请注意，尽管 1k 位的数字无论如何都是巨大的。如果幸运的话，考虑这样一个数字（即使使用 MPQS 或其他数字）可能需要数年时间，否则可能需要永远。我认为 MPQS 在大约 100-400 位的数字上表现良好（如果它们由两个几乎同样大的素数组成，这当然是最难的情况）。

【讨论】：

对于超过 100 位数字的数字，二次筛开始被通用数字字段筛 (GNFS) 击败。
@Chris Card - 100 位数字，以哪个为基数？请说明。
当然是十进制的。 Chris 适合如此大的数字，QS（或者更清楚地说，MPQS，因为 QS 在这样的数字下要慢得多）开始变得比 GNFS 慢。
当前的记录因式分解是 RSA768 (crypto-world.com/FactorRecords.html)，需要大量工作。在标准的现代 PC 上使用 GNFS 可以分解大约 150 位的数字，但目前即使是*研究人员也无法达到 1k 位。
是的，任何试图分解大量数字的尝试大部分时间都是在分布式计算机系统上完成的，其中每个节点对结果的贡献很小。即便如此，完成这项任务也需要几个月甚至几年的时间。

【解决方案2】：

以下是 Java 中的示例算法（它不是带有 GMP 的 C++，但转换应该非常简单）：

生成位长为Nbits的随机数x
尝试分解所有小于 100 的素因数，保留除以 x 的素因数列表。
使用 Java 的 isProbablePrime 方法测试剩余因子是否为素数
如果剩余因子乘积是具有足够概率的素数，则我们已成功分解 x。（停止）
否则剩余因子乘积肯定是复合的（请参阅 isProbablePrime 文档）。
虽然我们还有时间，但我们会运行 Pollard rho algorithm，直到找到除数 d。
如果时间用完了，我们就失败了。（停止）
我们找到了一个除数 d。所以我们分解出 d，将 d 的质因数添加到 x 的质因数列表中，然后转到第 4 步。

该算法的所有参数都在程序列表的开头附近。我寻找 1024 位随机数，超时时间为 250 毫秒，然后我继续运行程序，直到我得到一个具有至少 4 个质因数的数字 x（有时程序会找到一个具有 1、2 或 3 个质因数的数字第一的）。使用此参数设置，在我的 2.66Ghz iMac 上通常需要大约 15-20 秒。

Pollard 的 rho 算法并不是那么高效，但与 quadratic sieve (QS) 或 general number field sieve (GNFS) 相比，它很简单——我只是想看看这个简单算法是如何工作的。

为什么会这样：（尽管你们中的许多人声称这是一个难题）

事实是，prime numbers aren't that rare。对于 1024 位数字，素数定理表示每 1024 ln 2 中大约有 1 个（= 大约 710）数字是素数。

因此，如果我生成一个素数随机数 x，并且我接受概率素数检测，我就成功分解了 x。

如果它不是素数，但我很快分解出几个小因数，而剩下的因数是（概率）素数，那么我已经成功分解了 x。

否则我只是放弃并生成一个新的随机数。（OP 说可以接受）

大多数成功因式分解的数将有 1 个大素因数和几个小素因数。

难以分解的数字是那些具有不小的质因数和至少 2 个大质因数的数字（其中包括作为两个大数乘积的密码密钥；OP 对密码学只字未提），并且我可以在时间用完时跳过它们。

package com.example;

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FindLargeRandomComposite {
    final static private int[] smallPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
        73, 79, 83, 89, 97};

    final static private int maxTime = 250;
    final static private int Nbits = 1024;
    final static private int minFactors = 4;
    final static private int NCERTAINTY = 4096;

    private interface Predicate { public boolean isTrue(); }

    static public void main(String[] args)
    {
        Random r = new Random();
        boolean found = false;
        BigInteger x=null;
        List<BigInteger> factors=null;
        long startTime = System.currentTimeMillis();
        while (!found)
        {
            x = new BigInteger(Nbits, r);
            factors = new ArrayList<BigInteger>();
            Predicate keepRunning = new Predicate() {
                final private long stopTime = System.currentTimeMillis() + maxTime;
                public boolean isTrue() {
                    return System.currentTimeMillis() < stopTime;
                }
            };
            found = factor(x, factors, keepRunning);
            System.out.println((found?(factors.size()+" factors "):"not factored ")+x+"= product: "+factors);
            if (factors.size() < minFactors)
                found = false;
        }
        long stopTime = System.currentTimeMillis();
        System.out.println("Product verification: "+(x.equals(product(factors))?"passed":"failed"));
        System.out.println("elapsed time: "+(stopTime-startTime)+" msec");
    }

    private static BigInteger product(List<BigInteger> factors) {
        BigInteger result = BigInteger.ONE;
        for (BigInteger f : factors)
            result = result.multiply(f);
        return result;
    }

    private static BigInteger findFactor(BigInteger x, List<BigInteger> factors,
            BigInteger divisor)
    {
        BigInteger[] qr = x.divideAndRemainder(divisor);
        if (qr[1].equals(BigInteger.ZERO))
        {
            factors.add(divisor);
            return qr[0];
        }
        else
            return x;
    }

    private static BigInteger findRepeatedFactor(BigInteger x,
            List<BigInteger> factors, BigInteger p) {
        BigInteger xprev = null;
        while (xprev != x)
        {
            xprev = x;
            x = findFactor(x, factors, p);
        }
        return x;
    }

    private static BigInteger f(BigInteger x, BigInteger n)
    {
        return x.multiply(x).add(BigInteger.ONE).mod(n);
    }

    private static BigInteger gcd(BigInteger a, BigInteger b) {
        while (!b.equals(BigInteger.ZERO))
        {
            BigInteger nextb = a.mod(b);
            a = b;
            b = nextb;
        }
        return a;
    }
    private static BigInteger tryPollardRho(BigInteger n,
            List<BigInteger> factors, Predicate keepRunning) {
        BigInteger x = new BigInteger("2");
        BigInteger y = x;
        BigInteger d = BigInteger.ONE;
        while (d.equals(BigInteger.ONE) && keepRunning.isTrue())
        {
            x = f(x,n);
            y = f(f(y,n),n);
            d = gcd(x.subtract(y).abs(), n);
        }
        if (d.equals(n))
            return x;
        BigInteger[] qr = n.divideAndRemainder(d);
        if (!qr[1].equals(BigInteger.ZERO))
            throw new IllegalStateException("Huh?");
        // d is a factor of x. But it may not be prime, so run it through the factoring algorithm.
        factor(d, factors, keepRunning);
        return qr[0];
    }

    private static boolean factor(BigInteger x0, List<BigInteger> factors,
            Predicate keepRunning) {

        BigInteger x = x0;
        for (int p0 : smallPrimes)
        {
            BigInteger p = new BigInteger(Integer.toString(p0));
            x = findRepeatedFactor(x, factors, p);          
        }
        boolean done = false;
        while (!done && keepRunning.isTrue())
        {
            done = x.equals(BigInteger.ONE) || x.isProbablePrime(NCERTAINTY);
            if (!done)
            {
                x = tryPollardRho(x, factors, keepRunning);
            }
        }
        if (!x.equals(BigInteger.ONE))
            factors.add(x);
        return done;
    }
}

【讨论】：

【解决方案3】：

目前您无法使用 GMP 考虑 bigint。您可以将您的 bigint 转换为其他库并使用它们的因式分解算法。请注意，对具有 >>20 位数字的整数进行因式分解需要专门的算法，并且速度几乎呈指数级增长。

退房：

【讨论】：

【解决方案4】：

如果要分解的数字具有较小的素因子，则可以使用 Pollard p-1 分解算法。它已分解出数字 2 ^ 740 + 1 的 30 位素数因子。ECM 是一种类似但次指数的算法，但实现起来更加困难。算法的时间量基于边界 b 的设置。它将分解任何具有因子 p 的数字，其中 p - 1 是 b 平滑的。

//Pollard p - 1 factorization algorithm

void factor(mpz_t g, mpz_t n, long b)
{
    //sieve for primes
    std::vector<bool> r;

    for(int i = 0; i < b; i++)
        r.push_back(true);


    for(int i = 2; i < ceil(sqrt(b - 1)); i++)
        if(r.at(i) == true)
            for(int j = i * i; j < b; j += i)
                r.at(j) = false;

    std::vector<long> p;
    std::vector<long> a;
    for(int i = 2; i < b; i++)
        if(r[i] == true)
        {
            p.push_back(i);//Append the prime on to the vector
            int temp = floor(log(b) / log(i)); //temp = logb(i)

            // put primes in to sieve
            // a = the maximum power for p ^ a < bound b
            if(temp == 0)
                a.push_back(1);
            else
                a.push_back(temp);                
        }

    int m = p.size();//m = number of primes under bound b

    mpz_t c;// c is the number Which will be exponated
    mpz_init(c);
    long two = 2;
    mpz_set_ui(c, two);// set c to 2

    int z = 0;
    long x = 2;

    // loop c until a factor is found
    for(;;)
    {
    mpz_set_si( c, x);

    //powering ladder
    for(long i = 0; i < m; i++)
        for(long j = 0; j < a[i]; j++)
            mpz_powm_ui(c , c, (p[i]), n);

    //check if a factor has been found;
    mpz_sub_ui(c ,c,1);
    mpz_gcd(g ,c, n);
    mpz_add_ui(c , c, 1);

    //if g is a factor return else increment c
    if((mpz_cmp_si(g,1)) > 0 && (mpz_cmp(g,n)) < 0)
        return;
    else if (x > b)
        break;
    else
        x++;
    }

}


int main()
{
    mpz_t x;
    mpz_t g;

    //intialize g and x
    mpz_init(g);
    mpz_init_set_str(x,"167698757698757868925234234253423534235342655234234235342353423546435347",10);

    //p-1 will factor x as long as it has a factor p where p - 1 is b-smooth(has all prime factors less than bound b)
    factor(g , x, 1000);

    //output the factor, it will output 1 if algorithm fails
    mpz_out_str(NULL, 10, g);

    return 0;
}

输出 - 7465647 执行时间 - 0.003 秒

J.Pollard 创建的另一种因式分解算法是 Pollards Rho 算法，它不是那么快，但需要的空间很小。他们也是parrelize它的方法。它的复杂度是O(n^1/4)

//Pollard rho factoring algorithm
void rho(mpz_t g, mpz_t n)
{
    mpz_t x;
    mpz_t y;
    mpz_init_set_ui(x ,2);
    mpz_init_set_ui(y ,2);//initialize x and y as 2
    mpz_set_ui(g , 1);
    mpz_t temp;
    mpz_init(temp);

    if(mpz_probab_prime_p(n,25) != 0)
        return;//test if n is prime with miller rabin test

    int count;
    int t1 = 0;
    int t2 = 1;
    int nextTerm = t1 + t2;
    while(mpz_cmp_ui(g,1) < 1)
    {
        f(x,n);//x is changed
        f(y,n);//y is going through the sequence twice as fast
        f(y,n);

        if(count == nextTerm)//calculate gcd every fibonacci number
        {
            mpz_sub(temp,x,y);
            mpz_gcd(g , temp, n);

            t1 = t2;
            t2 = nextTerm;
            nextTerm = t1 + t2;//calculate next fibonacci number
        }

        count ++;
    }

    return;
}

int main()
{
    mpz_t x;
    mpz_t g;

    //intialize g and x
    mpz_init(g);
    mpz_init_set_str(x,"167698757698757868925234234253423",10);


    rho(g , x);

    //output the factor, it will output 1 if algorithm fails
    mpz_out_str(NULL, 10, g);

    return 0;
}

输出 - 353 执行时间 - 0.003s

【讨论】：