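Random number with probabilities: answers

【Question title】: Random number with Probabilities
【Posted】: 2013-12-18 03:23:14
【Question】:

I would like to know the best way (e.g. in Java) to generate random numbers within a particular range, where each number has a certain probability of occurring.

For example:

Generate random integers from within [1;3] with the following probabilities: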

P(1) = 0.2
P(2) = 0.3
P(3) = 0.5

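Right now I am considering generating a random integer within [0;100] and doing the following:

If it is within [0;20] --> I got my random number 1.
If it is within [21;50] --> I got my random number 2.
If it is within [51;100] --> I got my random number 3.

What would you say?

【Question comments】:

I think it is a clever way to do it, but I don't know whether there is anything "better". Just make sure you go from 0 to 99, otherwise you end up with 101 numbers and your percentages will be slightly off.
Yes, that seems reasonable. Otherwise you could use EnumeratedIntegerDistribution; the original comment linked to an example.
Granted, I could not find a relevant implementation for your problem in SSJ, but you should give it a more thorough look than I did...

Tags: java random probability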

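【Solution 1】:

You already wrote the implementation in your question. ;)

final int ran = myRandom.nextInt(100); // uniform in 0..99
if (ran >= 50) { return 3; }           // 50..99: 50 values, P(3) = 0.5
else if (ran >= 20) { return 2; }      // 20..49: 30 values, P(2) = 0.3
else { return 1; }                     //  0..19: 20 values, P(1) = 0.2

For more complex implementations, you can speed this up by precomputing the result for every possible draw into a lookup table, like this: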

t[0] = 1; t[1] = 1; // ... one for each possible result
return t[ran];

But this should only be used if it is a performance bottleneck and the code is called several hundred times per second.
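A minimal sketch of the lookup-table idea above, assuming the probabilities from the question (0.2, 0.3, 0.5) and a draw in 0..99. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.Random;

final class LookupTableSampler {
    // TABLE[r] is the outcome for a uniform draw r in 0..99.
    private static final int[] TABLE = buildTable();

    private static int[] buildTable() {
        int[] t = new int[100];
        for (int r = 0; r < 100; r++) {
            // 20 slots map to 1, 30 slots to 2, 50 slots to 3.
            t[r] = (r < 20) ? 1 : (r < 50) ? 2 : 3;
        }
        return t;
    }

    /** Maps a uniform draw r in 0..99 to 1, 2 or 3 with a single array access. */
    static int lookup(int r) {
        return TABLE[r];
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(lookup(random.nextInt(100)));
    }
}
```

The table is built once, so each draw costs one array read instead of a chain of comparisons.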

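【Comments】:

Your answer helped me a lot. Thank you very much.

【Solution 2】:

Your approach is already quite good and works for any range.

Just a thought: another possibility is to get rid of the fractions by multiplying by a constant multiplier, and then build an array whose size is that multiplier. Multiplying by 10 gives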

P(1) = 2
P(2) = 3
P(3) = 5

You then create an array with the inverse mapping: '1' goes into elements 1 and 2, '2' into elements 3 to 5, and '3' into elements 6 to 10:

P = (1,1, 2,2,2, 3,3,3,3,3);

Then you can pick a random element from this array.

(Added.) Using the probabilities from the example in kiruwka's comment:

int[] numsToGenerate           = new int[]    { 1,   2,    3,   4,    5   };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };

the smallest multiplier that turns all of them into integers is 20, which gives you

2, 5, 6, 5, 2

so numsToGenerate would have a length of 20, with the following values:

1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5

The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20, which is still 0.1.

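This assumes that your original probabilities all add up to 1. If they do not, multiply their total by the same factor (the result is then also your array length).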
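The expansion described above can be sketched as follows. This is only an illustration, assuming the probabilities have already been multiplied out to the integer counts 2, 5, 6, 5, 2; the class and method names are hypothetical.

```java
import java.util.Random;

final class ExpandedArraySampler {
    /** Builds an array in which values[i] appears counts[i] times. */
    static int[] expand(int[] values, int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        int[] pool = new int[total];
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            for (int k = 0; k < counts[i]; k++) {
                pool[pos++] = values[i];
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        int[] pool = expand(new int[] {1, 2, 3, 4, 5}, new int[] {2, 5, 6, 5, 2});
        Random random = new Random();
        // Every slot is equally likely, so a value's chance is its count / pool.length.
        System.out.println(pool[random.nextInt(pool.length)]);
    }
}
```

Memory grows with the multiplier, so this is mainly attractive when the probabilities reduce to small integer counts.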

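【Comments】:

Thank you very much for your answer to this question. I really appreciate your help.

【Solution 3】:

If you have a performance problem, then instead of searching all n values, which is O(n), you can perform a binary search, which costs O(log n).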

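Random r=new Random();      
// cumulative weights; they do not have to add up to 1
double[] weights=new double[]{0.1,0.1+0.2,0.1+0.2+0.5};
// end of init
// scale the draw to [0, total weight) so it always falls inside the array
double random=r.nextDouble()*weights[weights.length-1];
// next perform the binary search in weights array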

If you have many weight elements, you only need about log2(weights.length) accesses on average.
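The search step that the snippet above leaves out could look like this. It is a sketch, assuming the cumulative weights of the question's example (0.2, 0.5, 1.0) and a draw that is strictly smaller than the last cumulative weight; the names are hypothetical.

```java
import java.util.Random;

final class CumulativeBinarySearch {
    /** Returns the smallest index i with u < cumulative[i]; requires u < cumulative[last]. */
    static int indexFor(double[] cumulative, double u) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (u < cumulative[mid]) {
                hi = mid;      // the answer is mid or to its left
            } else {
                lo = mid + 1;  // the answer is strictly to the right
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] cumulative = {0.2, 0.5, 1.0};
        int[] values = {1, 2, 3};
        double u = new Random().nextDouble(); // uniform in [0, 1)
        System.out.println(values[indexFor(cumulative, u)]);
    }
}
```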

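【Comments】:

【Solution 4】:

Some time ago I wrote a helper class to solve this problem. The source code should show the concept clearly enough:

import java.util.HashMap;
import java.util.Map;

public class DistributedRandomNumberGenerator {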

    private Map<Integer, Double> distribution;
    private double distSum;

    public DistributedRandomNumberGenerator() {
        distribution = new HashMap<>();
    }

    public void addNumber(int value, double distribution) {
        if (this.distribution.get(value) != null) {
            distSum -= this.distribution.get(value);
        }
        this.distribution.put(value, distribution);
        distSum += distribution;
    }

    public int getDistributedRandomNumber() {
        // Scale the uniform draw to [0, distSum) so the weights need not add up to 1.
        double rand = Math.random() * distSum;
        double tempDist = 0;
        for (Integer i : distribution.keySet()) {
            tempDist += distribution.get(i);
            if (rand < tempDist) {
                return i;
            }
        }
        // Only reached if no number was added, or through floating-point rounding.
        return 0;
    }

}

The class is used as follows:

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.3d); // Adds the numerical value 1 with a probability of 0.3 (30%)
// [...] Add more values

int random = drng.getDistributedRandomNumber(); // Generate a random number

A test driver to verify the functionality:

    public static void main(String[] args) {
        DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
        drng.addNumber(1, 0.2d);
        drng.addNumber(2, 0.3d);
        drng.addNumber(3, 0.5d);

        int testCount = 1000000;

        HashMap<Integer, Double> test = new HashMap<>();

        for (int i = 0; i < testCount; i++) {
            int random = drng.getDistributedRandomNumber();
            test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);
        }

        System.out.println(test.toString());
    }

Sample output of this test driver:

{1=0.20019100000017953, 2=0.2999349999988933, 3=0.4998739999935438}

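【Comments】:

I like this! If you want to use it at large scale, the hashmap should use Float instead of Double to reduce unnecessary overhead.
Could you explain the for loop in main()? I don't understand what it is doing. Also, why don't you check whether distSum is 1 before computing?
What are you doing with this: if (this.distribution.get(value) != null) { distSum -= this.distribution.get(value); }?
@user366312 If addNumber(int value, ...) is called several times with the same value, this line makes sure that the sum distSum keeps the correct value.

【Solution 5】:

Your approach is fine for the specific numbers you picked, although you could reduce storage by using an array of 10 instead of an array of 100. However, this approach does not generalize well to a large number of outcomes, or to outcomes with probabilities such as 1/e or 1/PI.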

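A potentially better solution is to use an alias table. The alias method takes O(n) work to set up the table for n outcomes, but generation then takes constant time regardless of how many outcomes there are.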
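For readers who have not seen the alias method, here is a compact sketch using Vose's construction, which is one common way to build the table and not necessarily the variant the answer had in mind. The class name and the example weights (0.2, 0.3, 0.5 from the question) are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

final class AliasTable {
    private final double[] prob; // chance of keeping column i when it is drawn
    private final int[] alias;   // outcome returned otherwise

    /** Builds the table in O(n); weights must be non-negative with a positive sum. */
    AliasTable(double[] weights) {
        int n = weights.length;
        prob = new double[n];
        alias = new int[n];
        double sum = 0;
        for (double w : weights) sum += w;

        // Scale so that the average weight is exactly 1.
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / sum;
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }

        // Pair each under-full column with an over-full one that tops it up.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }
        // Whatever is left is full, up to rounding error.
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0;
    }

    /** Draws an index in O(1): one uniform column, then one biased coin. */
    int next(Random random) {
        int column = random.nextInt(prob.length);
        return random.nextDouble() < prob[column] ? column : alias[column];
    }

    public static void main(String[] args) {
        AliasTable table = new AliasTable(new double[] {0.2, 0.3, 0.5});
        int[] values = {1, 2, 3};
        System.out.println(values[table.next(new Random())]);
    }
}
```

Because the weights are plain doubles, this also covers probabilities such as 1/e that cannot be turned into small integer counts.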

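【Comments】:

Thank you very much :) You helped me a lot.

【Solution 6】:

After going through the paper that pjs pointed to in another post, I wrote this class for an interview; the population of the base64 table can be optimized further. The result is surprisingly fast. Initialization is slightly expensive, but if the probabilities do not change often, this is a good approach.

*For duplicate keys, the last probability is taken instead of being combined (slightly different from the EnumeratedIntegerDistribution behaviour)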

public class RandomGen5 extends BaseRandomGen {

    private int[] t_array = new int[4];
    private int sumOfNumerator;
    private final static int DENOM = (int) Math.pow(2, 24);
    private static final int[] bitCount = new int[] {18, 12, 6, 0};
    private static final int[] cumPow64 = new int[] {
            (int) ( Math.pow( 64, 3 ) + Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 0 ) )
    };


    ArrayList[] base64Table = {new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()};

    public int nextNum() {
        int rand = (int) (randGen.nextFloat() * sumOfNumerator);

        for ( int x = 0 ; x < 4 ; x ++ ) {
                if (rand < t_array[x])
                    return x == 0 ? (int) base64Table[x].get(rand >> bitCount[x])
                            : (int) base64Table[x].get( ( rand - t_array[x-1] ) >> bitCount[x]) ;
        }
        return 0;
    }

    public void setIntProbList( int[] intList, float[] probList ) {
        Map<Integer, Float> map = normalizeMap( intList, probList );
        populateBase64Table( map );
    }

    private void clearBase64Table() {
        for ( int x = 0 ; x < 4 ; x++ ) {
            base64Table[x].clear();
        }
    }

    private void populateBase64Table( Map<Integer, Float> intProbMap ) {
        int startPow, decodedFreq, table_index;
        float rem;

        clearBase64Table();

        for ( Map.Entry<Integer, Float> numObj : intProbMap.entrySet() ) {
            rem = numObj.getValue();
            table_index = 3;
            for ( int x = 0 ; x < 4 ; x++ ) {
                decodedFreq = (int) (rem % 64);
                rem /= 64;
                for ( int y = 0 ; y < decodedFreq ; y ++ ) {
                    base64Table[table_index].add( numObj.getKey() );
                }
                table_index--;
            }
        }

        startPow = 3;
        for ( int x = 0 ; x < 4 ; x++ ) {
            t_array[x] = x == 0 ? (int) ( Math.pow( 64, startPow-- ) * base64Table[x].size() )
                    : ( (int) ( ( Math.pow( 64, startPow-- ) * base64Table[x].size() ) + t_array[x-1] ) );
        }

    }

    private Map<Integer, Float> normalizeMap( int[] intList, float[] probList ) {
        Map<Integer, Float> tmpMap = new HashMap<>();
        Float mappedFloat;
        int numerator;
        float normalizedProb, distSum = 0;

        //Remove duplicates, and calculate the sum of non-repeated keys
        for ( int x = 0 ; x < probList.length ; x++ ) {
            mappedFloat = tmpMap.get( intList[x] );
            if ( mappedFloat != null ) {
                //Repeated key: drop the previous probability, the last one wins
                distSum -= mappedFloat;
            }
            distSum += probList[x];
            tmpMap.put( intList[x], probList[x] );
        }

        //Normalise the map to key -> corresponding numerator by multiplying with 2^24
        sumOfNumerator = 0;
        for ( Map.Entry<Integer, Float> intProb : tmpMap.entrySet() ) {
            normalizedProb = intProb.getValue() / distSum;
            numerator = (int) ( normalizedProb * DENOM );
            intProb.setValue( (float) numerator );
            sumOfNumerator += numerator;
        }

        return tmpMap;
    }
}

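【Comments】:

【Solution 7】:

Try this: in this example I use an array of chars, but you can substitute it with your integer array.

The weight list contains the weight associated with each char. It represents the probability distribution of my charset.

In the weightsum list, I store for each char its own weight plus the sum of all preceding weights.

For example, in weightsum the third element, corresponding to 'C', is 65:
P('A') + P('B') + P('C') = P(X <= 'C')
10 + 30 + 25 = 65

So weightsum represents the cumulative distribution of my charset. weightsum contains the following values:

10, 40, 65, 125, 145, 215, 225, 305, 325, 355

It is easy to see that the 8th element, corresponding to H, has the largest gap to its predecessor (80, which is of course its weight), so it is the most likely to occur!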

        List<Character> charset =   Arrays.asList('A','B','C','D','E','F','G','H','I','J');
        List<Integer> weight = Arrays.asList(10,30,25,60,20,70,10,80,20,30);
        List<Integer>  weightsum = new ArrayList<>();

        int i=0,j=0,k=0;
        Random Rnd = new Random();

        weightsum.add(weight.get(0));

        for (i = 1; i < 10; i++)
            weightsum.add(weightsum.get(i-1) + weight.get(i));

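Then I use a loop to extract 30 random chars from charset, each one drawn according to the cumulative distribution.

In k I store a random number between 0 and the maximum value in weightsum. Then I look in weightsum for the first number greater than k; its position in weightsum is the position of the corresponding char in charset.

   for (j = 0; j < 30; j++)
   {
   // uniform draw in [0, total weight), reusing the Random created above
   k =   Rnd.nextInt(weightsum.get(weightsum.size()-1));

   // first position whose cumulative weight is strictly greater than k
   for (i = 0; k >= weightsum.get(i); i++) ;
   System.out.print(charset.get(i));
   }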

The code printed this sequence of chars:

HHFAIIDFBDDDHFICJHACCDFJBGBHHB

Let's do the math!

A = 2
B = 4
C = 3
D = 5
E = 0
F = 4
G = 1
H = 6
I = 3
J = 2

Total: 30
As we hoped, D and H occur most often (weights 60 and 80).
E, by contrast, did not come out at all (weight 20).

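【Comments】:

【Solution 8】:

If you are not against adding a new library to your code, this feature is already implemented in MockNeat; check the probabilities() method.

Some examples, straight from the wiki: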

String s = mockNeat.probabilites(String.class)
                .add(0.1, "A") // 10% chance
                .add(0.2, "B") // 20% chance
                .add(0.5, "C") // 50% chance
                .add(0.2, "D") // 20% chance
                .val();

Or, if you want to generate numbers within given ranges with given probabilities, you can do something like this:

Integer x = m.probabilites(Integer.class)
             .add(0.2, m.ints().range(0, 100))
             .add(0.5, m.ints().range(100, 200))
             .add(0.3, m.ints().range(200, 300))
             .val();

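Disclaimer: I am the author of the library, so I may be biased when recommending it.

【Comments】:

【Solution 9】:

Here is Python code. You asked for Java, but it is very similar.

import numpy as np

# weighted probability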

theta = np.array([0.1,0.25,0.6,0.05])
print(theta)

sample_axis = np.hstack((np.zeros(1), np.cumsum(theta))) 
print(sample_axis)

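sample_axis is [0. 0.1 0.35 0.95 1.]. This represents the cumulative distribution.

You can use a uniform distribution to draw an index in this unit range.

def binary_search(axis, q, s, e):
    # Returns the index s of the interval [axis[s], axis[s+1]) that contains q.
    if e-s <= 1:
        print(s)
        return s
    else: 
        m = (s+e) // 2
        if q < axis[m]:
            return binary_search(axis, q, s, m)
        else:
            return binary_search(axis, q, m, e)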



range_index = np.random.rand(1)
print(range_index)
q = range_index
s = 0
e = sample_axis.shape[0]-1
binary_search(sample_axis, q, 0, e)

【Comments】:

【Solution 10】:

Also answered here: find random country but probability of picking higher population country should be higher. Using a TreeMap:

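TreeMap<Integer, Integer> map = new TreeMap<>();
map.put(percent1, 1);
map.put(percent1 + percent2, 2);
// ... the last key must be 100

int random = (new Random()).nextInt(100); // uniform in 0..99
// higherEntry (strictly greater key) maps exactly percent1 of the 100 draws to 1;
// ceilingEntry would also map random == percent1 to 1
int result = map.higherEntry(random).getValue();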
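A self-contained sketch of the TreeMap idea, assuming integer weights such as percentages. It looks up the first cumulative bound strictly greater than the draw (higherEntry), so that with draws in 0..total-1 each value gets exactly as many slots as its weight. The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.TreeMap;

final class TreeMapSampler {
    // Key: cumulative weight up to and including this value. Value: the outcome.
    private final TreeMap<Integer, Integer> cumulative = new TreeMap<>();
    private int total = 0;

    /** Registers a value with a positive integer weight (for example a percentage). */
    void add(int value, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        total += weight;
        cumulative.put(total, value);
    }

    /** Maps a uniform draw r in 0..total-1 to a value in O(log n). */
    int valueFor(int r) {
        return cumulative.higherEntry(r).getValue();
    }

    int next(Random random) {
        return valueFor(random.nextInt(total));
    }

    public static void main(String[] args) {
        TreeMapSampler sampler = new TreeMapSampler();
        sampler.add(1, 20);
        sampler.add(2, 30);
        sampler.add(3, 50);
        System.out.println(sampler.next(new Random()));
    }
}
```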

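【Comments】:

【Solution 11】:

This may be useful to someone: a simple one I did in Python. You only have to change the way p and r are written. This one, for instance, projects random values between 0 and 0.1 onto the range 1e-20 to 1e-12.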

import random

def generate_distributed_random():
    p = [1e-20, 1e-12, 1e-10, 1e-08, 1e-04, 1e-02, 1]
    r = [0, 0.1, 0.3, 0.5, 0.7, 0.9, 1]
    val = random.random()
    for i in range(1, len(r)):
        if val <= r[i] and val >= r[i - 1]:
            slope = (p[i] - p[i - 1])/(r[i] - r[i - 1])
            return p[i - 1] + (val - r[i - 1])*slope


print(generate_distributed_random())
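Since the question asked for Java, here is a rough port of the Python function above. It is a sketch under the same assumptions (the breakpoints p and r are copied from that snippet); the class and method names are hypothetical, and the uniform draw is passed in as a parameter so the mapping can be checked without randomness.

```java
import java.util.Random;

final class PiecewiseLinearSampler {
    private static final double[] P = {1e-20, 1e-12, 1e-10, 1e-08, 1e-04, 1e-02, 1};
    private static final double[] R = {0, 0.1, 0.3, 0.5, 0.7, 0.9, 1};

    /** Maps u in [0, 1] to a value by linear interpolation between the breakpoints. */
    static double project(double u) {
        for (int i = 1; i < R.length; i++) {
            if (u <= R[i]) {
                double slope = (P[i] - P[i - 1]) / (R[i] - R[i - 1]);
                return P[i - 1] + (u - R[i - 1]) * slope;
            }
        }
        throw new IllegalArgumentException("u must be in [0, 1]");
    }

    public static void main(String[] args) {
        System.out.println(project(new Random().nextDouble()));
    }
}
```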

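【Comments】:

【Solution 12】:

There is a more efficient way than working with fractions, creating big arrays, or hard-coding the range to 100.

In your case the array becomes int[]{2,3,5}, with sum = 10. Simply take the sum of all the weights and run the random number generator on it: result = new Random().nextInt(10)

Iterate over the array elements starting from index 0, keep a running sum, and return the index at which the running sum becomes greater than result.

For example, if result is 6, it returns index 2, whose weight is 5.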

This solution scales regardless of how large the numbers or the range are.
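The steps above can be sketched as follows, assuming integer weights {2, 3, 5}. The class and method names are hypothetical, and the draw is passed in as a parameter so that the mapping itself is easy to verify.

```java
import java.util.Random;

final class RunningSumSampler {
    /** Returns the first index whose running weight sum exceeds r; r must be in 0..sum-1. */
    static int indexFor(int[] weights, int r) {
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            if (sum > r) {
                return i;
            }
        }
        throw new IllegalArgumentException("r must be less than the total weight");
    }

    public static void main(String[] args) {
        int[] weights = {2, 3, 5};
        int[] values = {1, 2, 3};
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        int result = new Random().nextInt(total); // uniform in 0..total-1
        System.out.println(values[indexFor(weights, result)]);
    }
}
```

Draws 0 and 1 give index 0, draws 2 to 4 give index 1, and draws 5 to 9 give index 2, which matches the probabilities 0.2, 0.3 and 0.5.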

【Comments】: