Class HypergeometricDistribution

  • All Implemented Interfaces:
    DiscreteDistribution

    public final class HypergeometricDistribution
    extends Object
    Implementation of the hypergeometric distribution.

    The probability mass function of \( X \) is:

    \[ f(k; N, K, n) = \frac{\binom{K}{k} \binom{N - K}{n-k}}{\binom{N}{n}} \]

    for \( N \in \{0, 1, 2, \dots\} \) the population size, \( K \in \{0, 1, \dots, N\} \) the number of success states, \( n \in \{0, 1, \dots, N\} \) the number of samples, \( k \in \{\max(0, n+K-N), \dots, \min(n, K)\} \) the number of successes, and

    \[ \binom{a}{b} = \frac{a!}{b! \, (a-b)!} \]

    is the binomial coefficient.
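
    As a worked instance of the formula above, the following standalone sketch evaluates the PMF with exact binomial coefficients. It is illustrative only and is not the implementation used by this class: the class HypergeometricPmfSketch and its methods binomial and pmf are hypothetical names introduced here, and the final conversion to double restricts it to small parameters.

```java
import java.math.BigInteger;

// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricPmfSketch {

    /** Exact binomial coefficient C(a, b); zero when b is outside [0, a]. */
    static BigInteger binomial(int a, int b) {
        if (b < 0 || b > a) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= b; i++) {
            // After step i the running value equals C(a - b + i, i),
            // so the integer division is always exact.
            result = result.multiply(BigInteger.valueOf(a - b + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** f(k; N, K, n) = C(K, k) * C(N - K, n - k) / C(N, n). */
    static double pmf(int k, int N, int K, int n) {
        BigInteger numerator = binomial(K, k).multiply(binomial(N - K, n - k));
        // Assumption: the coefficients are small enough to convert to double.
        return numerator.doubleValue() / binomial(N, n).doubleValue();
    }

    public static void main(String[] args) {
        int N = 10;
        int K = 4;
        int n = 3;
        for (int k = 0; k <= n; k++) {
            System.out.println("f(" + k + ") = " + pmf(k, N, K, n));
        }
    }
}
```

    With N = 10, K = 4 and n = 3 the numerators are 20, 60, 36 and 4 over a common denominator of 120, so the four values sum to one.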

    See Also:
    Hypergeometric distribution (Wikipedia), Hypergeometric distribution (MathWorld)
    • Method Detail

      • of

        public static HypergeometricDistribution of(int populationSize,
                                                    int numberOfSuccesses,
                                                    int sampleSize)
        Creates a hypergeometric distribution.
        Parameters:
        populationSize - Population size.
        numberOfSuccesses - Number of successes in the population.
        sampleSize - Sample size.
        Returns:
        the distribution
        Throws:
        IllegalArgumentException - if populationSize <= 0, numberOfSuccesses < 0, numberOfSuccesses > populationSize, or sampleSize > populationSize.
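
        The documented argument conditions can be sketched as a standalone check. This is a minimal illustration, not the library's validation code: the class HypergeometricArgumentCheckSketch and its methods checkParameters and isValid are hypothetical, and only the conditions listed under Throws are tested.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricArgumentCheckSketch {

    /** Throws IllegalArgumentException for the conditions listed under "Throws". */
    static void checkParameters(int populationSize, int numberOfSuccesses, int sampleSize) {
        if (populationSize <= 0) {
            throw new IllegalArgumentException("populationSize <= 0: " + populationSize);
        }
        if (numberOfSuccesses < 0) {
            throw new IllegalArgumentException("numberOfSuccesses < 0: " + numberOfSuccesses);
        }
        if (numberOfSuccesses > populationSize) {
            throw new IllegalArgumentException(
                "numberOfSuccesses > populationSize: " + numberOfSuccesses);
        }
        if (sampleSize > populationSize) {
            throw new IllegalArgumentException(
                "sampleSize > populationSize: " + sampleSize);
        }
        // A negative sampleSize is not listed in the documentation above,
        // so this sketch deliberately leaves it unchecked.
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(int populationSize, int numberOfSuccesses, int sampleSize) {
        try {
            checkParameters(populationSize, numberOfSuccesses, sampleSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(10, 4, 3));
        System.out.println(isValid(10, 11, 3));
    }
}
```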
      • getPopulationSize

        public int getPopulationSize()
        Gets the population size parameter of this distribution.
        Returns:
        the population size.
      • getNumberOfSuccesses

        public int getNumberOfSuccesses()
        Gets the number of successes parameter of this distribution.
        Returns:
        the number of successes.
      • getSampleSize

        public int getSampleSize()
        Gets the sample size parameter of this distribution.
        Returns:
        the sample size.
      • probability

        public double probability(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the value of the probability mass function at x.
      • probability

        public double probability(int x0,
                                  int x1)
        For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1). The default implementation uses the identity P(x0 < X <= x1) = P(X <= x1) - P(X <= x0).

        Special cases:

        • returns 0.0 if x0 == x1;
        • returns probability(x1) if x0 + 1 == x1.
        Specified by:
        probability in interface DiscreteDistribution
        Parameters:
        x0 - Lower bound (exclusive).
        x1 - Upper bound (inclusive).
        Returns:
        the probability that a random variable with this distribution takes a value between x0 and x1, excluding the lower and including the upper endpoint.
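
        The identity and the two special cases can be sketched in standalone form as follows. The class HypergeometricRangeSketch and its helpers are hypothetical and use a naive double-precision PMF suited only to small parameters. Rejecting x0 > x1 is an assumption of the sketch, since the description above does not say how that case is handled.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricRangeSketch {

    /** Binomial coefficient in double precision; zero when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        double r = 1;
        for (int i = 1; i <= b; i++) {
            r = r * (a - b + i) / i;
        }
        return r;
    }

    static double pmf(int k, int N, int K, int n) {
        return binomial(K, k) * binomial(N - K, n - k) / binomial(N, n);
    }

    /** P(X <= x), summed over the support max(0, n + K - N) .. min(n, K). */
    static double cdf(int x, int N, int K, int n) {
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        double sum = 0;
        for (int k = lower; k <= Math.min(x, upper); k++) {
            sum += pmf(k, N, K, n);
        }
        return sum;
    }

    /** P(x0 < X <= x1) with the documented special cases. */
    static double probability(int x0, int x1, int N, int K, int n) {
        if (x0 > x1) {
            // Assumption of this sketch: an inverted range is rejected.
            throw new IllegalArgumentException("x0 > x1: " + x0 + " > " + x1);
        }
        if (x0 == x1) {
            return 0.0;
        }
        if (x0 + 1 == x1) {
            return pmf(x1, N, K, n);
        }
        return cdf(x1, N, K, n) - cdf(x0, N, K, n);
    }

    public static void main(String[] args) {
        // P(0 < X <= 2) for N = 10, K = 4, n = 3 is (60 + 36) / 120.
        System.out.println(probability(0, 2, 10, 4, 3));
    }
}
```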
      • logProbability

        public double logProbability(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the logarithm of the value of the probability mass function at x.
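
        One reason to work in log space is that the binomial coefficients in the PMF overflow double precision for large parameters, while their logarithms stay moderate. The sketch below illustrates this with a simple sum of logarithms. It is not the method used by this class, and the names HypergeometricLogPmfSketch, logFactorial, logBinomial and logPmf are hypothetical.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricLogPmfSketch {

    /** log(m!) as a plain sum of logarithms; O(m), which is adequate for a sketch. */
    static double logFactorial(int m) {
        double sum = 0;
        for (int i = 2; i <= m; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    /** log C(a, b) for 0 <= b <= a. */
    static double logBinomial(int a, int b) {
        return logFactorial(a) - logFactorial(b) - logFactorial(a - b);
    }

    /** log f(k; N, K, n); negative infinity (log of zero) outside the support. */
    static double logPmf(int k, int N, int K, int n) {
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        if (k < lower || k > upper) {
            return Double.NEGATIVE_INFINITY;
        }
        return logBinomial(K, k) + logBinomial(N - K, n - k) - logBinomial(N, n);
    }

    public static void main(String[] args) {
        // C(100000, 1000) is far beyond the range of double,
        // but the log of the PMF remains finite.
        System.out.println(logPmf(500, 100_000, 50_000, 1_000));
    }
}
```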
      • cumulativeProbability

        public double cumulativeProbability(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
        Parameters:
        x - Point at which the CDF is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value less than or equal to x.
      • survivalProbability

        public double survivalProbability(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.

        By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.

        Parameters:
        x - Point at which the survival function is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value greater than x.
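
        To illustrate why a dedicated implementation may be more accurate than 1 - cumulativeProbability(x), the standalone sketch below compares the complement with a direct sum over the upper tail. When P(X > x) is far smaller than the rounding error of a value near 1, the subtraction loses it while the tail sum does not. The class HypergeometricSurvivalSketch and its methods are hypothetical and rely on a naive double-precision PMF; how this class actually computes the survival function is not stated here.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricSurvivalSketch {

    /** Binomial coefficient in double precision; zero when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        double r = 1;
        for (int i = 1; i <= b; i++) {
            r = r * (a - b + i) / i;
        }
        return r;
    }

    static double pmf(int k, int N, int K, int n) {
        return binomial(K, k) * binomial(N - K, n - k) / binomial(N, n);
    }

    /** P(X > x) computed as 1 - P(X <= x). */
    static double survivalByComplement(int x, int N, int K, int n) {
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        double cdf = 0;
        for (int k = lower; k <= Math.min(x, upper); k++) {
            cdf += pmf(k, N, K, n);
        }
        return 1 - cdf;
    }

    /** P(X > x) computed by summing the PMF over the upper tail. */
    static double survivalByTailSum(int x, int N, int K, int n) {
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        double sum = 0;
        // Add the smallest terms (largest k) first.
        for (int k = upper; k >= lower && k > x; k--) {
            sum += pmf(k, N, K, n);
        }
        return sum;
    }

    public static void main(String[] args) {
        // P(X > 29) for N = 60, K = 30, n = 30 equals 1 / C(60, 30).
        System.out.println(survivalByComplement(29, 60, 30, 30));
        System.out.println(survivalByTailSum(29, 60, 30, 30));
    }
}
```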
      • inverseSurvivalProbability

        public int inverseSurvivalProbability(double p)
        Computes the inverse survival probability function of this distribution. For a random variable X distributed according to this distribution, the returned value is:

        \[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \gt x) \le p\} & \text{for } 0 \le p \lt 1 \\ \inf \{ x \in \mathbb Z : P(X \gt x) \lt 1 \} & \text{for } p = 1 \end{cases} \]

        If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case, calling survivalProbability(x) with the returned (1-p)-quantile may not reproduce the original p.

        By default, this is defined as inverseCumulativeProbability(1 - p), but the specific implementation may be more accurate.

        The default implementation returns:

        Specified by:
        inverseSurvivalProbability in interface DiscreteDistribution
        Parameters:
        p - Survival probability.
        Returns:
        the smallest (1-p)-quantile of this distribution (largest 0-quantile for p = 1).
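
        The piecewise definition above can be sketched as a direct search over the support. The code is a minimal illustration, not the search used by this class: the class HypergeometricInverseSurvivalSketch and its methods are hypothetical, the PMF is a naive double-precision version, and rejecting p outside [0, 1] is an assumption of the sketch.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricInverseSurvivalSketch {

    /** Binomial coefficient in double precision; zero when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        double r = 1;
        for (int i = 1; i <= b; i++) {
            r = r * (a - b + i) / i;
        }
        return r;
    }

    static double pmf(int k, int N, int K, int n) {
        return binomial(K, k) * binomial(N - K, n - k) / binomial(N, n);
    }

    /** Smallest x with P(X > x) <= p, or the lower support bound when p = 1. */
    static int inverseSurvival(double p, int N, int K, int n) {
        if (!(p >= 0 && p <= 1)) {
            // Assumption of this sketch: p outside [0, 1] (or NaN) is rejected.
            throw new IllegalArgumentException("p not in [0, 1]: " + p);
        }
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        if (p == 1) {
            // inf { x : P(X > x) < 1 } is the lower bound of the support.
            return lower;
        }
        int x = upper;
        double tail = 0; // P(X > upper) = 0
        while (x > lower) {
            double next = tail + pmf(x, N, K, n); // P(X > x - 1)
            if (next > p) {
                break; // x - 1 no longer satisfies P(X > x - 1) <= p
            }
            tail = next;
            x--;
        }
        return x;
    }

    public static void main(String[] args) {
        // For N = 10, K = 4, n = 3: P(X > 0) = 100/120, P(X > 1) = 40/120, P(X > 2) = 4/120.
        System.out.println(inverseSurvival(0.5, 10, 4, 3));
    }
}
```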
      • getMean

        public double getMean()
        Gets the mean of this distribution.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the mean is:

        \[ n \frac{K}{N} \]

        Returns:
        the mean.
      • getVariance

        public double getVariance()
        Gets the variance of this distribution.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the variance is:

        \[ n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1} \]

        Returns:
        the variance.
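
        As a check on the closed forms for the mean and the variance, the standalone sketch below compares them with moments summed directly from the PMF. The class HypergeometricMomentsSketch and its methods are hypothetical, the PMF is a naive double-precision version for small parameters, and the variance formula as written assumes N > 1.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricMomentsSketch {

    /** Mean n * K / N. */
    static double mean(int N, int K, int n) {
        return n * ((double) K / N);
    }

    /** Variance n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1)); assumes N > 1. */
    static double variance(int N, int K, int n) {
        return n * ((double) K / N) * ((double) (N - K) / N) * ((double) (N - n) / (N - 1));
    }

    /** Binomial coefficient in double precision; zero when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        double r = 1;
        for (int i = 1; i <= b; i++) {
            r = r * (a - b + i) / i;
        }
        return r;
    }

    static double pmf(int k, int N, int K, int n) {
        return binomial(K, k) * binomial(N - K, n - k) / binomial(N, n);
    }

    /** Returns {E[X], Var[X]} computed by summing over the support. */
    static double[] momentsFromPmf(int N, int K, int n) {
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        double m1 = 0;
        double m2 = 0;
        for (int k = lower; k <= upper; k++) {
            double p = pmf(k, N, K, n);
            m1 += k * p;
            m2 += (double) k * k * p;
        }
        return new double[] {m1, m2 - m1 * m1};
    }

    public static void main(String[] args) {
        double[] m = momentsFromPmf(10, 4, 3);
        System.out.println(mean(10, 4, 3) + " vs " + m[0]);
        System.out.println(variance(10, 4, 3) + " vs " + m[1]);
    }
}
```

        For N = 10, K = 4 and n = 3 the closed forms give a mean of 3 · 4/10 = 1.2 and a variance of 3 · (4/10) · (6/10) · (7/9) = 0.56.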
      • getSupportLowerBound

        public int getSupportLowerBound()
        Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the lower bound of the support is \( \max \{ 0, n + K - N \} \).

        Returns:
        lower bound of the support
      • getSupportUpperBound

        public int getSupportUpperBound()
        Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.

        For number of successes \( K \) and sample size \( n \), the upper bound of the support is \( \min \{ n, K \} \).

        Returns:
        upper bound of the support
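
        The two support bounds can be sketched as plain integer arithmetic. The class HypergeometricSupportSketch and its methods are hypothetical. Writing the lower bound as n - (N - K) rather than n + K - N is a choice made in this sketch to avoid int overflow for very large parameters; it is not a statement about how this class computes it.

```java
// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricSupportSketch {

    /** Lower bound max(0, n + K - N), written so the intermediate cannot overflow. */
    static int lowerBound(int N, int K, int n) {
        return Math.max(0, n - (N - K));
    }

    /** Upper bound min(n, K). */
    static int upperBound(int N, int K, int n) {
        return Math.min(n, K);
    }

    public static void main(String[] args) {
        // With N = 10, K = 8, n = 5 only two failures exist,
        // so a sample of five must contain at least three successes.
        System.out.println(lowerBound(10, 8, 5) + " .. " + upperBound(10, 8, 5));
    }
}
```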
      • createSampler

        public DiscreteDistribution.Sampler createSampler(org.apache.commons.rng.UniformRandomProvider rng)
        Creates a sampler.
        Specified by:
        createSampler in interface DiscreteDistribution
        Parameters:
        rng - Generator of uniformly distributed numbers.
        Returns:
        a sampler that produces random numbers according to this distribution.
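
        To show what sampling from this distribution means, the standalone sketch below simulates drawing without replacement from an urn. It is not the algorithm behind createSampler, which is not described here: java.util.Random stands in for UniformRandomProvider, and the class HypergeometricUrnSamplerSketch with its methods sample and sampleMean is hypothetical.

```java
import java.util.Random;

// Hypothetical standalone sketch; not part of the documented API.
final class HypergeometricUrnSamplerSketch {

    /** Draws n items without replacement from N items, K of which are successes. */
    static int sample(Random rng, int N, int K, int n) {
        int remainingSuccesses = K;
        int remainingPopulation = N;
        int successes = 0;
        for (int i = 0; i < n; i++) {
            // The next item is a success with probability
            // remainingSuccesses / remainingPopulation.
            if (rng.nextInt(remainingPopulation) < remainingSuccesses) {
                successes++;
                remainingSuccesses--;
            }
            remainingPopulation--;
        }
        return successes;
    }

    /** Average of count samples from a generator seeded with the given value. */
    static double sampleMean(long seed, int count, int N, int K, int n) {
        Random rng = new Random(seed);
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += sample(rng, N, K, n);
        }
        return (double) total / count;
    }

    public static void main(String[] args) {
        // The average should be close to the mean n * K / N = 1.2.
        System.out.println(sampleMean(42L, 100_000, 10, 4, 3));
    }
}
```

        Each call to sample costs O(n) random draws, which is acceptable for illustration but would be slow for large sample sizes.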