theaicompendium.com

Generating Random Numbers in R


When working on machine learning projects, simulations, or modeling, generating random numbers is often a crucial step in your code. R provides several functions dedicated to random number generation. In this tutorial, you will explore these functions and learn how to apply them in broader programming contexts.

Let’s get started!

Overview

This tutorial is divided into three essential sections:

  1. Random Number Generators
  2. Generating Correlated Multivariate Gaussian Random Numbers
  3. Generating Random Numbers from a Uniform Distribution

Random Number Generators

Random numbers are drawn from probability distributions, the most familiar being the Gaussian distribution, also known as the normal distribution.

The standard normal distribution has mean 0 and standard deviation 1. Its probability density function is defined over the entire real number line, from negative to positive infinity. The cumulative distribution function (CDF) gives the probability that a standard normal random variable takes on a value less than or equal to a specific point.

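In R, four functions are linked to the normal distribution, and all of them default to the standard normal: dnorm() for the density, pnorm() for the CDF, qnorm() for the quantile function (the inverse of the CDF), and rnorm() for random number generation.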
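As a quick illustration, the sketch below evaluates the density, CDF, and quantile functions of the standard normal distribution at a few points. The variable names are chosen here for illustration only.

```r
# Density, CDF, and quantile function of the standard normal distribution
d0   <- dnorm(0)       # density at 0, equal to 1 / sqrt(2 * pi)
p0   <- pnorm(0)       # P(Z <= 0), which is 0.5 by symmetry
p196 <- pnorm(1.96)    # P(Z <= 1.96), roughly 0.975
q975 <- qnorm(0.975)   # 97.5th percentile, roughly 1.96

# qnorm() inverts pnorm(), so this round trip returns (approximately) 1.5
round_trip <- qnorm(pnorm(1.5))

print(c(d0, p0, p196, q975, round_trip))
```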

The most commonly used function is rnorm(k), which returns a vector of k random values drawn from the standard normal distribution. To visualize the distribution of generated samples, you can plot a histogram:

hist(rnorm(10000), breaks=30, freq=FALSE)

This will yield a bell-shaped histogram centered at zero, with about 68% of the samples falling between -1 and +1. The freq=FALSE argument makes the y-axis display density rather than frequency, so the histogram is on the same scale as the density function.
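The 68% figure can be checked numerically. The sketch below fixes the random seed with set.seed() so the draws are reproducible, then compares the empirical share of samples within one standard deviation against the value implied by pnorm(). It also shows that rnorm() accepts mean and sd arguments for non-standard normal distributions. The seed and the parameter values are arbitrary choices for illustration.

```r
set.seed(42)                        # fix the seed so the draws are reproducible
x <- rnorm(10000)                   # standard normal samples

frac_within_1  <- mean(abs(x) < 1)        # empirical share of samples in (-1, 1)
exact_within_1 <- pnorm(1) - pnorm(-1)    # theoretical probability, about 0.6827
print(c(empirical = frac_within_1, theoretical = exact_within_1))

# rnorm() also takes a mean and a standard deviation
y <- rnorm(10000, mean = 5, sd = 2)
print(c(mean = mean(y), sd = sd(y)))      # should be close to 5 and 2
```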

Generating Correlated Multivariate Gaussian Random Numbers

A frequent application of standard normal random numbers is generating pairs of correlated Gaussian random numbers. In this scenario, you want to create bivariate Gaussian random numbers that exhibit a non-zero correlation.

The process for obtaining such correlated random numbers is as follows:

  1. Generate a set of independent standard normal random numbers.
  2. Establish a covariance matrix that defines the relationships between the random variables.
  3. Perform the Cholesky decomposition of the covariance matrix.
  4. Multiply the matrix of independent random numbers by the Cholesky decomposition matrix.
  5. Adjust the mean as necessary.
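The reason step 4 produces the desired covariance can be sketched in a few lines. The notation is an assumption made here: z is a row vector of independent standard normal values, R is the upper-triangular Cholesky factor (the form returned by R's chol() function), Σ is the covariance matrix, and μ is the vector of means.

```latex
\begin{aligned}
\Sigma &= R^{\top} R, \qquad z \sim \mathcal{N}(0, I) \\
x &= z R + \mu \\
\operatorname{Cov}(x) &= R^{\top} \operatorname{Cov}(z)\, R = R^{\top} I R = \Sigma
\end{aligned}
```

Adding μ shifts the mean without affecting the covariance, which is why the mean adjustment can be left to the last step.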

Here is the R code to accomplish this:

# Bivariate Gaussian parameters
n_fea <- 2          # Number of random features
n_obs <- 1000       # Number of observations
means <- c(0, 1)    # Mean values for the random variables
vars <- c(1, 1)     # Variances for the random variables
corr <- matrix(     # Correlation matrix
   c(1.0, 0.6,
     0.6, 1.0),
   byrow = TRUE, nrow = 2
)

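sd.diag <- diag(sqrt(vars))              # Diagonal matrix of standard deviations
cov_mat <- sd.diag %*% corr %*% sd.diag  # Covariance matrix (named cov_mat to avoid masking stats::cov)
cholesky <- chol(cov_mat)                # Upper-triangular factor: t(cholesky) %*% cholesky equals cov_mat

obs <- matrix(rnorm(n_fea * n_obs), nrow=n_obs)  # Generate i.i.d. Gaussian random values
samples <- (obs %*% cholesky) + rep(means, each=nrow(obs))  # Add each mean to its column
print(head(samples))                     # Show the first few rows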

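The code above generates correlated random numbers based on a specified covariance matrix. To validate the results, you can compute the empirical correlation, means, and standard deviations, which should be close to the specified values (a correlation of 0.6, means of 0 and 1, and standard deviations of 1):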

# Verify the results
print(cor(samples))  # Check the correlation matrix
print(colMeans(samples))  # Check the mean
print(apply(samples, 2, sd))  # Check the standard deviations
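If you need this procedure more than once, it can be wrapped in a function. The sketch below is one possible way to do so. The name rmvnorm_chol is hypothetical (it is not part of base R), and the sample size and seed are arbitrary choices.

```r
# Hypothetical helper, not part of base R: draws n rows from a multivariate
# normal distribution with mean vector mu and covariance matrix sigma.
rmvnorm_chol <- function(n, mu, sigma) {
  stopifnot(nrow(sigma) == ncol(sigma), length(mu) == nrow(sigma))
  R <- chol(sigma)                               # upper-triangular, t(R) %*% R equals sigma
  z <- matrix(rnorm(n * length(mu)), nrow = n)   # i.i.d. standard normal values
  sweep(z %*% R, 2, mu, "+")                     # add mu[j] to every entry of column j
}

set.seed(1)
sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), byrow = TRUE, nrow = 2)
s <- rmvnorm_chol(50000, mu = c(0, 1), sigma = sigma)

emp_cor   <- cor(s)[1, 2]    # should be close to 0.6
emp_means <- colMeans(s)     # should be close to c(0, 1)
print(emp_cor)
print(emp_means)
```

For production use, an established implementation such as mvrnorm() from the MASS package may be a better choice than a hand-written helper.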

Generating Random Numbers from a Uniform Distribution

While the Gaussian distribution is prevalent, you may also need other distributions. For instance, the exponential distribution is useful for simulating the time between events, such as the intervals between incoming phone calls.

The exponential distribution with rate parameter lambda has a density defined for non-negative values, and its cumulative distribution function can be inverted in closed form. Inverse transform sampling exploits this: applying the inverse CDF to uniform random numbers on (0, 1) yields exponentially distributed random numbers.
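The inverse can be derived by setting the exponential CDF equal to a uniform value u and solving for x. Here λ denotes the rate parameter:

```latex
\begin{aligned}
F(x) &= 1 - e^{-\lambda x}, \quad x \ge 0 \\
u &= 1 - e^{-\lambda x} \\
e^{-\lambda x} &= 1 - u \\
x &= F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}, \quad 0 < u < 1
\end{aligned}
```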

Here’s how to implement it in R:

lambda <- 2.5   # Arrival rate parameter
F.inv <- function(x) {
    return(-log(1 - x) / lambda)
}
n <- 1000  # Number of samples

x <- runif(n)  # Generate uniform random numbers between 0 and 1
x <- F.inv(x)  # Transform to exponential distribution
print(x)

In this code, runif(n) generates n samples from a uniform distribution on (0, 1), which are then transformed into values that follow the exponential distribution. You can plot a histogram of the generated samples to confirm the distribution, which should be highest near zero and decay toward larger values:

hist(x, breaks=30, freq=FALSE)
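Beyond the visual check, you can compare the transformed samples against R's built-in exponential functions, rexp() and qexp(). The sketch below is one way to do this. The seed and the larger sample size are assumptions made here to keep the comparison stable.

```r
set.seed(123)
lambda <- 2.5
n <- 100000

u <- runif(n)                      # uniform samples on (0, 1)
x_inv <- -log(1 - u) / lambda      # inverse transform sampling
x_ref <- rexp(n, rate = lambda)    # built-in exponential generator, for reference

# The mean of an exponential distribution with rate lambda is 1 / lambda
print(c(inverse = mean(x_inv), builtin = mean(x_ref), theory = 1 / lambda))

# Compare empirical quartiles with the theoretical ones from qexp()
probs <- c(0.25, 0.5, 0.75)
print(rbind(inverse = quantile(x_inv, probs),
            builtin = quantile(x_ref, probs),
            theory  = qexp(probs, rate = lambda)))
```

In practice, rexp() is the simpler choice for exponential samples. Inverse transform sampling becomes valuable for distributions that have an invertible CDF but no built-in generator.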

Conclusion

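In this tutorial, you learned how to generate random numbers in R: drawing standard normal samples with rnorm(), building correlated Gaussian samples with the Cholesky decomposition, and transforming uniform samples into exponential ones by inverse transform sampling.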
