Poisson Distribution In R

Last Updated : 02 Feb, 2024

The Poisson distribution is a probability distribution that gives the probability of a given number of events occurring in a fixed interval of time or space, assuming a constant average rate and events that happen independently of one another. It is particularly useful for modeling rare events. R provides built-in functions for working with probability distributions such as the Poisson.

Poisson Distribution

The Poisson distribution describes the number of events that occur within a fixed interval of time or space. If λ is the mean number of occurrences per interval, then the probability of having k occurrences within a given interval is:

$P(X = k) = \frac{e^{-\lambda} \cdot \lambda^k}{k!}$

P(X=k) represents the probability of observing k events.
e is the base of the natural logarithm.
λ is the average rate of event occurrences in a fixed interval.
k is the actual number of events observed.
k! denotes the factorial of k, which is the product of all positive integers up to k.
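As a quick check of the formula, the sketch below evaluates it directly in R and compares the result with the built-in `dpois()`. The helper name `poisson_pmf` and the values k = 3 and λ = 2 are illustrative choices, not part of the original text.

```r
# Hypothetical helper: evaluates the Poisson PMF straight from the formula
poisson_pmf <- function(k, lambda) {
  exp(-lambda) * lambda^k / factorial(k)
}

k <- 3
lambda <- 2
manual <- poisson_pmf(k, lambda)
builtin <- dpois(k, lambda)

print(manual)
print(builtin)

# TRUE when both values agree up to floating-point tolerance
print(isTRUE(all.equal(manual, builtin)))
```

In practice `dpois()` is preferable, since evaluating `factorial(k)` directly can overflow for large k.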

Use the Poisson distribution when

Events unfold randomly and autonomously, where the likelihood of one event occurring does not influence the likelihood of another.
The average rate of events within a specific timeframe or space, denoted as λ (lambda), is known and presumed to be consistent.
When events adhere to a Poisson distribution, λ serves as the singular parameter necessary for determining the probability of a particular number of events taking place.

Example: suppose customer inquiries arrive at an average rate of 20 per minute (λ = 20). The call `ppois(30, lambda = 20)` gives the probability of thirty or fewer inquiries in a minute, P(X≤30). To find the probability of more than thirty inquiries, we subtract that value from 1, because the probability in the upper tail, P(X>30), is complementary to the probability in the lower tail. Note that 1 − P(X≤30) excludes the case of exactly thirty inquiries; the probability of thirty or more, P(X≥30), would be `1 - ppois(29, lambda = 20)`.

R

# Probability of having thirty or fewer inquiries
probability_30_or_less <- ppois(30, lambda = 20)
print(probability_30_or_less)
# Probability of having more than thirty inquiries
probability_more_than_30 <- 1 - probability_30_or_less
print(probability_more_than_30)

Output:

[1] 0.9865253

[1] 0.01347468

The probability of having thirty or fewer inquiries (P(X≤30)) is approximately 98.65%.
The probability of having more than thirty inquiries (P(X>30)) is approximately 1.35%.
This means that, in a minute, there is a high likelihood (98.65%) that the number of customer inquiries will be thirty or fewer, and a low likelihood (1.35%) that it will exceed thirty, based on the given average rate.
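Because the Poisson distribution is discrete, P(X>30) and P(X≥30) differ by P(X=30). The sketch below reuses `lambda = 20` from the example and shows one way to compute each tail; `lower.tail = FALSE` is an argument of `ppois()` that returns P(X>q) directly.

```r
lambda <- 20

# P(X > 30): strictly more than thirty inquiries
p_more_than_30 <- ppois(30, lambda, lower.tail = FALSE)

# P(X >= 30): thirty or more inquiries, which equals P(X > 29)
p_30_or_more <- ppois(29, lambda, lower.tail = FALSE)

print(p_more_than_30)
print(p_30_or_more)

# The two tails differ by exactly P(X = 30)
print(all.equal(p_30_or_more - p_more_than_30, dpois(30, lambda)))
```

Using `lower.tail = FALSE` instead of subtracting from 1 also tends to be more accurate when the tail probability is very small.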

Characteristics of Poisson distribution

Events Occur Independently: Poisson distribution assumes that events occur independently of each other. This means the occurrence of one event does not affect the occurrence of another.
Constant Average Rate: The events happen at a constant average rate over a fixed interval of time or space.
Discrete Nature: The distribution is discrete, meaning it deals with whole numbers (0, 1, 2, …) as it represents the count of events.

Poisson Functions in R Programming

R provides several built-in functions for working with the Poisson distribution. The key functions are `dpois()`, `ppois()`, `qpois()`, and `rpois()`, which correspond to the probability mass function (PMF), cumulative distribution function (CDF), quantile function, and random number generation, respectively.

1. dpois(x, lambda)

This function calculates the probability mass function (PMF) of the Poisson distribution.
It gives the probability of observing exactly `x` events in a Poisson distribution with mean (`lambda`).

R

x <- 3
lambda <- 2
probability <- dpois(x, lambda)
print(probability)

Output:

[1] 0.180447
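`dpois()` is vectorized over `x`, so a whole range of counts can be evaluated in one call. The sketch below tabulates the PMF for `lambda = 2` over the counts 0 to 5; the range is an arbitrary choice for illustration.

```r
lambda <- 2
counts <- 0:5

# One probability per count in 0..5
pmf <- dpois(counts, lambda)
print(data.frame(events = counts, probability = round(pmf, 4)))

# The probabilities for 0..5 add up to P(X <= 5), which is slightly below 1
# because counts above 5 are still possible
print(sum(pmf))
```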

2. ppois(q, lambda)

This function calculates the cumulative distribution function (CDF) of the Poisson distribution.
It gives the probability of observing at most `q` events, P(X ≤ q).

R

q <- 2
lambda <- 3
cumulative_probability <- ppois(q, lambda)
print(cumulative_probability)

Output:

[1] 0.4231901
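The CDF is the running sum of the PMF, which also makes it convenient for interval probabilities. The sketch below, keeping `lambda = 3`, checks that relationship and computes the probability of between 2 and 4 events inclusive; the interval is an illustrative choice.

```r
lambda <- 3

# CDF as a sum of PMF values: P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
cdf_from_pmf <- sum(dpois(0:2, lambda))
print(all.equal(cdf_from_pmf, ppois(2, lambda)))

# P(2 <= X <= 4) = P(X <= 4) - P(X <= 1)
p_2_to_4 <- ppois(4, lambda) - ppois(1, lambda)
print(p_2_to_4)
```

Note the lower bound: to include 2 in the interval, subtract P(X ≤ 1), not P(X ≤ 2).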

3. qpois(p, lambda)

This function calculates the quantile function of the Poisson distribution.
It returns the smallest integer `q` such that `ppois(q, lambda)` is greater than or equal to `p`.

R

p <- 0.8
lambda <- 4
quantile_value <- qpois(p, lambda)
print(quantile_value)

Output:

[1] 6
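The definition of the quantile can be verified with `ppois()`. This short sketch, using the same `p` and `lambda`, confirms that the CDF reaches `p` at the returned value and is still below `p` one step earlier.

```r
p <- 0.8
lambda <- 4
q <- qpois(p, lambda)

# The CDF reaches p at q ...
print(ppois(q, lambda) >= p)

# ... and is still below p at q - 1
print(ppois(q - 1, lambda) < p)
```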

4. rpois(n, lambda)

This function generates random samples from a Poisson distribution.
It produces `n` random values representing the count of events, where the mean is specified by `lambda`.

R

n <- 10
lambda <- 5
random_samples <- rpois(n, lambda)
print(random_samples)

Output (the values differ from run to run unless a seed is set with `set.seed()`):

[1] 8 3 9 6 4 4 2 4 2 8
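To make random draws repeatable, the seed can be fixed before sampling. The sketch below uses an arbitrary seed value of 42 and also checks that the mean of a large sample lands near `lambda`.

```r
lambda <- 5

# Same seed, same draws
set.seed(42)
first_run <- rpois(10, lambda)
set.seed(42)
second_run <- rpois(10, lambda)
print(identical(first_run, second_run))

# With a large sample, the sample mean should be close to lambda
large_sample <- rpois(10000, lambda)
print(mean(large_sample))
```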

These functions belong to the `stats` package, which ships with R and is loaded by default, so no additional installation is needed to work with the Poisson distribution.

Types of Poisson Process

1. Homogeneous Poisson Process

Assumes events occur at a constant average rate over time or space.
Events are independent and discrete.

Online Customer Purchases

Suppose we want to model the number of customer purchases on an e-commerce website within a specific time frame (e.g., one hour). We assume that purchases occur independently of each other and at a constant average rate over time.

Let’s say, on average, there are 5 customer purchases per hour (λ = 5). Using a homogeneous Poisson process, we can estimate the probability of observing a specific number of purchases within that time frame. For instance:

Probability of 3 purchases in an hour: $P(X = 3) = \frac{e^{-5} \cdot 5^3}{3!}$
Probability of at least 2 purchases in an hour: $P(X \geq 2) = 1 - P(X < 2) = 1 - P(X = 0) - P(X = 1)$

This example assumes a constant average rate of customer purchases over the specified time interval.
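The two probabilities in this example can be computed with the functions introduced earlier. This is a minimal sketch assuming the rate of 5 purchases per hour from the text; the variable names are illustrative.

```r
lambda <- 5  # average purchases per hour

# P(X = 3): exactly three purchases in an hour
p_exactly_3 <- dpois(3, lambda)

# P(X >= 2) = 1 - P(X <= 1)
p_at_least_2 <- 1 - ppois(1, lambda)

print(p_exactly_3)
print(p_at_least_2)
```

These come out to roughly 0.14 and 0.96, so exactly three purchases is fairly unlikely while at least two is almost certain.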

2. Non-Homogeneous Poisson Process

Allows for a varying rate of events over time or space.
The average rate (lambda) becomes a function of time or space.

Hospital Patient Arrivals

Consider a hospital emergency department where the number of patient arrivals varies throughout the day. In this scenario, the average rate of patient arrivals becomes a function of time, λ(t), which is higher during peak hours and lower during off-peak hours.

During peak hours (9 AM – 5 PM), the average rate is λ(t) = 10, and during off-peak hours (5 PM – 9 AM), the average rate is λ(t) = 3. The calculation below implies that these rates are per hour.
We can then model the number of patient arrivals using a non-homogeneous Poisson process. The number of arrivals in a given time period follows a Poisson distribution whose mean is the rate λ(t) accumulated (integrated) over that period, so the probability of observing a certain number of arrivals varies with the time of day.
For example, the interval between 2 PM and 3 PM lies entirely within peak hours, so its mean is 10 and the probability of exactly 5 patient arrivals is $P(X = 5) = \frac{e^{-10} \cdot 10^5}{5!}$.
This example shows how the rate of events can change over time, which makes the non-homogeneous process suitable for situations where the constant rate assumption does not hold.
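For a non-homogeneous process, the count in an interval is Poisson with mean equal to the rate accumulated over that interval. The sketch below assumes the two rates in the example are per hour, and uses a hypothetical helper `expected_arrivals()` that only handles intervals within a single day, with hours given on a 24-hour clock.

```r
# Hypothetical helper: expected arrivals between two clock hours (0 to 24),
# assuming 10 per hour from 9 to 17 and 3 per hour otherwise
expected_arrivals <- function(start, end) {
  peak_hours <- max(0, min(end, 17) - max(start, 9))
  off_peak_hours <- (end - start) - peak_hours
  10 * peak_hours + 3 * off_peak_hours
}

# 2 PM to 3 PM lies entirely in peak hours, so the mean is 10
mean_2_to_3 <- expected_arrivals(14, 15)
print(dpois(5, mean_2_to_3))

# 4 PM to 6 PM spans the rate change: 1 peak hour plus 1 off-peak hour
mean_4_to_6 <- expected_arrivals(16, 18)
print(mean_4_to_6)
print(dpois(5, mean_4_to_6))
```

The second case shows why the rate function matters: the same Poisson formula applies, but with the mean 10 + 3 = 13 instead of a single constant rate.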

Implementing Poisson Distribution in R

Suppose we want to model the number of user visits to a website per time interval. For simplicity, the code in this section assumes that visits follow a Poisson distribution with a constant (homogeneous) average rate of λ = 5 per interval. It simulates 100 intervals and compares the observed frequencies with the theoretical PMF.

R

# Set the seed for reproducibility
set.seed(123)

# Generate a Poisson-distributed dataset
lambda <- 5  # Average rate of events
poisson_data <- rpois(100, lambda)

# Tabulate every count from 0 to the maximum so that no bar is skipped
counts <- 0:max(poisson_data)
empirical_pmf <- table(factor(poisson_data, levels = counts)) / length(poisson_data)
theoretical_pmf <- dpois(counts, lambda)

# Create a bar plot of the empirical PMF; barplot() returns the bar midpoints,
# and ylim is sized so that neither PMF is clipped
bar_mid <- barplot(empirical_pmf,
        col = "skyblue",
        main = "Poisson Distribution PMF",
        xlab = "Number of Events",
        ylab = "Probability",
        ylim = c(0, 1.1 * max(empirical_pmf, theoretical_pmf)))

# Add a red line representing the theoretical Poisson PMF at the bar midpoints
points(bar_mid, theoretical_pmf, type = "b", col = "red")

# Add legend
legend("topright", legend = c("Empirical PMF", "Theoretical PMF"),
       fill = c("skyblue", "red"),
       cex = 0.8)

Output:

[Figure: bar plot of the empirical PMF of the simulated data with the theoretical Poisson PMF overlaid]

We generate a dataset of 100 observations from a Poisson distribution with a specified average rate.
The bar plot displays the empirical probability mass function (PMF) of the generated dataset in blue.
The red line represents the theoretical Poisson PMF for comparison.

This visualization helps us to compare the observed distribution with the theoretical Poisson distribution, providing a clear visual representation of how well the dataset aligns with the expected probabilities. Adjust the parameters like the seed, sample size, or average rate to explore different scenarios.
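The simulation above uses a single constant rate. If the rate varies by hour, `rpois()` accepts a vector of rates and returns one draw per rate. The sketch below simulates one day of website visits with made-up hourly rates; `hourly_rates` and its values of 8 and 2 are hypothetical and chosen only for illustration.

```r
set.seed(123)

# Hypothetical average visits per hour: 8 from 09:00 to 17:00, 2 otherwise
hours <- 0:23
hourly_rates <- ifelse(hours >= 9 & hours < 17, 8, 2)

# rpois() is vectorized over lambda: one draw per hourly rate
visits <- rpois(length(hours), lambda = hourly_rates)

barplot(visits,
        names.arg = hours,
        col = "skyblue",
        main = "Simulated Website Visits per Hour",
        xlab = "Hour of Day",
        ylab = "Number of Visits")

# The expected total for the day is the sum of the hourly rates
print(sum(hourly_rates))
print(sum(visits))
```

Comparing the simulated total with the expected total gives a rough sense of how much day-to-day variation the model implies.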

Use Cases of Poisson Distribution

Traffic Flow: Modeling the number of cars passing through a toll booth in a given time period.
Call Centers: Predicting the number of incoming calls during specific hours.
Insurance Claims: Estimating the number of insurance claims within a certain timeframe.
Web Server Requests: Analyzing the number of requests a server receives in a fixed time interval.
Epidemiology: Studying the occurrence of diseases or rare events in a population.

Advantages of Poisson Distribution

Simplicity: Simple and easy to understand, making it accessible for modeling various scenarios.
Versatility: Applicable to a wide range of fields where rare events or occurrences are of interest.
Independence: Assumes events occur independently, simplifying the modeling process.
Statistical Tools: Well-supported in statistical software like R, facilitating analysis and interpretation.

Disadvantages of Poisson Distribution

Assumption of Independence: The strict assumption of independence might not hold in certain real-world scenarios.
Constant Rate: Assumes a constant average rate, which may not be realistic in all situations.
Limited Application to Continuous Data: The distribution is defined only for non-negative integer counts, so it is not suitable for continuous measurements.
Sensitivity to Outliers: Because λ is estimated from the average count, a few unusually large counts can distort the estimate and the predictions based on it.

Practical Applications of Poisson Distribution

Network Security: Analyzing the number of security breaches or attacks on a network within a specific timeframe.
Inventory Management: Estimating the number of items sold in a store during a particular hour.
Quality Control: Assessing the number of defects in a manufacturing process.
Biology and Genetics: Studying the distribution of mutations in a DNA sequence.
Finance: Predicting the number of defaults in a loan portfolio.

Conclusion

The Poisson distribution is a valuable tool in probability theory and statistics, with applications in diverse fields due to its simplicity and versatility. It has limitations, so understanding its assumptions, advantages, and disadvantages is crucial for applying it effectively in real-world scenarios. It remains relevant for modeling and predicting counts of rare events.
