
Poisson Distribution In R

Poisson distribution is a probability distribution that expresses the number of events occurring in a fixed interval of time or space, given a constant average rate. This distribution is particularly useful when dealing with rare events or incidents that happen independently. R provides powerful tools for statistical analysis, making it an excellent choice for working with probability distributions like Poisson.

Poisson Distribution

The Poisson distribution describes the number of events that occur within a fixed interval of time or space. If λ is the mean number of occurrences per interval, then the probability of exactly x occurrences within a given interval is:

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots$$
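The probability mass function P(X = x) = λ^x e^(−λ) / x! can be checked numerically against R's built-in `dpois()`. The sketch below is only an illustration: `poisson_pmf` is a hypothetical helper written for this comparison, and λ = 2 is an arbitrary choice.

```r
# Hand-coded Poisson PMF: lambda^x * exp(-lambda) / x!
# (poisson_pmf is a hypothetical helper, not part of R)
poisson_pmf <- function(x, lambda) {
  lambda^x * exp(-lambda) / factorial(x)
}

x <- 0:10      # counts to evaluate
lambda <- 2    # arbitrary mean chosen for illustration

manual  <- poisson_pmf(x, lambda)
builtin <- dpois(x, lambda)

# Compare the two results up to floating-point tolerance
print(all.equal(manual, builtin))
```

For large x the hand-coded version can overflow in `factorial()`, which is one reason to prefer `dpois()` in practice.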



Use the Poisson distribution when

  1. Events occur randomly and independently: one event occurring does not change the probability of another.
  2. The average rate of events per interval of time or space, denoted λ (lambda), is known and assumed constant.
  3. Under these conditions, λ is the only parameter needed to determine the probability of any particular number of events.

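As an example, suppose inquiries arrive at an average rate of 20 per interval (the value of lambda in the code). `ppois(30, lambda = 20)` gives the probability of thirty or fewer inquiries, the lower tail. To find the probability of more than thirty inquiries, we subtract that value from 1, because the upper tail is the complement of the lower tail. This complement excludes exactly thirty; the probability of thirty or more would be `1 - ppois(29, lambda = 20)`.



# Probability of having thirty or fewer inquiries: P(X <= 30)
probability_30_or_less <- ppois(30, lambda = 20)
print(probability_30_or_less)
# Probability of having more than thirty inquiries: P(X > 30)
probability_more_than_30 <- 1 - probability_30_or_less
print(probability_more_than_30)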

                    

Output:

[1] 0.9865253

[1] 0.01347468
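As an alternative to subtracting from 1, `ppois()` accepts a `lower.tail = FALSE` argument that returns the upper tail directly. The following sketch, which reuses λ = 20 from the example, shows that both routes agree and how to include the boundary value when "thirty or more" is what is wanted. The variable names are illustrative.

```r
lambda <- 20

# P(X > 30), computed two ways
upper_via_complement <- 1 - ppois(30, lambda)
upper_direct <- ppois(30, lambda, lower.tail = FALSE)
print(all.equal(upper_via_complement, upper_direct))

# P(X >= 30): shift the cutoff down by one to include exactly 30
at_least_30 <- ppois(29, lambda, lower.tail = FALSE)
print(at_least_30)

# The two upper tails differ by exactly P(X = 30)
print(all.equal(at_least_30 - upper_direct, dpois(30, lambda)))
```

Using `lower.tail = FALSE` is also numerically safer when the tail probability is very small, because it avoids subtracting a number close to 1 from 1.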

Characteristics of Poisson distribution

  1. Events Occur Independently: Poisson distribution assumes that events occur independently of each other. This means the occurrence of one event does not affect the occurrence of another.
  2. Constant Average Rate: The events happen at a constant average rate over a fixed interval of time or space.
  3. Discrete Nature: The distribution is discrete, meaning it deals with whole numbers (0, 1, 2, …) as it represents the count of events.
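These characteristics can be observed in simulated data. A further well-known property of the Poisson distribution is that its mean and variance both equal λ, so the sample mean and sample variance of simulated counts should be close to each other. The sketch below assumes an arbitrary λ = 4 and a sample size of 10,000.

```r
set.seed(42)   # fixed seed so the simulation is reproducible

lambda <- 4    # arbitrary rate chosen for illustration
counts <- rpois(10000, lambda)

# Discrete nature: every simulated value is a non-negative whole number
print(all(counts >= 0 & counts == round(counts)))

# Mean and variance should both be close to lambda
sample_mean <- mean(counts)
sample_var  <- var(counts)
print(c(mean = sample_mean, variance = sample_var))
```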

Poisson Functions in R Programming

R provides several built-in functions for working with the Poisson distribution. The key functions are `dpois()`, `ppois()`, `qpois()`, and `rpois()`, which correspond to the probability mass function (PMF), cumulative distribution function (CDF), quantile function, and random number generation, respectively.

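1. dpois(x, lambda)

Returns the probability mass P(X = x), the probability of observing exactly x events when the mean is lambda.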

x <- 3
lambda <- 2
probability <- dpois(x, lambda)
print(probability)

                    

Output:

[1] 0.180447

2. ppois(q, lambda)

Returns the cumulative probability P(X ≤ q), the probability of observing q or fewer events.

q <- 2
lambda <- 3
cumulative_probability <- ppois(q, lambda)
print(cumulative_probability)

                    

Output:

[1] 0.4231901
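Because the CDF is the running sum of the PMF, the value returned by `ppois(q, lambda)` should match the sum of `dpois()` over 0, 1, …, q. A minimal check, using the same q = 2 and λ = 3 as the example:

```r
q <- 2
lambda <- 3

# Sum the PMF over 0..q and compare with the built-in CDF
cdf_from_pmf <- sum(dpois(0:q, lambda))
cdf_builtin  <- ppois(q, lambda)

print(c(cdf_from_pmf, cdf_builtin))
print(all.equal(cdf_from_pmf, cdf_builtin))
```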

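3. qpois(p, lambda)

Returns the smallest integer x such that P(X ≤ x) ≥ p, that is, the quantile for cumulative probability p.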

p <- 0.8
lambda <- 4
quantile_value <- qpois(p, lambda)
print(quantile_value)

                    

Output:

[1] 6
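Since the distribution is discrete, `qpois()` cannot usually hit p exactly; it returns the smallest count whose cumulative probability reaches p. The sketch below confirms this for the same p = 0.8 and λ = 4 by evaluating `ppois()` at the returned value and at one count below it.

```r
p <- 0.8
lambda <- 4

k <- qpois(p, lambda)

# The CDF at k reaches p ...
print(ppois(k, lambda))
print(ppois(k, lambda) >= p)

# ... while the CDF one step below k does not
print(ppois(k - 1, lambda))
print(ppois(k - 1, lambda) < p)
```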

4. rpois(n, lambda)

Generates n random counts drawn from a Poisson distribution with mean lambda.

n <- 10
lambda <- 5
random_samples <- rpois(n, lambda)
print(random_samples)

                    

Output (values vary from run to run because no seed is set):

[1] 8 3 9 6 4 4 2 4 2 8
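To make random draws repeatable, call `set.seed()` before `rpois()`. In the sketch below the seed value 2024 is arbitrary; any fixed integer has the same effect.

```r
# Same seed, same draws
set.seed(2024)
first_draw <- rpois(10, lambda = 5)

set.seed(2024)
second_draw <- rpois(10, lambda = 5)

print(first_draw)
print(identical(first_draw, second_draw))
```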

These functions belong to the stats package, which ships with R and is loaded by default, so no installation is needed to work with the Poisson distribution.

Types of Poisson Process

1. Homogeneous Poisson Process

Online Customer Purchases

Suppose we want to model the number of customer purchases on an e-commerce website within a specific time frame (e.g., one hour). We assume that purchases occur independently of each other and at a constant average rate over time.

Let’s say, on average, there are 5 customer purchases per hour (λ = 5). Using a homogeneous Poisson process, we can estimate the probability of observing a specific number of purchases within that time frame. For instance, the probability of exactly x purchases in one hour is P(X = x) = 5^x e^(−5) / x!.

This example assumes a constant average rate of customer purchases over the specified time interval.
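A few such probabilities for the purchase scenario can be computed with `dpois()` and `ppois()`. This is a sketch under the stated rate of 5 purchases per hour; the specific counts queried (0, 3, 10) and the two-hour window are illustrative choices. For a homogeneous process, the count in a window of t hours is Poisson with mean λ·t.

```r
rate_per_hour <- 5   # average purchases per hour, from the example

# Probabilities for a one-hour window
p_exactly_3 <- dpois(3, rate_per_hour)   # P(X = 3)
p_at_most_3 <- ppois(3, rate_per_hour)   # P(X <= 3)
p_none      <- dpois(0, rate_per_hour)   # P(X = 0)
print(c(exactly_3 = p_exactly_3, at_most_3 = p_at_most_3, none = p_none))

# For a two-hour window the mean scales to rate * hours
hours <- 2
p_exactly_10_in_2h <- dpois(10, rate_per_hour * hours)
print(p_exactly_10_in_2h)
```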

2. Non-Homogeneous Poisson Process

Hospital Patient Arrivals

Consider a hospital emergency department where the number of patient arrivals varies throughout the day. In this scenario, the average rate of patient arrivals becomes a function of time. Let’s say the average rate of patient arrivals, λ(t), is higher during peak hours and lower during off-peak hours. The number of arrivals in any interval is then Poisson distributed with mean equal to the integral of λ(t) over that interval.
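One simple way to simulate such a process is to treat the rate as constant within each hour, so that each hourly count is an independent Poisson draw with that hour's rate. The sketch below is only an illustration: the rates (8 arrivals per hour from 09:00 to 20:59, 2 per hour otherwise) are hypothetical numbers, not data from a real hospital.

```r
set.seed(1)   # fixed seed for reproducibility

hours <- 0:23

# Hypothetical piecewise-constant rate: 8/hour in peak hours, 2/hour otherwise
rate <- ifelse(hours >= 9 & hours <= 20, 8, 2)

# rpois() is vectorized over lambda: one draw per hour with that hour's rate
arrivals <- rpois(length(hours), lambda = rate)

simulated_day <- data.frame(hour = hours, rate = rate, arrivals = arrivals)
print(simulated_day)

# The daily total is Poisson with mean equal to the sum of the hourly rates
expected_total <- sum(rate)
print(c(simulated_total = sum(arrivals), expected_total = expected_total))
```

A rate that changes continuously within the hour would need a finer time grid or a technique such as thinning, which this sketch does not cover.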

Implementing Poisson Distribution in R

Suppose we want to model the number of user visits to a website per time interval. For simplicity, we assume the visits follow a Poisson distribution with a constant average rate of λ = 5 per interval. We simulate 100 intervals and compare the observed relative frequencies with the theoretical PMF.

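# Set the seed for reproducibility
set.seed(123)
 
# Generate a Poisson-distributed dataset
lambda <- 5  # Average rate of events
poisson_data <- rpois(100, lambda)
 
# Relative frequency of every count from 0 to the observed maximum
# (factor levels keep counts that never occurred, so bars line up with 0, 1, 2, ...)
counts <- 0:max(poisson_data)
empirical_pmf <- table(factor(poisson_data, levels = counts)) / length(poisson_data)
theoretical_pmf <- dpois(counts, lambda)
 
# Create a bar plot of the empirical PMF; barplot() returns the bar midpoints
bar_midpoints <- barplot(empirical_pmf,
        col = "skyblue",
        main = "Poisson Distribution PMF",
        xlab = "Number of Events",
        ylab = "Probability",
        ylim = c(0, 1.1 * max(empirical_pmf, theoretical_pmf)))
 
# Add a red line for the theoretical Poisson PMF, placed at the bar midpoints
points(bar_midpoints, theoretical_pmf, type = "b", col = "red")
 
# Add legend
legend("topright", legend = c("Empirical PMF", "Theoretical PMF"),
       fill = c("skyblue", "red"),
       cex = 0.8)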

                    

Output:

[Figure: Poisson Distribution In R, a bar plot of the empirical PMF with the theoretical PMF overlaid in red]

This visualization helps us to compare the observed distribution with the theoretical Poisson distribution, providing a clear visual representation of how well the dataset aligns with the expected probabilities. Adjust the parameters like the seed, sample size, or average rate to explore different scenarios.
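The same comparison can be made numerically by tabulating the empirical and theoretical probabilities side by side. The sketch below regenerates the simulated data with the same seed and rate; the data frame name `comparison` is an illustrative choice.

```r
set.seed(123)
lambda <- 5
poisson_data <- rpois(100, lambda)

# Every count from 0 up to the largest observed value
counts <- 0:max(poisson_data)

# Relative frequencies, keeping counts that never occurred as zeros
empirical <- as.numeric(table(factor(poisson_data, levels = counts))) /
  length(poisson_data)
theoretical <- dpois(counts, lambda)

comparison <- data.frame(count = counts,
                         empirical = round(empirical, 3),
                         theoretical = round(theoretical, 3))
print(comparison)

# The sample mean should be close to lambda
print(mean(poisson_data))
```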

Use Cases of Poisson Distribution

  1. Traffic Flow:- Modeling the number of cars passing through a toll booth in a given time period.
  2. Call Centers:- Predicting the number of incoming calls during specific hours.
  3. Insurance Claims:- Estimating the number of insurance claims within a certain timeframe.
  4. Web Server Requests:- Analyzing the number of requests a server receives in a fixed time interval.
  5. Epidemiology:- Studying the occurrence of diseases or rare events in a population.

Advantages of Poisson Distribution

  1. Simplicity:- Simple and easy to understand, making it accessible for modeling various scenarios.
  2. Versatility:- Applicable to a wide range of fields where rare events or occurrences are of interest.
  3. Independence:- Assumes events occur independently, simplifying the modeling process.
  4. Statistical Tools:- Well-supported in statistical software like R, facilitating analysis and interpretation.

Disadvantages of Poisson Distribution

  1. Assumption of Independence:- Strict assumption of independence might not hold in certain real-world scenarios.
  2. Constant Rate:- Assumes a constant average rate, which may not be realistic in all situations.
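  3. Limited Application to Continuous Data:- The distribution is defined only for non-negative integer counts, so it is not suitable for continuous measurements.
  4. Sensitivity to Outliers:- Because the mean and variance both equal λ, a few unusually large counts can inflate the estimate of λ and indicate more variability than the model can capture.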
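A quick, informal check on whether these assumptions are plausible for a dataset is the ratio of sample variance to sample mean, which should be close to 1 for Poisson data. The sketch below compares simulated Poisson counts with counts from a negative binomial distribution, used here only as a stand-in for overdispersed data; `dispersion_index` is a hypothetical helper and the parameter values are arbitrary.

```r
set.seed(7)
n <- 5000

# Counts that satisfy the Poisson assumptions
poisson_counts <- rpois(n, lambda = 5)

# Overdispersed counts with the same mean (variance = mu + mu^2 / size = 30)
overdispersed_counts <- rnbinom(n, size = 1, mu = 5)

# Variance-to-mean ratio; hypothetical helper for this illustration
dispersion_index <- function(x) var(x) / mean(x)

poisson_ratio <- dispersion_index(poisson_counts)
overdispersed_ratio <- dispersion_index(overdispersed_counts)
print(c(poisson = poisson_ratio, overdispersed = overdispersed_ratio))
```

A ratio well above 1 suggests that a plain Poisson model may understate the variability in the data.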

Practical Applications of Poisson Distribution

  1. Network Security:- Analyzing the number of security breaches or attacks on a network within a specific timeframe.
  2. Inventory Management:- Estimating the number of items sold in a store during a particular hour.
  3. Quality Control:- Assessing the number of defects in a manufacturing process.
  4. Biology and Genetics:- Studying the distribution of mutations in a DNA sequence.
  5. Finance:- Predicting the number of defaults in a loan portfolio.

Conclusion

Poisson distribution is a valuable tool in probability theory and statistics, finding applications in diverse fields due to its simplicity and versatility. While it has its limitations, understanding the assumptions, advantages, and disadvantages of Poisson distribution is crucial for its effective application in real-world scenarios. As technology and statistical methodologies evolve, the use of Poisson distribution remains relevant in modeling and predicting rare events.

