How to Perform a Log Rank Test in R

Last Updated : 16 Apr, 2024

Statistical analysis is essential in many domains, particularly in biomedical research, where understanding survival outcomes is critical. The log-rank test is a popular statistical technique for comparing the survival distributions of two or more groups. This article demonstrates how to run a log-rank test in R, a powerful statistical programming language.

What is the Log Rank Test?

The Log Rank Test is a non-parametric test that compares two or more groups’ survival distributions. It evaluates whether the groups’ survival times differ significantly from one another.

Concepts Related to the Topic

Survival Analysis: It consists of the analysis of time-to-event data, where “event” can refer to any interesting outcome, such as a disease recurrence or death.
Censored Data: Some observations in a survival analysis may not experience the event by the end of the study. These observations, referred to as censored, are essential for accurate analysis.
Hazard Function: Represents the instantaneous rate of failure at a specific moment in time, given survival up to that moment. The log-rank test compares the hazard functions of the groups.

The following hypotheses are used in this test:

H0: There is no difference in survival between the two groups.
HA: There is a difference in survival between the two groups.
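
To make these hypotheses concrete, the two-group statistic can be computed by hand: at each distinct event time, the observed number of events in one group is compared with the number expected if both groups shared the same hazard. The following is a rough base-R sketch on made-up toy data (the values are illustrative, not from a real study). It is meant to show the idea, not how the survival package implements the test internally.

```r
# Toy data (illustrative values only)
time   <- c(5, 10, 15, 20, 25)   # follow-up time
status <- c(1, 1, 0, 1, 0)       # 1 = event, 0 = censored
group  <- c(1, 1, 2, 2, 2)       # two groups

event_times   <- sort(unique(time[status == 1]))
obs_minus_exp <- 0   # running sum of (observed - expected) events in group 1
variance      <- 0   # running sum of hypergeometric variances

for (tj in event_times) {
  n  <- sum(time >= tj)                              # at risk, both groups
  n1 <- sum(time >= tj & group == 1)                 # at risk, group 1
  d  <- sum(time == tj & status == 1)                # events, both groups
  d1 <- sum(time == tj & status == 1 & group == 1)   # events, group 1

  obs_minus_exp <- obs_minus_exp + (d1 - d * n1 / n)
  if (n > 1) {
    variance <- variance + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
}

# Under H0 the statistic is approximately chi-squared with 1 degree of freedom
chisq_manual   <- obs_minus_exp^2 / variance
p_value_manual <- pchisq(chisq_manual, df = 1, lower.tail = FALSE)
```

With so few observations the chi-squared approximation is coarse, so the resulting p-value should be read as an illustration of the mechanics only.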

Step 1: Load Necessary Packages

To perform a log-rank test in R, we will use the survival package, which provides functions for survival analysis, and the survminer package for visualization. Both packages can be installed by running the following commands in the R console.

# Install and load necessary packages
install.packages("survival")
install.packages("survminer")

library(survival)
library(survminer)
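
Because install.packages() downloads the package again every time it runs, a script that is executed repeatedly may prefer to install only when a package is missing. A minimal sketch of that pattern using base R's requireNamespace(); the helper name ensure_package is made up for this illustration and is not part of either package:

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ensure_package("survival")
ensure_package("survminer")

library(survival)
library(survminer)
```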

Step 2: Prepare Data

Let’s first have our survival data ready. This usually consists of two vectors: one that indicates if the event of interest occurred or if the observation was censored, and the other that represents the time-to-event outcomes (e.g., survival time).

To perform a log-rank test in R, we need to create a survival object using the Surv() function. The Surv() function takes two arguments: the first argument is the survival time, and the second argument is the censoring status.

We can create a survival object from example time and status vectors using the following commands:

# Example data
time <- c(5, 10, 15, 20, 25)   # Survival time
status <- c(1, 1, 0, 1, 0)      # Event status (1: event occurred, 0: censored)
group <- c(1, 1, 2, 2, 2)       # Group assignments

# Create Surv object
surv_object <- Surv(time, status)
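
To check that the censoring was encoded as intended, it can help to inspect the object before running any test. The sketch below repeats the example vectors so that it runs on its own. Printing a right-censored Surv object marks censored times with a trailing "+".

```r
library(survival)

# Same example vectors as above, repeated so the snippet is self-contained
time   <- c(5, 10, 15, 20, 25)
status <- c(1, 1, 0, 1, 0)

surv_object <- Surv(time, status)

print(surv_object)      # censored observations are shown with a "+"
is.Surv(surv_object)    # checks that the object has class "Surv"

# The object behaves like a two-column matrix (time, status)
n_events   <- sum(surv_object[, "status"] == 1)
n_censored <- sum(surv_object[, "status"] == 0)
```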

Step 3: Perform the Log-Rank Test

Now, we can perform a log-rank test using the survdiff() function. Its main argument is a formula with the survival object on the left-hand side and the grouping variable on the right-hand side. An optional data argument names the data frame that holds the variables.

# Perform log-rank test
logrank_test <- survdiff(surv_object ~ group)

This function tests the null hypothesis that there is no difference in survival between the groups.

Step 4: Interpret the Results

Let’s interpret the results of the log-rank test. We can extract the test statistic and p-value from the output.

# View log-rank test results
logrank_test

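The output lists, for each group, the number of observations and the observed and expected numbers of events, followed by the chi-squared statistic, its degrees of freedom, and the corresponding p-value. A small p-value (commonly below 0.05) suggests a difference in survival between the groups.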
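
The printed summary is convenient to read, but the numbers can also be pulled out of the returned object for reporting. A minimal sketch, assuming the example vectors from Step 2 (repeated here so the snippet is self-contained). The p-value is recomputed from the chi-squared statistic with pchisq(), taking the degrees of freedom as the number of groups minus one.

```r
library(survival)

time   <- c(5, 10, 15, 20, 25)
status <- c(1, 1, 0, 1, 0)
group  <- c(1, 1, 2, 2, 2)

logrank_test <- survdiff(Surv(time, status) ~ group)

chisq_stat <- logrank_test$chisq               # test statistic
df         <- length(logrank_test$n) - 1       # number of groups minus one
p_value    <- pchisq(chisq_stat, df = df, lower.tail = FALSE)

logrank_test$obs   # observed events per group
logrank_test$exp   # expected events per group under H0
```

With only five observations the chi-squared approximation is rough, so these values illustrate the workflow rather than support a real conclusion.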

Step 5: Visualize Survival Curves (Optional)

Visualizing survival curves can offer further insights into the differences between the groups. We can use the ggsurvplot() function from the survminer package to generate Kaplan-Meier survival curves.

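# Visualize survival curves (ggsurvplot() needs a survfit object, not survdiff output)
fit <- survfit(Surv(time, status) ~ group, data = your_data_frame)
ggsurvplot(fit, data = your_data_frame, risk.table = TRUE)

Replace your_data_frame with the name of the data frame containing your survival data. The code above assumes it has columns named time, status, and group.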

Let’s consider an example dataset where we compare the survival times of two treatment groups. We’ll use simulated survival data to illustrate the process.

# Install (only needed once) and load necessary packages
install.packages("survival")
install.packages("survminer")

library(survival)
library(survminer)

# Example survival data
time <- c(10, 15, 20, 30, 40, 50, 60)  # Survival time
status <- c(1, 1, 0, 1, 1, 0, 1)        # Event status (1: event occurred, 0: censored)
group <- c(rep("A", 4), rep("B", 3))    # Treatment groups

# Create a data frame with the survival data
surv_data <- data.frame(time = time, status = status, group = group)

# Fit survival curves
fit <- survfit(Surv(time, status) ~ group, data = surv_data)
fit

Output:

Call: survfit(formula = Surv(time, status) ~ group, data = surv_data)

        n events median 0.95LCL 0.95UCL
group=A 4      3   22.5      10      NA
group=B 3      2   60.0      40      NA

This output summarizes the fitted survival curves for the two groups (A and B). It includes the number of observations, the number of events, the estimated median survival time, and the 95% confidence limits for the median in each group. For group A, there were 4 observations, 3 events, and an estimated median survival time of 22.5 units. For group B, there were 3 observations, 2 events, and an estimated median survival time of 60 units. The lower confidence limits for the median are 10 and 40 units, but the upper limits are not available (NA) for either group.
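
The fitted curves describe each group, but they do not by themselves test the difference between them. A short sketch of applying survdiff() to the same simulated data frame (rebuilt here so the snippet runs on its own):

```r
library(survival)

# Same simulated data as in the example above
surv_data <- data.frame(
  time   = c(10, 15, 20, 30, 40, 50, 60),
  status = c(1, 1, 0, 1, 1, 0, 1),
  group  = c(rep("A", 4), rep("B", 3))
)

# Log-rank test comparing groups A and B
logrank_test <- survdiff(Surv(time, status) ~ group, data = surv_data)
logrank_test

# p-value from the chi-squared statistic (df = number of groups - 1)
p_value <- pchisq(logrank_test$chisq,
                  df = length(logrank_test$n) - 1,
                  lower.tail = FALSE)
p_value
```

With seven observations the chi-squared approximation is coarse, so the p-value here is best read as an illustration of the workflow rather than as evidence about the groups.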

Visualize the result

# Plot survival curves
ggsurvplot(fit, data = surv_data, risk.table = TRUE, 
           pval = TRUE, conf.int = TRUE, legend.title = "Group")

Output:

[Figure: Log Rank Test in R]

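The survival curves drawn by ggsurvplot() give a graphical view of the survival distribution in each group. With pval = TRUE, the plot is also annotated with the p-value of the log-rank test, and risk.table = TRUE adds the number of subjects at risk over time beneath the curves.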

Conclusion

The Log Rank Test is a vital tool in survival analysis, allowing researchers to compare the survival experiences of different groups. By following these steps, you can effectively perform this test in R and interpret the results to draw meaningful conclusions. This article provides a clear framework for conducting a Log Rank Test in R, from data preparation to result interpretation. Remember, the key to a successful analysis is understanding your data and ensuring it meets the assumptions of the test.
