
How to Convert R Code into C++ Using Rcpp

When dealing with performance issues in R code, there may be situations where R alone is not fast enough. Rcpp, a package for the R programming language, addresses this by integrating C++ code into R, which can provide significant performance improvements, especially for computationally intensive tasks. This guide walks you through the process of converting R code into C++ using Rcpp.

What is Rcpp?

Rcpp is an R package that provides a simple and efficient way to write high-performance R functions in C++. It gives C++ code direct access to R data structures and functions, making it easier to bridge the gap between R and C++.



Why use Rcpp?

Improved performance

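C++ is a compiled, lower-level language that can execute computations more efficiently than interpreted R. As a result, C++ code called via Rcpp often runs significantly faster than the equivalent R code, particularly for explicit loops that cannot easily be vectorized.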

Seamless integration and Developer Friendly API

Rcpp provides a simple syntax that lets developers write C++ code within R scripts without switching between separate toolchains. It provides C++ classes that correspond to R's data structures, such as NumericVector, CharacterVector, List, and DataFrame, so developers can work with R objects in a natural and efficient manner from C++.



Installation




install.packages("Rcpp")

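To use it you will also need a C++ compiler: Rtools on Windows, the Xcode command line tools on macOS, or a compiler such as g++ on Linux.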

Implementation

Rcpp can be used in two ways: with inline C++ via cppFunction(), or with standalone C++ files via sourceCpp().

Using Inline C++

Example 1

Summing the elements of two vectors using Rcpp

The classes for the most common types of R vectors, based on their data type, are NumericVector, IntegerVector, CharacterVector, and LogicalVector.




# R code to sum all the elements of two vectors
sumTwoR <- function(x, y) {
    total <- 0
    # seq_along() also handles zero-length input, unlike 1:length(x)
    for(i in seq_along(x)) {
        total <- total + x[i] + y[i]
    }
    return(total)
}
 
x <- c(1, 2, 3)
y <- c(4, 5, 6)
 
# Calling sumTwoR
sumTwoR(x, y)

Output:

[1] 21

First, we create a function called sumTwoR, which uses a for loop to sum all the elements of two vectors x and y.

Then we call this function as sumTwoR(x, y), passing x = (1, 2, 3) and y = (4, 5, 6) as arguments.

Converted to Rcpp

Next, we create a C++ equivalent of the above function, sumTwoC(x, y), inside the R script. To do this, we use the cppFunction() function from the Rcpp package.




library(Rcpp)
 
# Code in C++ to calculate the sum of vector values
cppFunction('
  double sumTwoC(NumericVector x, NumericVector y) {
    int n = x.size();
    double total = 0;
    for(int i = 0; i < n; ++i) {
      total += x[i] + y[i];
    }
    return total;
  }')
 
 
# Similarly calling sumTwoC
sumTwoC(c(1, 2, 3), c(4, 5, 6))

Output

[1] 21

First, we load the Rcpp package using library(Rcpp). cppFunction() then compiles the C++ source string and creates an R function named sumTwoC. In the C++ code, both arguments are declared as NumericVector, every variable has an explicit type, and indices start at 0 rather than 1.

We call sumTwoC in the same way we called sumTwoR. Only the function definition has changed; everything else in the R script stays the same, which is the seamless integration mentioned earlier.
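One difference is worth keeping in mind when converting loops like this. In R, reading past the end of y yields NA, whereas indexing a NumericVector with [] does not check bounds by default, so vectors of unequal length can cause an out-of-range read. The following is a minimal standalone sketch of the same loop with a length check added. It assumes std::vector<double> as a stand-in for NumericVector so that it compiles without R, and the name sum_two_checked is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sum every element of x and y, mirroring sumTwoC but with a length check.
// std::vector<double> stands in for Rcpp::NumericVector in this sketch.
double sum_two_checked(const std::vector<double>& x,
                       const std::vector<double>& y) {
  if (x.size() != y.size()) {
    // Inside an Rcpp function, Rcpp::stop() would raise an R error instead.
    throw std::invalid_argument("x and y must have the same length");
  }
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    total += x[i] + y[i];
  }
  return total;
}
```

Called with {1, 2, 3} and {4, 5, 6}, this returns 21, matching the R and Rcpp versions above. In an actual Rcpp function, the same guard would compare x.size() with y.size() before entering the loop.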

Example 2: Matrix input

The classes for the most common types of R matrices, based on their data type, are NumericMatrix, IntegerMatrix, CharacterMatrix, and LogicalMatrix.




multipleMatricesR <- function(A, B) {
  return(A %*% B)
}
 
# Define a 3x3 matrix
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(1:9, nrow = 3, ncol = 3)
 
# Multiply the matrices using the R function
multipleMatricesR(A, B)

Output:

     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150

Converted to Rcpp




# Rcpp code to multiply two matrices
library(Rcpp)
 
cppFunction(
  'NumericMatrix multiplyMatricesC(NumericMatrix A, NumericMatrix B) {
    // A is n x k and B is k x m, so the result C is n x m
    int n = A.nrow(), k = A.ncol(), m = B.ncol();
    NumericMatrix C(n, m);
    for(int i = 0; i < n; ++i) {
      for(int j = 0; j < m; ++j) {
        for(int l = 0; l < k; ++l) {
          C(i, j) += A(i, l) * B(l, j);
        }
      }
    }
    return C;
  }')
 
# Define two 3x3 matrices
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(1:9, nrow = 3, ncol = 3)
 
# Multiply the matrices using the C++ function
multiplyMatricesC(A, B)

Output

     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150
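R stores matrices in column-major order, and the A(i, j) syntax of NumericMatrix hides that layout. To make the indexing explicit, here is a minimal standalone sketch of the same multiplication over flat column-major arrays. It assumes std::vector<double> in place of NumericMatrix so that it compiles without R; the name multiply_col_major and the loop order are illustrative choices, not part of Rcpp.

```cpp
#include <cstddef>
#include <vector>

// Multiply an n x k matrix A by a k x m matrix B. Both are stored
// column-major: element (i, j) lives at index i + j * nrow, which is
// how R lays out a matrix in memory.
std::vector<double> multiply_col_major(const std::vector<double>& A,
                                       const std::vector<double>& B,
                                       std::size_t n, std::size_t k,
                                       std::size_t m) {
  std::vector<double> C(n * m, 0.0);
  for (std::size_t j = 0; j < m; ++j) {
    for (std::size_t l = 0; l < k; ++l) {
      const double b = B[l + j * k];  // B(l, j)
      for (std::size_t i = 0; i < n; ++i) {
        C[i + j * n] += A[i + l * n] * b;  // C(i, j) += A(i, l) * B(l, j)
      }
    }
  }
  return C;
}
```

Passing the values 1 to 9 for both A and B with n = k = m = 3 yields the columns (30, 36, 42), (66, 81, 96), and (102, 126, 150), which agrees with the output above. Putting the row index in the innermost loop walks each column contiguously in memory, a common choice for column-major data.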

The other way to use Rcpp is to keep the C++ code in standalone files and load them into an R script with sourceCpp().

Using sourceCpp()

The sourceCpp("path/to/file.cpp") function allows us to separate the C++ code from the R script by compiling and loading it from a separate file. Instead of including the C++ code directly in the R file, we pass the path of the C++ file to sourceCpp().

This approach promotes modularity and organization, making it easier to manage and maintain the C++ code separately from the R script. By sourcing the C++ code using sourceCpp(), the C++ functions defined in the file become accessible within the R environment, enabling us to utilize their functionality seamlessly within our R code.

Example

We have seen vectors and matrices, so let's now work with data frames. We will write a function that counts how many students received each grade.




# Create a data frame with id, name and grades
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("John", "Smith", "Jane", "Doe", "Peter", "Parker"),
  grades = c("A", "B", "A", "C", "D", "C")
)
 
 
# Function to count how many times each grade occurs
count_grades <- function(df) {
  grades <- df$grades
  unique_grades <- unique(grades)
  # Preallocate the result instead of growing a vector inside the loop
  counts <- integer(length(unique_grades))
  for (i in seq_along(unique_grades)) {
    counts[i] <- sum(grades == unique_grades[i])
  }
  return(data.frame(grade = unique_grades, count = counts))
}
 
count_grades(df)

Output:

  grade count
1     A     2
2     B     1
3     C     2
4     D     1

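The C++ version of this function lives in a separate file, grades.cpp. In that file, the function definition must be preceded by the // [[Rcpp::export]] comment so that sourceCpp() makes it callable from R. Let's go through the code line by line:

  1. #include <Rcpp.h>: This line includes the Rcpp header, which allows seamless integration of C++ code with R.
  2. #include <map>: This line includes the standard C++ header for std::map, which will be used to store the count of grades.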
  3. using namespace Rcpp;: This line brings the Rcpp namespace into the current scope, so we can use Rcpp functions and classes without explicitly specifying the namespace.
  4. DataFrame count_grades_in_cpp(DataFrame df): This line defines the function count_grades_in_cpp, which takes a DataFrame df as input and returns a DataFrame as output.
  5. std::map<std::string, int> grade_count;: This line declares a map called grade_count, where the key is a std::string representing the grade name, and the value is an int representing the count of that grade.
  6. int n = df.nrows();: This line gets the number of rows in the input DataFrame df and stores it in the variable n.
  7. CharacterVector grades = df["grades"];: This line extracts the column named “grades” from the input DataFrame df and stores it in a CharacterVector called grades.
  8. for (int i = 0; i < n; i++) { ... }: This is a loop that iterates over each element of the grades vector, one per row of the data frame.
  9. grade_count[as<std::string>(grades[i])]++;: This line converts the i-th element of the grades vector to a std::string using as<std::string>() and then uses it as a key to access the corresponding value in the grade_count map. The value is then incremented by one.
  10. CharacterVector grade_name;: This line declares an empty CharacterVector called grade_name, which will store the grade names for the output DataFrame.
  11. IntegerVector grade_count_vec;: This line declares an empty IntegerVector called grade_count_vec, which will store the grade counts for the output DataFrame.
  12. for (auto it = grade_count.begin(); it != grade_count.end(); it++) { ... }: This loop iterates over the grade_count map using an iterator it.
  13. grade_name.push_back(it->first);: This line adds the key (grade name) of the current element pointed to by the iterator it to the grade_name vector.
  14. grade_count_vec.push_back(it->second);: This line adds the value (grade count) of the current element pointed to by the iterator it to the grade_count_vec vector.
  15. return DataFrame::create(_["grade"] = grade_name, _["count"] = grade_count_vec);: This line creates a new DataFrame using DataFrame::create() with two named columns: “grade” (using the grade_name vector) and “count” (using the grade_count_vec vector). The created DataFrame is then returned as the output of the function.
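The counting step at the heart of this walkthrough (steps 5, 8, 9, and 12) can be tried on its own, without R. The following is a minimal standalone sketch that assumes std::vector<std::string> in place of CharacterVector and returns (grade, count) pairs in place of a DataFrame; the name count_grades_plain is hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count occurrences of each grade. std::map keeps its keys sorted, so the
// result comes back in alphabetical order of grade.
std::vector<std::pair<std::string, int>> count_grades_plain(
    const std::vector<std::string>& grades) {
  std::map<std::string, int> grade_count;
  for (const std::string& g : grades) {
    ++grade_count[g];  // operator[] inserts a zero count for a new key
  }
  // Copy the sorted (grade, count) entries out of the map.
  return std::vector<std::pair<std::string, int>>(grade_count.begin(),
                                                  grade_count.end());
}
```

For the grades A, B, A, C, D, C this gives A = 2, B = 1, C = 2, D = 1. Note one behavioral difference from the R function: unique() keeps grades in order of first appearance, while std::map returns them sorted. The two orders happen to coincide for this sample data.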

Now, create an R script that loads grades.cpp using sourceCpp("grades.cpp").




# Create a data frame with id, name and grades
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("John", "Smith", "Jane", "Doe", "Peter", "Parker"),
  grades = c("A", "B", "A", "C", "D", "C")
)
 
library(Rcpp)
sourceCpp("grades.cpp")
 
# Call the function
count_grades_in_cpp(df)

Output

  grade count
1     A     2
2     B     1
3     C     2
4     D     1

Here we use sourceCpp("grades.cpp") to compile and load the C++ code from the grades.cpp file, which contains the implementation of the count_grades_in_cpp function. The function becomes available in the R environment after sourcing the file, and we then call it directly in our R script using count_grades_in_cpp(df).

Note

Replace grades.cpp with the actual path to the file, i.e. sourceCpp("path/to/grades.cpp").

Benchmarking

To understand the benefits of using the C++ equivalent, we'll benchmark both functions, sumTwoR and sumTwoC, using the microbenchmark package. Install it using install.packages("microbenchmark").




library(Rcpp)
library(microbenchmark)
 
# Writing a code in C++ to calculate the sum of vector values
cppFunction('
  double sumTwoC(NumericVector x, NumericVector y) {
    int n = x.size();
    double total = 0;
    for(int i = 0; i < n; ++i) {
      total += x[i] + y[i];
    }
    return total;
  }')
 
 
# R code to sum all the elements of two vectors
sumTwoR <- function(x, y) {
    total <- 0
    for(i in seq_along(x)) {
        total <- total + x[i] + y[i]
    }
    return(total)
}
 
# Benchmarking the two functions on one million random values each
x <- runif(1e6)
y <- runif(1e6)
 
# Run each function 100 times and print the timing summary
benchmark <- microbenchmark(sumTwoC(x, y), sumTwoR(x, y), times = 100)
print(benchmark)

Unit: milliseconds
          expr       min        lq      mean    median        uq       max neval
 sumTwoC(x, y)  1.100829  1.142826  1.250264  1.185279  1.231635  2.252383   100
 sumTwoR(x, y) 44.789824 46.995613 49.763670 48.157485 50.708915 73.723904   100


Here we can see that the minimum time taken by sumTwoC is about 1.10 ms, whereas sumTwoR takes at least 44.79 ms. Comparing medians (1.19 ms versus 48.16 ms), the C++ version is roughly 40 times faster in this run.

Plotting the benchmark results

I’m using ggplot2 for plotting the benchmark results.




# Plotting the results (microbenchmark records times in nanoseconds,
# so divide by 1e6 to get milliseconds and plot the median per function)
library(ggplot2)
ggplot(benchmark, aes(x = expr, y = time / 1e6, fill = expr)) +
  geom_bar(stat = "summary", fun = "median") +
  theme_bw() +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), axis.text = element_text(face = "bold")) +
  labs(x = "Function", y = "Median time (ms)", title = "Benchmarking sumTwoC and sumTwoR")
 
# Save the plot
ggsave("benchmark.png", width = 10, height = 5, dpi = 100)

Output

Benchmark results

Conclusion

We have learned that Rcpp is a powerful tool that can be used to implement R code in C++. This can lead to significant performance improvements, as C++ is a compiled language that is typically much faster than R. To learn more you can check out the official website of Rcpp.

