
Non-Linear Regressions with Caret Package in R

Last Updated : 12 Jun, 2023

Non-linear regression is used to model relationships between dependent and independent variables that are not linear. Non-linear regression models can capture more intricate relationships between variables than linear regression.

  • Compared to a linear model, it is more precise and adaptable. The model can accommodate a variety of curves that describe intricate relationships between two or more variables. Exponential, logarithmic, power, and polynomial relationships are a few types of non-linear relationships that can be represented using non-linear regression.
  • When fitting a non-linear regression model, we try to minimize the sum of squared errors between the values predicted by the model and the actual values. This entails choosing appropriate starting values for the model parameters and incrementally adjusting them to reduce the sum of squared errors (see the short sketch after this list).
  • Maximum likelihood estimation and Bayesian techniques are other approaches to non-linear regression. Maximum likelihood estimation finds the model parameters that maximize the likelihood of the observed data. Bayesian approaches incorporate prior knowledge about the parameters and use it to estimate their posterior distribution.
  • Non-linear regression models can be implemented with various software packages, including R Programming Language, Python, and MATLAB. These packages provide built-in functions and libraries for fitting non-linear models and evaluating their performance.
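
To make the least-squares idea above concrete, here is a minimal sketch (not from the original article) that fits an exponential curve to synthetic data with base R's nls function; the data-generating curve, the noise level, and the starting values are arbitrary choices made for illustration.

R

# Synthetic data: the true curve is 2 * exp(0.8 * x), plus some noise
set.seed(42)
x <- seq(0, 5, length.out = 50)
y <- 2 * exp(0.8 * x) + rnorm(50, sd = 2)

# nls iteratively adjusts a and b from the starting values
# to minimise the sum of squared errors
fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.5))
summary(fit)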

Caret Package in R

The R caret package is a powerful tool for developing and testing machine learning models. "Caret" stands for "Classification and Regression Training," and it offers a standardized interface for building and comparing many model types, such as linear and non-linear models, decision trees, random forests, support vector machines, and neural networks.

The caret package offers a variety of features, including data pre-processing, cross-validation, model tuning, and variable selection. It also includes a number of performance metrics, such as accuracy, precision, recall, F1 score, and AUC, for assessing model quality.

The caret package has the following salient characteristics:

  1. Data pre-processing: The caret package provides transformation and scaling tools, including centering and scaling, imputation, outlier detection, and feature selection.
  2. Cross-validation: The caret package supports a variety of cross-validation techniques, including k-fold, repeated k-fold, and leave-one-out cross-validation.
  3. Model tuning: The caret package provides tuning tools that use techniques such as grid search, random search, and Bayesian optimization (a short sketch combining cross-validation and grid search follows this list).
  4. Model comparison: Using a variety of performance criteria, the caret package makes it simple to compare different models.
  5. Parallel processing: The caret package offers tools for parallel processing, enabling faster model training and evaluation.
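
As a hedged illustration of how the cross-validation and tuning features fit together, the sketch below resamples a k-nearest-neighbours model over a small tuning grid; the choice of the iris data, method = "knn", and the grid values are assumptions made for this example, not part of the article's workflow.

R

library(caret)
data(iris)

# 10-fold cross-validation and a small grid of candidate k values
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(k = c(3, 5, 7, 9))

# train() fits the model for each grid value and reports resampled accuracy
knn_fit <- train(Species ~ ., data = iris, method = "knn",
                 trControl = ctrl, tuneGrid = grid)
knn_fit$results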

Non-linear regression using Caret in R

First, we load the ggplot2 and caret R packages that we'll be using in the example. The caret package is used to train and evaluate machine learning models, while the ggplot2 package is used to produce data visualizations.

R




# Load the required libraries
library(caret)
library(ggplot2)


Next, we load the dataset we'll use for our example. In this instance, we're using R's built-in iris dataset, but you can substitute any other dataset of your choice.

R




# Load the dataset (Iris dataset)
data(iris)


In this step we define the non-linear regression model that will be used in our example. The model is fitted with the nls function, which stands for "non-linear least squares". Our formula, which states that we are predicting Petal.Length from Sepal.Length, is:

Petal.Length ~ a * Sepal.Length^2 + b * Sepal.Length + c

We employ a non-linear model with three parameters (a, b, and c), with starting values of 0.01, 0.1, and 1 respectively. We then define the helper function predict_petallength, whose only argument is the sepal length we wish to predict from. The function uses predict to produce a petal-length estimate based on the non-linear regression model we just fitted.

R




# Define the non-linear regression model
model <- nls(Petal.Length ~ a*Sepal.Length^2 + b*Sepal.Length + c,
             data = iris, 
             start = list(a = 0.01,
                          b = 0.1, c = 1))
  
# Define a function to predict 
# petal length from sepal length
predict_petallength <- function(sepal_length) {
  predict(model, newdata = data.frame(Sepal.Length = sepal_length))
}
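
As a quick sanity check (not part of the original walkthrough), the helper can be called directly; the sepal lengths below are arbitrary values chosen for illustration.

R

# Predicted petal lengths for a few arbitrary sepal lengths
predict_petallength(c(5.0, 6.0, 7.0))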


In this stage the caret package is used to train and assess the model. We use 10-fold cross-validation (method = "cv", number = 10) to train and test it. We call the caret package's train function, modelling Petal.Length as a linear function (method = "lm") of the values returned by the predict_petallength function we defined earlier. We also pass a trainControl object (defined in the same code block) to control the training process, and then print the model summary.

R




# Train and evaluate the model using the caret package
train_control <- trainControl(method = "cv", number = 10)
model_caret <- train(Petal.Length ~ predict_petallength(Sepal.Length),
                     data = iris, method = "lm",
                     trControl = train_control)
  
# Print the model results
summary(model_caret)


Output: 

Call:
lm(formula = .outcome ~ ., data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6751 -0.5138  0.1218  0.5356  2.6287 

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         -1.654e-08  1.788e-01    0.00        1    
`predict_petallength(Sepal.Length)`  1.000e+00  4.399e-02   22.73   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8357 on 148 degrees of freedom
Multiple R-squared:  0.7774,    Adjusted R-squared:  0.7759 
F-statistic: 516.8 on 1 and 148 DF,  p-value: < 2.2e-16
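
The summary above describes the wrapping linear fit; if you also want the resampled performance that caret estimated during 10-fold cross-validation, you can inspect the train object's results (an optional check, output not shown here):

R

# Cross-validated RMSE, R-squared, and MAE averaged over the folds
model_caret$results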

We now create a scatter plot of the iris data with the non-linear regression curve overlaid on the points. The red line shows the relationship between sepal length and petal length predicted by our non-linear regression model. Although several points deviate considerably from the curve, the model generally fits the data.

R




# Plot the results
library(ggplot2)
  
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  stat_function(fun = predict_petallength,
                color = "red", size = 1)


Output:

Scatterplot with regression curve


