Shrinkage Covariance Estimation in Scikit Learn
Last Updated: 19 Jan, 2023
Ledoit and Wolf proposed a closed-form formula for the shrinkage coefficient used to regularize the usual maximum likelihood covariance estimate. The resulting Ledoit-Wolf estimator computes the asymptotically optimal shrinkage parameter by minimizing a mean squared error criterion.
OAS Estimator: Chen et al. proposed an improvement on the Ledoit-Wolf shrinkage parameter, the Oracle Approximating Shrinkage (OAS) estimator. Under the assumption that the data are Gaussian, its convergence is significantly better.
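Before moving to the estimators themselves, it may help to see what "shrinkage" actually computes. A minimal sketch (variable names are illustrative): scikit-learn's shrunk covariance is a convex combination of the empirical covariance and a scaled identity matrix, with the average variance `mu = trace(cov) / n_features` as the shrinkage target.

```python
import numpy as np
from sklearn.covariance import ShrunkCovariance, empirical_covariance

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))

shrinkage = 0.2
emp_cov = empirical_covariance(X)
mu = np.trace(emp_cov) / emp_cov.shape[0]  # average variance, the shrinkage target

# convex combination of the empirical covariance and mu * identity
manual = (1 - shrinkage) * emp_cov + shrinkage * mu * np.eye(5)

# matches what ShrunkCovariance computes for the same coefficient
sk = ShrunkCovariance(shrinkage=shrinkage).fit(X)
print(np.allclose(manual, sk.covariance_))  # → True
```

The Ledoit-Wolf and OAS estimators differ only in how they choose the `shrinkage` coefficient; the combination itself is the same.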
Importing Libraries and generating datasets
Python libraries such as NumPy, Matplotlib, scikit-learn, and SciPy make it easy to handle datasets and perform complex computations in a few lines of code.
Python3
# import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg
from sklearn.covariance import ShrunkCovariance, \
    empirical_covariance, log_likelihood
from sklearn.covariance import LedoitWolf, OAS
from sklearn.model_selection import GridSearchCV

noOfFeatures, noOfSamples = 38, 22
np.random.seed(50)

# draw white Gaussian samples, then "color" them with a
# random mixing matrix to induce correlations between features
X_train_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))
X_test_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))
colorMatrix = np.random.normal(size=(noOfFeatures, noOfFeatures))
X_train = np.dot(X_train_baseline, colorMatrix)
X_test = np.dot(X_test_baseline, colorMatrix)
Defining the Range of shrinkage values and making it optimal
Now we will define a span of possible shrinkage coefficient values and run a grid search over them to identify the optimal shrinkage coefficient.
Python3
shrinkageFactor = np.logspace(-2, 0, 32)

# negative log-likelihood on test data for each shrinkage value
negative_logliks = [-ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test)
                    for s in shrinkageFactor]

# ground-truth covariance of the colored data, for reference
realCovariance = np.dot(colorMatrix.T, colorMatrix)
empiricalCovariance = empirical_covariance(X_train)
logRealLikelihood = -log_likelihood(empiricalCovariance,
                                    linalg.inv(realCovariance))

# grid search over the shrinkage values by cross-validation
tunedParameters = [{"shrinkage": shrinkageFactor}]
cv = GridSearchCV(ShrunkCovariance(), tunedParameters)
cv.fit(X_train)
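Once the grid search has been fitted, the selected coefficient can be read back from the standard `GridSearchCV` attributes. A self-contained sketch on randomly generated data (the dataset here is illustrative, not the one built above):

```python
import numpy as np
from sklearn.covariance import ShrunkCovariance
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 10))

# search the same logarithmic grid of shrinkage values
grid = [{"shrinkage": np.logspace(-2, 0, 32)}]
cv = GridSearchCV(ShrunkCovariance(), grid)
cv.fit(X)

# the cross-validated optimum, two equivalent ways
print(cv.best_params_["shrinkage"])
print(cv.best_estimator_.shrinkage)
```

`cv.best_estimator_` is a `ShrunkCovariance` instance refitted on the full data with the winning coefficient, which is what the plotting code later uses.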
Performing optimal shrinkage coefficient estimation for Ledoit-Wolf and OAS coefficient
Now we will estimate the shrinkage coefficients for Ledoit-Wolf and OAS. Before that, let's review what maximum likelihood estimation is.
Maximum Likelihood Estimation: an approach that searches for the parameter values under which the observed data are most probable. For a Gaussian, it finds the mean and standard deviation most likely to have generated the sample. It is a probabilistic approach that can be applied to data from any parametric distribution.
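For the Gaussian case the maximum likelihood estimates have a closed form: the sample mean and the biased (divide-by-n) standard deviation. A minimal sketch checking this against SciPy's numerical `norm.fit` (the sample parameters here are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# closed-form Gaussian MLE: sample mean and the biased (ddof=0) std
mu_hat = x.mean()
sd_hat = x.std()  # ddof=0 is the MLE of sigma

# SciPy's fit recovers the same estimates
mu_fit, sd_fit = stats.norm.fit(x)
print(np.allclose([mu_hat, sd_hat], [mu_fit, sd_fit]))  # → True
```

The empirical covariance used throughout this article is the multivariate analogue of these estimates.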
Python3
# fit both estimators on the training data and score them on test data
ledoitWolf = LedoitWolf()
logLikelihoodLedoitWolf = ledoitWolf.fit(X_train).score(X_test)
oas = OAS()
logLikelihoodOAS = oas.fit(X_train).score(X_test)
fig = plt.figure()
Defining Shrinkage Curve Range
Now we will set the range of the shrinkage curve and adjust the plot limits so the output is easy to read. Finally, we plot the likelihood estimates for Ledoit-Wolf, OAS, and the cross-validated best estimator so the results can be compared directly.
Python3
# negative log-likelihood curve over the shrinkage grid
plt.loglog(shrinkageFactor, negative_logliks, "m--",
           label="Negative log-likelihood")
plt.plot(plt.xlim(), 2 * [logRealLikelihood], "b-.",
         label="Real Covariance Likelihood")

# adjust the view of the plot
maxLikelihood = np.amax(negative_logliks)
minLikelihood = np.amin(negative_logliks)
min_y = minLikelihood - 7.0 * np.log(plt.ylim()[1] - plt.ylim()[0])
max_y = maxLikelihood + 16.0 * np.log(maxLikelihood - minLikelihood)
min_x = shrinkageFactor[0]
max_x = shrinkageFactor[-1]

# mark the shrinkage coefficient chosen by each method
plt.vlines(ledoitWolf.shrinkage_, min_y, -logLikelihoodLedoitWolf,
           color="cyan", linewidth=3, label="Ledoit-Wolf Estimate")
plt.vlines(oas.shrinkage_, min_y, -logLikelihoodOAS,
           color="green", linewidth=3, label="OAS Estimate")
plt.vlines(cv.best_estimator_.shrinkage, min_y,
           -cv.best_estimator_.score(X_test),
           color="yellow", linewidth=3,
           label="Cross-validation Best Estimate")

plt.title("Regularized Covariance: Likelihood & Shrinkage Coefficient")
plt.xlabel("Regularization parameter: Shrinkage coefficient")
plt.ylabel("Error calculation in negative log-likelihood on test-data")
plt.ylim(min_y, max_y)
plt.xlim(min_x, max_x)
plt.legend()
plt.show()
Output: