Components of Linear Algebra
Linear algebra is a branch of mathematics, often described as the mathematics of data, that provides a framework for solving problems. This framework is especially useful for problems in physics, engineering, mathematics, and similar fields. It is essential for learning and understanding machine learning algorithms; without it, we cannot fully and deeply understand machine learning. Linear algebra is not very hard to learn, and understanding it helps us build better solutions.
In simple words, if you want to learn machine learning, you first need a basic understanding of linear algebra, which uses vectors and matrices to represent linear equations. The various components of linear algebra are given below –
- Dataset and Data Files
- Regularization
- Principal Component Analysis
- Linear Regression
- Singular-Value Decomposition
- One Hot Encoding
Let us understand these components in detail –
- Dataset and Data Files –
In machine learning, a dataset is simply a collection of data. Data is often presented in tabular form, and ML models are fitted on such tabular datasets. A dataset of this kind is a table-like set of numbers (a matrix) in which each row represents an observation and each column represents a feature of that observation.
So we can conclude that data is vectorized: each row can be treated as a vector, and vectors are one of the basic data structures of linear algebra.
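The following minimal sketch shows how a data file becomes a matrix. It parses a small CSV text into a list of rows, so that each row is an observation vector and each column is a feature vector. The CSV text is a hypothetical stand-in for a data file on disk, and its column names are made up. The helper names `load_matrix`, `column_vector`, and `column_means` are illustrative only and do not come from any library. Only the standard `csv` and `io` modules are used.

```python
import csv
import io

# Hypothetical CSV contents standing in for a data file on disk.
CSV_TEXT = """height_cm,weight_kg,age_years
170,65,30
160,55,25
180,80,40
"""


def load_matrix(text):
    """Parse CSV text into (header, rows); each row becomes a list of floats."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


def column_vector(matrix, j):
    """Return column j (one feature across all observations) as a list."""
    return [row[j] for row in matrix]


def column_means(matrix):
    """Mean of every feature column, computed column by column."""
    n_cols = len(matrix[0])
    return [sum(column_vector(matrix, j)) / len(matrix) for j in range(n_cols)]


header, data = load_matrix(CSV_TEXT)
print(header)              # feature names, one per column
print(data[0])             # [170.0, 65.0, 30.0]: one observation as a vector
print(column_means(data))  # one mean per feature
```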
- Regularization –
While working with data in ML, many problems can arise that affect how well an ML model works and how accurate it is. Several techniques are used to address them, and regularization is one of those techniques. It is mainly used to tackle overfitting.
Overfitting is one of the most common of these problems and a major source of errors. Regularization reduces these errors by discouraging the model from fitting its training set too closely. It typically does this by adding a penalty on the size of the model's weights to the loss function.
By overfitting, we mean that an ML model learns the noise and irrelevant details of its training data instead of the underlying pattern. An overfitted model performs well on the training data but gives inaccurate results on new data, and it is usually more complex than it needs to be. Therefore, regularization is very much needed.
There are two main types of regularization: L1 regularization and L2 regularization. L1 regularization adds the sum of the absolute values of the weights to the loss, and its use in linear regression is known as Lasso regression. L2 regularization adds the sum of the squared weights, and its use in linear regression is known as Ridge regression.
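To make the two penalties concrete, here is a minimal plain-Python sketch. It assumes a model whose weights are a simple list of numbers and computes the L1 and L2 penalty terms. It also solves ridge regression in closed form for the simplest possible case (one feature, no intercept), to show how the L2 penalty shrinks the weight. The function names and the penalty-strength parameter `lam` are hypothetical choices for illustration.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso-style) penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge-style) penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def ridge_fit_1d(xs, ys, lam):
    """Ridge regression with a single feature and no intercept.

    Minimizes sum((y - w*x)**2) + lam * w**2. Setting the derivative with
    respect to w to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ridge_fit_1d(xs, ys, lam=0.0))   # 2.0: no penalty, plain least squares
print(ridge_fit_1d(xs, ys, lam=14.0))  # 1.0: the penalty shrinks the weight
```

Increasing `lam` pulls the fitted weight toward zero. This is the kind of restraint on model complexity that helps against overfitting.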
- Principal Component Analysis –
Principal component analysis (PCA) is a tool used to reduce the number of dimensions, i.e., features, of a dataset. It often reduces them to two or three, so that the data can be plotted in 2D or 3D. It is one of the simplest and most widely used ways of doing dimensionality reduction. Its main disadvantage is that some information is lost in the process, which can reduce the accuracy of results.
However, it has several advantages. It makes large, complex data simpler and easier to explore and visualize, and it can improve the performance of an ML model by reducing complexity. This complexity is reduced by transforming a large set of variables into a smaller one that still contains most of the information (variance) of the original set. It is an important technique to understand in data science and statistics.
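The following sketch walks through the core steps of PCA:

- center the data
- compute the covariance matrix
- take its leading eigenvector
- project the data onto that eigenvector

It handles only the special case of two-dimensional points reduced to one dimension, where the eigenvalues of the 2x2 covariance matrix have a closed form. This restriction is an assumption made to keep the example short and dependency-free. For data with many features, one would normally use a library eigenvalue or SVD routine instead. The function name `pca_2d` is hypothetical.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (direction, projected_scores, explained_variance_ratio).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Unit eigenvector for the largest eigenvalue lam1
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Reduce each 2-D point to a single coordinate along that direction
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    total = lam1 + lam2
    ratio = lam1 / total if total > 0 else 0.0
    return direction, scores, ratio


points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
direction, scores, ratio = pca_2d(points)
print(ratio)  # roughly 0.96: share of variance kept by one component
```

The returned ratio quantifies the information loss mentioned above. Whatever share of the variance is not kept is discarded by the reduction.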
- Linear Regression –
Linear regression is one of the simplest ML algorithms and a good starting point for anyone beginning to learn ML. With more than one independent variable it is called multiple regression, and it is most commonly fitted with OLS (ordinary least squares). It is a statistical method used to describe and investigate whether one variable depends on others or is independent of them.
It models a linear relationship between a dependent variable and one or more independent variables. This simply means that it predicts the value of a continuous dependent variable from the values of the independent variables.
This linear relationship is shown using a regression line. Fitting this line is a linear algebra problem, because the data are arranged as a matrix of independent variables and a vector of values of the dependent variable.
There are many ways to solve linear regression problems, and the most common is least squares optimization. It chooses the line that minimizes the sum of squared differences between the observed and predicted values. The main aim of linear regression is simply to identify the line that best fits continuous data, using a simple mathematical criterion.
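The code below illustrates why linear regression is a linear algebra problem. It fits a multiple regression by ordinary least squares, solving the normal equations (X^T X) w = X^T y with Gaussian elimination. It is written in plain Python and assumes a small, well-conditioned dataset. Production libraries generally prefer QR-based or SVD-based solvers for numerical stability. The helper functions are illustrative, and the toy data is generated from y = 1 + 2a + 3b.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m)."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, w1, w2, ...] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # prepend an intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * y for x, y in zip(row, targets)) for row in Xt]
    return solve(XtX, Xty)  # normal equations: (X^T X) w = X^T y


features = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
targets = [1, 3, 4, 6, 8]  # generated from y = 1 + 2a + 3b
weights = fit_ols(features, targets)
print([round(w, 6) for w in weights])  # approximately [1.0, 2.0, 3.0]
```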
- Singular-Value Decomposition –
Singular-value decomposition, usually referred to as SVD, is a matrix decomposition technique that is widely used for dimensionality reduction, but it is not tied to any particular statistical method. It states that any rectangular matrix A can be broken down into the product of three matrices, A = U · S · V^T. Here U is an orthogonal matrix, S is a diagonal matrix of singular values, and V^T is the transpose of an orthogonal matrix V.
This decomposition is used to reduce a matrix and to make subsequent matrix calculations easier. It is widely used in data science, engineering, and other fields. It also gives important geometric and theoretical insight into linear transformations. It is used in many applications, such as least-squares linear regression, image compression, feature selection, visualization, and denoising data.
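A full SVD is best computed with a library routine such as NumPy's `numpy.linalg.svd`. The sketch below illustrates the idea for the largest singular value only, using power iteration on A^T A. This assumes the largest singular value is strictly greater than the next one, in which case the iteration converges to the top right singular vector v. From v it recovers sigma = ||Av|| and u = Av / sigma, and then builds the rank-1 approximation sigma · u · v^T that underlies uses such as compression and denoising. The function names are hypothetical. The example matrix has singular values 5 and 3.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_singular_triplet(a, iterations=200, seed=0):
    """Approximate the largest singular value sigma and vectors u, v of a.

    Power iteration on A^T A converges to the top right singular vector v;
    then sigma = ||A v|| and u = A v / sigma.
    """
    rng = random.Random(seed)
    at = transpose(a)
    v = [rng.random() + 0.1 for _ in range(len(at))]  # one entry per column of a
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


def rank_one_approximation(u, sigma, v):
    """sigma * u v^T: the best rank-1 approximation of a in the least-squares sense."""
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[3.0, 2.0, 2.0],
     [2.0, 3.0, -2.0]]
u, sigma, v = leading_singular_triplet(A)
print(round(sigma, 6))  # 5.0
approx = rank_one_approximation(u, sigma, v)
```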
- One Hot Encoding –
One hot encoding, which is closely related to dummy encoding, is the most commonly used encoding method for categorical data. Many ML algorithms cannot work directly with categorical data. This method therefore transforms categorical values into numeric values that the algorithm can use, which often leads to better performance.
Therefore, one hot encoding is used as a preprocessing step to encode categorical data. Each category becomes a binary column. Each entry in the dataset is then represented by a vector with 1 in the column of its category and 0 everywhere else.
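Here is a minimal plain-Python sketch of one hot encoding. It assumes the categories are simply the distinct values seen in the input, sorted alphabetically, and it maps each value to a binary vector with a single 1. The optional `drop_first` flag drops the first category, which produces the dummy-encoding variant. This flag is a hypothetical parameter of this sketch, named after the similar option in pandas' `get_dummies`.

```python
def one_hot_encode(values, drop_first=False):
    """Encode a list of categorical values as binary vectors.

    Returns (categories, rows), where rows[i][k] is 1 if values[i] equals
    categories[k] and 0 otherwise.
    """
    categories = sorted(set(values))
    index = {category: k for k, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # a single 1 marks this entry's category
        rows.append(row)
    if drop_first:
        # Dummy-encoding variant: the first category becomes the all-zeros row
        return categories[1:], [row[1:] for row in rows]
    return categories, rows


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```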