## Introduction

Variables are almost always related to one another. These relationships fall into two categories: certainty relations and uncertainty relations. A certainty relation can be expressed exactly with a function. An uncertainty relation is also called correlation, and it can be studied with regression analysis.

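Generally, the linear regression model is:

$$h_\theta(x) = \theta^T x$$

where $x$ is the feature vector and $\theta$ is the parameter vector (named `w` in the code in this article). The optimal $\theta$ can be determined by minimizing the loss function:

$$J(\theta) = \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right)^2$$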
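As a minimal sketch of the model and its loss, assuming the usual form in which a prediction is the dot product of the features with the parameters and the loss is the sum of squared residuals, the two quantities can be computed as follows. The function names `predict` and `squared_error_loss` and the toy data are illustrative and do not come from the article's code.

```python
import numpy as np

def predict(X, theta):
    """Linear model: one prediction per row of X."""
    return X @ theta

def squared_error_loss(X, y, theta):
    """Sum of squared residuals between predictions and labels."""
    residual = y - predict(X, theta)
    return float(residual @ residual)

# Toy data generated exactly by y = 1 + 2*x; the column of ones is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

loss_true = squared_error_loss(X, y, np.array([1.0, 2.0]))  # exact parameters
loss_off = squared_error_loss(X, y, np.array([0.0, 2.0]))   # intercept off by 1
```

With the exact parameters every residual is zero, while shifting the intercept by 1 leaves a residual of 1 on each of the three samples, so the loss grows as the parameters move away from the optimum.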

## Regression Model

This article covers five variants of linear regression: ordinary linear regression, local weighted linear regression, ridge regression, Lasso regression and stepwise linear regression.

### Linear Regression

The parameters of linear regression can be calculated by gradient descent or by the **normal equation**. Because gradient descent has been introduced in Step-by-Step Guide to Implement Machine Learning IV - Logistic Regression, this article covers the solution with the normal equation.

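First, calculate the derivative of the loss function with respect to the parameter vector $\theta$. Writing the loss in matrix form as $J(\theta) = (X\theta - Y)^T (X\theta - Y)$, the derivative is:

$$\frac{\partial J(\theta)}{\partial \theta} = 2X^T(X\theta - Y)$$

Then, setting the derivative equal to 0, we obtain:

$$X^T X \theta = X^T Y$$

Finally, provided $X^T X$ is invertible, $\theta$ is:

$$\theta = (X^T X)^{-1} X^T Y$$

where $X$ is the training data and $Y$ is the corresponding label. The code of linear regression is shown below: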

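```python
# Method of the regression class; assumes `import numpy as np` and the project's `preProcess` module.
def standardLinearRegression(self, x, y):
    # Scale the features before fitting
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    xTx = np.dot(x.T, x)
    if np.linalg.det(xTx) == 0:  # a zero determinant means xTx has no inverse
        print("Error: Singular Matrix !")
        return
    # Normal equation: w = (X^T X)^-1 X^T y
    w = np.dot(np.linalg.inv(xTx), np.dot(x.T, y))
    return w
```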
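The normal equation can also be checked in isolation. The sketch below is a standalone illustration, not the article's class: the function name `normal_equation` and the synthetic data are assumptions, and it uses `np.linalg.solve` in place of an explicit inverse, which solves the same linear system. It compares the result with NumPy's least-squares solver and confirms that the gradient of the squared-error loss vanishes at the solution.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) w = X^T y without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # intercept + 2 features
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + 0.01 * rng.normal(size=50)                   # small additive noise

w = normal_equation(X, y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)               # reference solution

# Gradient of the sum of squared errors, 2 X^T (X w - y), evaluated at the solution.
gradient = 2 * X.T @ (X @ w - y)
```

Solving the system directly is generally better conditioned than computing the inverse and multiplying, although both give the same answer on well-behaved data such as this.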

### Local Weighted Linear Regression

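Linear regression tends to underfit because it uses the unbiased minimum mean square error (MMSE) estimate over the whole training set. To reduce the underfitting, we assign a weight to each training point according to its distance from the point to be predicted, and then apply ordinary regression analysis to the weighted data. The loss function for local weighted linear regression is:

$$J(\theta) = \sum_{i=1}^{m} W_{ii} \left(y_i - \theta^T x_i\right)^2$$

where $W$ is a diagonal matrix holding one weight per training sample. Like linear regression, we calculate the derivative of the loss function and set it equal to 0. The optimal $\theta$ is:

$$\theta = (X^T W X)^{-1} X^T W Y$$

The weights in local weighted linear regression play a role similar to the kernel function in SVM, and are given by:

$$W_{ii} = \exp\left(-\frac{\lVert x_i - x \rVert^2}{2k^2}\right)$$

where $x$ is the point to be predicted and $k$ controls how quickly the weights decay with distance.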

The code of local weighted linear regression is shown below:

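```python
# Method of the regression class; assumes `import numpy as np` and the project's `preProcess` module.
def LWLinearRegression(self, x, y, sample):
    # Scale the features before fitting
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    sample_num = len(x)
    weights = np.eye(sample_num)
    for i in range(sample_num):
        # Gaussian kernel weight from the squared distance to the query sample
        diff = sample - x[i, :]
        weights[i, i] = np.exp(np.dot(diff, diff.T)/(-2 * self.k ** 2))
    xTx = np.dot(x.T, np.dot(weights, x))
    if np.linalg.det(xTx) == 0:
        print("Error: Singular Matrix !")
        return
    # Weighted normal equation: (X^T W X)^-1 X^T W y
    result = np.dot(np.linalg.inv(xTx), np.dot(x.T, np.dot(weights, y)))
    return np.dot(sample.T, result)
```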
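To see the effect of the kernel width, here is a small standalone sketch, assuming a Gaussian kernel and no feature scaling. The function name `lwlr_predict` and the quadratic toy data are illustrative and not part of the article's code. A narrow kernel follows the curve near the query point, while a very wide kernel behaves almost like a single global linear fit.

```python
import numpy as np

def lwlr_predict(X, y, query, k=1.0):
    """Predict at `query` with a Gaussian-weighted least-squares fit."""
    diff = X - query                                   # offset of each training row
    weights = np.exp(-np.sum(diff ** 2, axis=1) / (2 * k ** 2))
    XtW = X.T * weights                                # same as X.T @ diag(weights)
    theta = np.linalg.solve(XtW @ X, XtW @ y)
    return float(query @ theta)

# Noise-free samples of y = x^2, with a column of ones for the intercept.
x = np.linspace(-2.0, 2.0, 41)
X = np.column_stack([np.ones_like(x), x])
y = x ** 2
query = np.array([1.0, 1.5])                           # true value is 1.5^2 = 2.25

local_pred = lwlr_predict(X, y, query, k=0.2)          # narrow kernel: local fit
global_pred = lwlr_predict(X, y, query, k=100.0)       # wide kernel: nearly a plain linear fit
```

Because a new set of weights and a new linear system are needed for every query point, the cost of prediction grows with the size of the training set.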

### Ridge Regression

If the feature dimension is larger than the number of samples, the matrix $X^T X$ is not full rank, so its inverse doesn't exist. To solve the problem, ridge regression adds $\lambda I$ to $X^T X$ to make the matrix nonsingular. This is equivalent to adding **L2 regularization** to the loss function, namely:

$$J(\theta) = \sum_{i=1}^{m} \left(y_i - \theta^T x_i\right)^2 + \lambda \lVert \theta \rVert_2^2$$

Like linear regression, we calculate the derivative of the loss function and set it equal to 0. The optimal $\theta$ is:

$$\theta = (X^T X + \lambda I)^{-1} X^T Y$$

The code of ridge regression is shown below:

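```python
# Method of the regression class; assumes `import numpy as np` and the project's `preProcess` module.
def ridgeRegression(self, x, y):
    # Scale the features before fitting
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    feature_dim = len(x[0])
    xTx = np.dot(x.T, x)
    # Add lamda to the diagonal only (identity matrix, not np.exp)
    matrix = xTx + np.eye(feature_dim) * self.lamda
    if np.linalg.det(matrix) == 0:  # check the regularized matrix, which is the one inverted
        print("Error: Singular Matrix !")
        return
    w = np.dot(np.linalg.inv(matrix), np.dot(x.T, y))
    return w
```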
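The following standalone sketch illustrates why the ridge term helps when there are more features than samples. The function name `ridge_fit`, the random data and the penalty values are assumptions chosen for illustration. It shows that the unregularized matrix is rank deficient, that adding a multiple of the identity restores full rank, and that a larger penalty shrinks the weights.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))          # 5 samples, 10 features: X^T X is rank deficient
y = rng.normal(size=5)

rank_plain = np.linalg.matrix_rank(X.T @ X)                      # cannot exceed 5
rank_ridge = np.linalg.matrix_rank(X.T @ X + 0.1 * np.eye(10))   # full rank
w_small = ridge_fit(X, y, lam=0.1)
w_large = ridge_fit(X, y, lam=100.0)  # stronger penalty shrinks the weights
```

The norm of the ridge solution decreases as the penalty grows, which is the shrinkage behaviour that the L2 term is meant to produce.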

### Lasso Regression

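Like ridge regression, Lasso regression adds a penalty to the loss function, in this case **L1 regularization**, namely:

$$J(\theta) = \sum_{i=1}^{m} \left(y_i - \theta^T x_i\right)^2 + \lambda \lVert \theta \rVert_1$$

Because the L1 regularization contains absolute values, the loss function is not differentiable everywhere. Thus, we apply the **coordinate descent method** (CD). CD minimizes the loss along one coordinate direction at each step while holding the other coordinates fixed, namely,

$$\theta_j^{(t+1)} = \arg\min_{\theta_j} J\left(\theta_1^{(t+1)}, \ldots, \theta_{j-1}^{(t+1)}, \theta_j, \theta_{j+1}^{(t)}, \ldots, \theta_n^{(t)}\right)$$

We can get a closed-form solution for each CD step, which is given by:

$$\theta_j = \begin{cases} (\rho_j + \lambda/2)/z_j, & \rho_j < -\lambda/2 \\ 0, & -\lambda/2 \le \rho_j \le \lambda/2 \\ (\rho_j - \lambda/2)/z_j, & \rho_j > \lambda/2 \end{cases}$$

where:

$$\rho_j = \sum_{i=1}^{m} x_{ij}\left(y_i - \sum_{k \ne j} \theta_k x_{ik}\right), \qquad z_j = \sum_{i=1}^{m} x_{ij}^2$$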

The code of Lasso regression is shown below:

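```python
# Method of the regression class; assumes `import numpy as np` and the project's `preProcess` module.
def lassoRegression(self, x, y):
    # Scale the features before fitting
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.expand_dims(y, axis=1)  # column vector, shape (sample_num, 1)
    sample_num, feature_dim = np.shape(x)
    w = np.ones([feature_dim, 1])
    for i in range(self.iterations):
        for j in range(feature_dim):
            # Prediction from every feature except the j-th
            h = np.dot(x[:, 0:j], w[0:j]) + np.dot(x[:, j+1:], w[j+1:])
            # Inner product of feature j with the partial residual
            w[j] = np.dot(x[:, j], (y - h))
            if j == 0:
                w[j] = 0  # the first weight is held at zero
            else:
                w[j] = self.softThreshold(w[j])
    return w
```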
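The `softThreshold` helper is not shown in the article, so the sketch below assumes the standard soft-thresholding operator and wraps it in a standalone coordinate descent loop. The names `soft_threshold` and `lasso_coordinate_descent`, the data and the penalty value are illustrative. Unlike the method above, this version divides each update by the squared norm of the feature column and does not hold the first weight at zero. The threshold of half the penalty follows from a squared-error loss written without a factor of one half.

```python
import numpy as np

def soft_threshold(rho, threshold):
    """Shrink rho toward zero by `threshold`; values inside the band become zero."""
    return np.sign(rho) * max(abs(rho) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, iterations=100):
    """Minimise ||y - X w||^2 + lam * ||w||_1 one coordinate at a time."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq_norm = np.sum(X ** 2, axis=0)               # squared norm of each column
    for _ in range(iterations):
        for j in range(n_features):
            # Residual with feature j's own contribution added back.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq_norm[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)  # only 2 active features

w = lasso_coordinate_descent(X, y, lam=20.0)
```

On data like this, the weights of the three irrelevant features are driven to exactly zero, while the two active weights are shrunk slightly toward zero. This sparsity is the main practical difference from ridge regression.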

### Stepwise Linear Regression

Stepwise linear regression is similar to Lasso, but it applies a greedy algorithm at each iteration instead of CD. At each iteration, it increases or decreases one weight by a small step and keeps the change that gives the lowest error. The code of stepwise linear regression is shown below:

```python
# Method of the regression class; assumes `import numpy as np` and the project's `preProcess` module.
def forwardstepRegression(self, x, y):
    # Scale the features before fitting
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.reshape(y, (-1, 1))  # column vector, so y - y_hat keeps shape (sample_num, 1)
    sample_num, feature_dim = np.shape(x)
    w = np.zeros([feature_dim, 1])
    for i in range(self.iterations):
        min_error = np.inf
        best_w = w
        for j in range(feature_dim):
            for sign in [-1, 1]:
                temp_w = w.copy()  # copy, so the trial step does not modify w
                temp_w[j] += sign * self.learning_rate
                y_hat = np.dot(x, temp_w)
                error = ((y - y_hat) ** 2).sum()  # sum of squared errors
                if error < min_error:  # save the best parameters
                    min_error = error
                    best_w = temp_w
        w = best_w
    return w
```
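The greedy procedure can be tried on its own with the sketch below, which assumes the common forward stagewise formulation: every iteration tries a fixed-size step up and down on each weight and keeps the single best trial. The function name `forward_stagewise`, the step size, the iteration count and the data are assumptions for illustration.

```python
import numpy as np

def forward_stagewise(X, y, step=0.01, iterations=500):
    """Greedy stagewise fit: each iteration moves one weight by +/- step."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    for _ in range(iterations):
        best_error, best_w = np.inf, w
        for j in range(n_features):
            for sign in (-1.0, 1.0):
                candidate = w.copy()               # copy so trials do not alias w
                candidate[j] += sign * step
                error = np.sum((y - X @ candidate) ** 2)
                if error < best_error:
                    best_error, best_w = error, candidate
        w = best_w                                 # commit the best single step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 * X[:, 0] - 0.5 * X[:, 2]                  # feature 1 is irrelevant

w = forward_stagewise(X, y)
```

Because every iteration must take a step, the weights end up within roughly one step size of the least-squares solution rather than exactly on it, so the step size trades accuracy against the number of iterations needed.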

## Conclusion and Analysis

There are many ways to obtain the optimal parameters for linear regression; this article only introduces some basic algorithms. Finally, let's compare our linear regression with the linear regression in Sklearn. The performance of each is displayed below:

[Figure: Sklearn linear regression performance]

[Figure: Our linear regression performance]

The performances look similar.

The related code and dataset in this article can be found in MachineLearning.

## History

- 28th May, 2019: Initial version