
Least Squares

Curve fitting, least squares, optimization

Introduction

Consider a series of $N$ given data points in $(d + 1)$-dimensional space, $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where $x_i := (x_i^1, x_i^2, \ldots, x_i^d)$ is a vector, for all $i = 1, 2, \ldots, N$. Define a function $f : \mathbb{R}^{k + d} \to \mathbb{R}$ through

$f(\alpha_1, \alpha_2, \ldots, \alpha_k; x) := \psi_1(\alpha_1) \phi_1(x) + \psi_2(\alpha_2) \phi_2(x) + \ldots + \psi_k(\alpha_k) \phi_k(x)$

where $\psi_j : \mathbb{R} \to \mathbb{R}$ and $\phi_j : \mathbb{R}^d \to \mathbb{R}$ are given functions, for all $j = 1, 2, \ldots, k$. Also, define the function $S_f : \mathbb{R}^k \to \mathbb{R}$ as the sum of squared residuals:

$S_f(\alpha_1, \alpha_2, \ldots, \alpha_k) := \sum_{i = 1}^N \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right]^2.$

The method of least squares fitting refers to finding the parameters $(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ which are the solution to the following unconstrained optimization problem:

$\mbox{Minimize } S_f(\alpha_1, \alpha_2, \ldots, \alpha_k), \quad \mbox{where } (\alpha_1, \alpha_2, \ldots, \alpha_k) \in \mathbb{R}^k.$

After solving this problem, the function $f^* : \mathbb{R}^d \to \mathbb{R}$ which provides the best fit, in the least-squares sense, is given by:

$f^*(x) = \psi_1(\alpha_1^*) \phi_1(x) + \psi_2(\alpha_2^*) \phi_2(x) + \ldots + \psi_k(\alpha_k^*) \phi_k(x).$

This general framework allows us to classify the different types of regression, as follows:

• When $d = 1$ the method is known as univariate regression, while for $d > 1$ we have multivariate regression.
• If all the functions $\psi_1, \psi_2, \ldots, \psi_k$ are linear, the method is called linear regression; otherwise it is known as nonlinear regression.
• Also, based on the type of the functions $\phi_1, \phi_2, \ldots, \phi_k$, we may have polynomial regression, regression by orthogonal polynomials, and so on.
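As a concrete instance of the linear case, the following sketch fits a quadratic by taking each $\psi_j$ to be the identity and $\phi_1(x) = 1$, $\phi_2(x) = x$, $\phi_3(x) = x^2$. The data points are made up for illustration; NumPy's `lstsq` then minimizes $S_f$ directly.

```python
import numpy as np

# Linear least squares in the framework above: psi_j(a) = a, so
# f(alpha; x) = alpha_1 * phi_1(x) + alpha_2 * phi_2(x) + alpha_3 * phi_3(x)
# with basis phi_1(x) = 1, phi_2(x) = x, phi_3(x) = x**2 (d = 1, k = 3).

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 32.8])   # roughly y = 1 + 2 x^2, with noise

# Design matrix: column j holds phi_j evaluated at every data point x_i.
Phi = np.column_stack([np.ones_like(x), x, x**2])

# lstsq minimizes the sum of squared residuals ||y - Phi @ alpha||^2,
# i.e. exactly S_f(alpha_1, alpha_2, alpha_3).
alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# The best-fit function f* from the formula above.
f_star = lambda t: alpha[0] + alpha[1] * t + alpha[2] * t**2
```

The recovered coefficient of $x^2$ comes out close to the value 2 used to generate the data.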

Solving The Problem

When all the functions $\psi_j$ are linear, the sum of squared residuals $S_f$ is convex on its entire domain, so the following first-order condition is both necessary and sufficient for a tuple of parameters $(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ to solve the above optimization problem; in the nonlinear case it remains necessary, though not always sufficient:

$\nabla S_f(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0,$

which can also be written as the system of (possibly nonlinear) equations:

$\frac{\partial S_f}{\partial \alpha_1}(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0, \quad\ldots\quad \frac{\partial S_f}{\partial \alpha_k}(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0.$

Let us fix some value of $j \in \{1, 2, \ldots, k\}$ and calculate $\displaystyle \frac{\partial S_f}{\partial \alpha_j}$. We have:

$\begin{array}{rcl}
\displaystyle \frac{\partial S_f}{\partial \alpha_j}
&=& \displaystyle \sum_{i = 1}^N \frac{\partial}{\partial \alpha_j} \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right]^2 \\ \\
&=& \displaystyle 2 \sum_{i = 1}^N \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right] \frac{\partial}{\partial \alpha_j} \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right] \\ \\
&=& \displaystyle 2 \sum_{i = 1}^N \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i)\right] \frac{\partial}{\partial \alpha_j} \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i)\right] \\ \\
&=& \displaystyle -2 \psi'_j(\alpha_j) \sum_{i = 1}^N \phi_j(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i)\right].
\end{array}$
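The derived formula for $\partial S_f / \partial \alpha_j$ can be checked numerically against central finite differences. The sketch below uses a hypothetical model with $k = 2$, $\psi_1(\alpha) = \alpha^2$, $\psi_2(\alpha) = \alpha$, $\phi_1(x) = x$, $\phi_2(x) = 1$, and made-up data:

```python
import numpy as np

# Model: f(a1, a2; x) = a1**2 * x + a2, so psi_1(a) = a**2, psi_2(a) = a,
# phi_1(x) = x, phi_2(x) = 1 (d = 1, k = 2). Data are synthetic.
x = np.array([0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 2.0, 2.5, 4.0])

def S(a1, a2):
    """Sum of squared residuals S_f."""
    return np.sum((y - (a1**2 * x + a2)) ** 2)

def grad_analytic(a1, a2):
    """dS/da_j = -2 * psi_j'(a_j) * sum_i phi_j(x_i) * r_i, per the derivation."""
    r = y - (a1**2 * x + a2)                   # residuals
    return np.array([-2 * (2 * a1) * np.sum(x * r),   # psi_1'(a) = 2a, phi_1 = x
                     -2 * 1 * np.sum(r)])             # psi_2'(a) = 1,  phi_2 = 1

# Central finite differences at an arbitrary point.
a1, a2, h = 0.7, 0.3, 1e-6
grad_fd = np.array([(S(a1 + h, a2) - S(a1 - h, a2)) / (2 * h),
                    (S(a1, a2 + h) - S(a1, a2 - h)) / (2 * h)])
```

The two gradients agree to within finite-difference error.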

Therefore, the system of equations whose solution is the tuple of optimal parameters $(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ can explicitly be written as:

$\left\{ \begin{array}{rcl}
\displaystyle \psi'_1(\alpha_1^*) \sum_{i = 1}^N \phi_1(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i)\right] &=& 0 \\ \\
\displaystyle \psi'_2(\alpha_2^*) \sum_{i = 1}^N \phi_2(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i)\right] &=& 0 \\ \\
&\cdots& \\ \\
\displaystyle \psi'_k(\alpha_k^*) \sum_{i = 1}^N \phi_k(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i)\right] &=& 0.
\end{array} \right.$

In the case of linear regression, the system is linear in the parameters (these are the normal equations) and can be solved directly by linear algebra. Generally, however, the system is nonlinear and we must use numerical root-finding techniques, such as Newton's method, to find the optimal parameters.
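As a minimal nonlinear illustration, take $k = 1$, $\psi(\alpha) = e^\alpha$, $\phi(x) = x$, with synthetic data generated from $\alpha = 1$. Since $\psi'(\alpha) = e^\alpha$ never vanishes, the single stationarity equation above reduces to $h(\alpha) = \sum_i x_i \left(y_i - e^\alpha x_i\right) = 0$, which the sketch below solves with Newton's method:

```python
import numpy as np

# Nonlinear one-parameter example: f(alpha; x) = exp(alpha) * x,
# i.e. psi(alpha) = exp(alpha), phi(x) = x (k = 1, d = 1).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.exp(1.0) * x + np.array([0.05, -0.1, 0.08, -0.03])  # alpha_true = 1

# Stationarity condition with the positive factor psi'(alpha) divided out.
h  = lambda a: np.sum(x * (y - np.exp(a) * x))
dh = lambda a: -np.exp(a) * np.sum(x * x)

alpha = 0.0                          # initial guess
for _ in range(25):
    alpha -= h(alpha) / dh(alpha)    # Newton step
```

Newton's method converges here to a value of $\alpha$ very close to the true parameter 1 used to generate the data.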
Lucian Bentea (September 2008)