Least Squares

Curve fitting, least squares, optimization

Contents

Introduction
Solving The Problem
References

Introduction

Consider a series of $\inline N$ given data points in $\inline (d + 1)$ -dimensional space, $\inline (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ , where $\inline x_i := (x_i^1, x_i^2, \ldots, x_i^d)$ is a vector, for all $\inline i = 1, 2, \ldots, N$ . Define a function $\inline f : \mathbb{R}^{k + d} \to \mathbb{R}$ through

$f(\alpha_1, \alpha_2, \ldots, \alpha_k; x) := \psi_1(\alpha_1) \phi_1(x) + \psi_2(\alpha_2) \phi_2(x) + \ldots + \psi_k(\alpha_k) \phi_k(x)$

(2)

where $\inline \psi_j : \mathbb{R} \to \mathbb{R}$ and $\inline \phi_j : \mathbb{R}^d \to \mathbb{R}$ are given functions, for all $\inline j = 1, 2, \ldots, k$ . Also, define the function $\inline S_f : \mathbb{R}^k \to \mathbb{R}$ as the sum of squared residuals:

$S_f(\alpha_1, \alpha_2, \ldots, \alpha_k) := \sum_{i = 1}^N \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right]^2.$

(3)
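
The two definitions above can be sketched in code. The following is a minimal illustration, not a reference implementation: the names `model`, `sum_squared_residuals`, `psis` and `basis`, as well as the sample data, are assumptions introduced here, and each $\inline x_i$ is treated as a plain number because the example uses $\inline d = 1$ .

```python
def model(alphas, psis, basis, x):
    """Evaluate f(alpha; x) = sum_j psi_j(alpha_j) * basis_j(x)."""
    return sum(psi(a) * b(x) for a, psi, b in zip(alphas, psis, basis))


def sum_squared_residuals(alphas, psis, basis, xs, ys):
    """Evaluate S_f(alpha) = sum_i [y_i - f(alpha; x_i)]^2."""
    return sum((y - model(alphas, psis, basis, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical example with d = 1 and k = 2: the straight line alpha_1 + alpha_2 * x.
psis = [lambda a: a, lambda a: a]        # both psi_j are the identity
basis = [lambda x: 1.0, lambda x: x]     # basis functions 1 and x
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]                     # data lying exactly on y = 1 + 2x

exact = sum_squared_residuals([1.0, 2.0], psis, basis, xs, ys)
shifted = sum_squared_residuals([0.0, 2.0], psis, basis, xs, ys)
print(exact, shifted)
```

With these made-up data the parameters $\inline (1, 2)$ reproduce every point, so the sum of squared residuals vanishes there, while lowering the intercept by one leaves a residual of 1 at each of the three points.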

The method of least squares fitting refers to finding the parameters $\inline (\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ which are the solution to the following unconstrained optimization problem:

$\mbox{Minimize } S_f(\alpha_1, \alpha_2, \ldots, \alpha_k), \quad \mbox{where } (\alpha_1, \alpha_2, \ldots, \alpha_k) \in \mathbb{R}^k.$

(4)

After solving this problem, the function $\inline f^* : \mathbb{R}^d \to \mathbb{R}$ which provides the best fit, in the least-squares sense, is given by:

$f^*(x) = \psi_1(\alpha_1^*) \phi_1(x) + \psi_2(\alpha_2^*) \phi_2(x) + \ldots + \psi_k(\alpha_k^*) \phi_k(x).$

(5)

This general framework allows us to classify the different types of regression, as follows:

When $\inline d = 1$ the method is known as univariate regression, while if $\inline d > 1$ we have multivariate regression.
Provided that all the functions $\inline \psi_1, \psi_2, \ldots, \psi_k$ are linear, the method is called linear regression; otherwise it is known as nonlinear regression.
Also, based on the type of the functions $\inline \phi_1, \phi_2, \ldots, \phi_k$ we may have polynomial regression, regression by orthogonal polynomials, and so on.

Solving The Problem

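If all the functions $\inline \psi_1, \psi_2, \ldots, \psi_k$ are linear, the sum of squared residuals function $\inline S_f$ is convex on its entire domain, so a necessary and sufficient condition for a tuple of parameters $\inline (\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ to be a solution to the above optimization problem is the stationarity condition below. For differentiable nonlinear $\inline \psi_j$ the function $\inline S_f$ need not be convex, and the condition is in general only necessary: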

$\nabla S_f(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0,$

(6)

which can also be written as the system of (possibly nonlinear) equations:

$\frac{\partial S_f}{\partial \alpha_1}(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0, \quad\ldots\quad \frac{\partial S_f}{\partial \alpha_k}(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = 0.$

(7)

Let us fix some value of $\inline j \in \{1, 2, \ldots, k\}$ and calculate $\inline \displaystyle \frac{\partial S_f}{\partial \alpha_j}$ . We have:

$\begin{array}{rcl} \displaystyle \frac{\partial S_f}{\partial \alpha_j} &=& \displaystyle \sum_{i = 1}^N \frac{\partial}{\partial \alpha_j} \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right]^2 \\ \\ &=& \displaystyle 2 \sum_{i = 1}^N \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right] \frac{\partial}{\partial \alpha_j} \left[y_i - f(\alpha_1, \alpha_2, \ldots, \alpha_k; x_i)\right] \\ \\ &=& \displaystyle 2 \sum_{i = 1}^N \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i) \right] \frac{\partial}{\partial \alpha_j} \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i) \right] \\ \\ &=& \displaystyle - 2 \psi'_j(\alpha_j) \sum_{i = 1}^N \phi_j(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r) \phi_r(x_i) \right]. \end{array}$

(8)
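
As a sanity check on this derivation, the closed-form partial derivative can be compared with a central finite difference. The sketch below is only illustrative: the function names, the choice $\inline \psi_1 = \exp$ , the step size `h` and the data are all assumptions made for this example.

```python
import math


def model(alphas, psis, basis, x):
    """f(alpha; x) = sum_j psi_j(alpha_j) * basis_j(x)."""
    return sum(psi(a) * b(x) for a, psi, b in zip(alphas, psis, basis))


def S(alphas, psis, basis, xs, ys):
    """Sum of squared residuals."""
    return sum((y - model(alphas, psis, basis, x)) ** 2 for x, y in zip(xs, ys))


def partial_S(j, alphas, psis, dpsis, basis, xs, ys):
    """Closed form: -2 psi_j'(alpha_j) sum_i basis_j(x_i) [y_i - f(alpha; x_i)]."""
    total = sum(basis[j](x) * (y - model(alphas, psis, basis, x))
                for x, y in zip(xs, ys))
    return -2.0 * dpsis[j](alphas[j]) * total


def central_difference(j, alphas, psis, basis, xs, ys, h=1e-6):
    """Numerical approximation of the same partial derivative."""
    up, down = list(alphas), list(alphas)
    up[j] += h
    down[j] -= h
    return (S(up, psis, basis, xs, ys) - S(down, psis, basis, xs, ys)) / (2.0 * h)


# Hypothetical nonlinear model: f(alpha; x) = exp(alpha_1) + alpha_2 * x.
psis = [math.exp, lambda a: a]
dpsis = [math.exp, lambda a: 1.0]        # derivatives of the psi_j
basis = [lambda x: 1.0, lambda x: x]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 2.9, 5.1, 7.0]                # made-up data
alphas = [0.3, 1.5]                      # arbitrary evaluation point

analytic = [partial_S(j, alphas, psis, dpsis, basis, xs, ys) for j in range(2)]
numeric = [central_difference(j, alphas, psis, basis, xs, ys) for j in range(2)]
print(analytic, numeric)
```

The two lists should agree up to the truncation and rounding error of the finite difference.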

Therefore, the system of equations whose solution is the tuple of optimal parameters $\inline (\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*)$ can explicitly be written as:

$\left\{ \begin{array}{ccc} \displaystyle \psi'_1(\alpha_1^*) \sum_{i = 1}^N \phi_1(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i) \right] &=& 0 \\ \\ \displaystyle \psi'_2(\alpha_2^*) \sum_{i = 1}^N \phi_2(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i) \right] &=& 0 \\ \\ \cdots\qquad\cdots\qquad\cdots\\ \displaystyle \psi'_k(\alpha_k^*) \sum_{i = 1}^N \phi_k(x_i) \left[y_i - \sum_{r = 1}^k \psi_r(\alpha_r^*) \phi_r(x_i) \right] &=& 0. \\ \\ \end{array} \right.$

(9)
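
As a worked special case, assume that $\inline \psi_j(\alpha) = \alpha$ for all $\inline j = 1, 2, \ldots, k$ (an assumption made here for illustration, being the simplest linear choice). Then $\inline \psi'_j(\alpha) = 1$ , and moving the terms containing the parameters to the left-hand side turns each equation of the system into a linear one:

$\sum_{r = 1}^k \left[ \sum_{i = 1}^N \phi_j(x_i) \phi_r(x_i) \right] \alpha_r^* = \sum_{i = 1}^N \phi_j(x_i) y_i, \quad j = 1, 2, \ldots, k.$

Writing $\inline \Phi$ for the $\inline N \times k$ matrix with entries $\inline \Phi_{ij} := \phi_j(x_i)$ and $\inline y := (y_1, y_2, \ldots, y_N)^T$ , this is the linear system $\inline \Phi^T \Phi \, \alpha^* = \Phi^T y$ , commonly known as the normal equations.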

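In the case of linear regression, the system is linear in the parameters, so for a small number of parameters $\inline k$ it can be solved directly with linear algebra, for example by Gaussian elimination. In the nonlinear case a closed-form solution is generally unavailable, and numerical root-finding techniques are needed to find the optimal parameters.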
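
A minimal sketch of the direct approach, assuming every $\inline \psi_j$ is the identity, so that the system reduces to $\inline k$ linear equations with coefficients $\inline \sum_i \phi_j(x_i) \phi_r(x_i)$ and right-hand sides $\inline \sum_i \phi_j(x_i) y_i$ . The helper names, the quadratic basis and the data are hypothetical, chosen only to make the example self-contained.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("system is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_linear(basis, xs, ys):
    """Build and solve the k x k system for identity psi_j."""
    k = len(basis)
    A = [[sum(basis[j](x) * basis[r](x) for x in xs) for r in range(k)]
         for j in range(k)]
    b = [sum(basis[j](x) * y for x, y in zip(xs, ys)) for j in range(k)]
    return solve_linear_system(A, b)


# Hypothetical polynomial regression with basis 1, x, x^2.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]       # data on a known parabola
coeffs = fit_linear(basis, xs, ys)
print(coeffs)
```

Because the made-up data lie exactly on $\inline 1 - 2x + 0.5 x^2$ , the fitted coefficients should match $\inline (1, -2, 0.5)$ up to rounding. For larger or ill-conditioned problems, forming this system explicitly can lose accuracy, and a library least-squares solver is usually the safer choice.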

Lucian Bentea (September 2008)

References

http://en.wikipedia.org/wiki/Curve_fitting
http://en.wikipedia.org/wiki/Regression_analysis
http://en.wikipedia.org/wiki/Least_squares
Franklin A. Graybill, Hariharan K. Iyer, Regression Analysis. Concepts and Applications, Duxbury Press, Belmont, California.