
# Univariate Regression

Univariate regression, polynomial regression, orthogonal polynomials, nonlinear

## Overview

Univariate regression is an area of curve fitting which, given a function $f$ depending on some parameters, finds the parameter values for which $f$ best fits a series of two-dimensional data points, in the least-squares sense defined below. It is called univariate because the data points are assumed to be sampled from a function of one variable. Compare this to multivariate regression, which fits data points sampled from a function of several variables.

Formally, consider a series of $N$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ and, for simplicity, assume that $x_1 < x_2 < \ldots < x_N$, i.e. the $x$-coordinates are distinct and sorted in increasing order. Least squares fitting of these data points means finding the parameters $\alpha_1, \alpha_2, \ldots, \alpha_k$ of a function $f_{\alpha_1, \alpha_2, \ldots, \alpha_k} : [x_1, x_N] \to \mathbb{R}$ such that the sum of squared residuals

$$\sum_{i = 1}^N \left[f_{\alpha_1, \alpha_2, \ldots, \alpha_k}(x_i) - y_i\right]^2$$
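As a minimal sketch, the quantity being minimized can be evaluated for any candidate fitting function. The helper name `sum_squared_residuals` and its callable-based interface are illustrative assumptions, not part of any Regression component.

```python
from typing import Callable, Sequence


def sum_squared_residuals(
    f: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
) -> float:
    """Return sum_i (f(x_i) - y_i)^2 over the data points (x_i, y_i)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))


# Residuals of the line f(x) = 1 + 2x at three points: 0^2 + (-0.5)^2 + 1^2.
print(sum_squared_residuals(lambda x: 1.0 + 2.0 * x, [0.0, 1.0, 2.0], [1.0, 3.5, 4.0]))  # 1.25
```

Every method on this page chooses the parameters so that this number is as small as possible; they differ only in the family of functions $f$ and in how the minimum is found.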

is minimized. If $f$ depends linearly on its parameters, the method is called linear regression; otherwise it is called nonlinear regression. For example, straight line regression, parabolic regression and polynomial regression are all linear regression models, since the function $f$ has the form

$$f_{\alpha_0, \alpha_1, \ldots, \alpha_k}(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \ldots + \alpha_k x^k,$$

which depends linearly on its parameters, even though it is nonlinear in $x$. By contrast, logistic regression is a nonlinear regression model, since its fitting function has the form

$$f_{\alpha_1, \alpha_2}(x) = \frac{1}{1 + e^{-\alpha_1 - \alpha_2 x}}$$

which is a nonlinear function of $\alpha_1$ and $\alpha_2$.
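One way to see the distinction concretely: a model that is linear in its parameters, $f(x) = \sum_j \alpha_j \phi_j(x)$, satisfies $f_{\alpha + \beta}(x) = f_\alpha(x) + f_\beta(x)$. The sketch below checks this additivity numerically for the polynomial and logistic forms above; the function names are illustrative, and additivity is only a necessary condition for linearity, not a proof of it.

```python
import math
from typing import Callable, Sequence


def polynomial(x: float, alphas: Sequence[float]) -> float:
    """alpha_0 + alpha_1 x + ... + alpha_k x^k."""
    return sum(a * x**j for j, a in enumerate(alphas))


def logistic(x: float, alphas: Sequence[float]) -> float:
    """1 / (1 + exp(-alpha_1 - alpha_2 x)), with alphas = (alpha_1, alpha_2)."""
    a1, a2 = alphas
    return 1.0 / (1.0 + math.exp(-a1 - a2 * x))


def is_additive_in_params(
    model: Callable[[float, Sequence[float]], float],
    x: float,
    a: Sequence[float],
    b: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check f(x; a + b) == f(x; a) + f(x; b) up to rounding."""
    summed = [ai + bi for ai, bi in zip(a, b)]
    return abs(model(x, summed) - (model(x, a) + model(x, b))) <= tol


a, b = [1.0, -2.0, 0.5], [0.3, 0.7, -1.0]
print(is_additive_in_params(polynomial, 1.7, a, b))        # True
print(is_additive_in_params(logistic, 1.7, a[:2], b[:2]))  # False
```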

In the following, let us consider each case and briefly explain how the corresponding optimal parameters can be derived.

## Polynomial Regression

One linear regression method is polynomial regression, which finds the polynomial that provides the least squares fit to a series of data points. More precisely, if the polynomial function $f$ of degree $k$ is given by

$$f_{\alpha_0, \alpha_1, \ldots, \alpha_k}(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \ldots + \alpha_k x^k,$$

then the optimal parameters $\alpha_0, \alpha_1, \ldots, \alpha_k$ are the solution of the following system of linear equations, known as the normal equations:

$$\begin{bmatrix} N & \sum_{i=1}^N x_i & \cdots & \sum_{i=1}^N x_i^k \\ \sum_{i=1}^N x_i & \sum_{i=1}^N x_i^2 & \cdots & \sum_{i=1}^N x_i^{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^N x_i^k & \sum_{i=1}^N x_i^{k+1} & \cdots & \sum_{i=1}^N x_i^{2k} \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^N y_i \\ \sum_{i=1}^N x_i y_i \\ \vdots \\ \sum_{i=1}^N x_i^k y_i \end{bmatrix}$$
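A minimal sketch of how this system can be assembled and solved in plain Python, assuming Gaussian elimination with partial pivoting as the linear solver (the page does not say which solver Regression/Discrete uses). The function name `polyfit_normal_equations` is illustrative. Entry $(r, c)$ of the matrix is the power sum $\sum_i x_i^{r+c}$, and the right-hand side entry $r$ is $\sum_i x_i^r y_i$.

```python
from typing import List, Sequence


def polyfit_normal_equations(xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """Return [alpha_0, ..., alpha_k] for the least-squares polynomial of degree k."""
    n = k + 1
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need len(xs) == len(ys) >= k + 1")
    # Power sums S_m = sum_i x_i^m, m = 0..2k; matrix entry (r, c) is S_{r+c} (S_0 = N).
    s = [sum(x**m for x in xs) for m in range(2 * k + 1)]
    # Augmented matrix [A | b] with right-hand side b_r = sum_i x_i^r y_i.
    rows = [
        [s[r + c] for c in range(n)] + [sum(x**r * y for x, y in zip(xs, ys))]
        for r in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    alphas = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = rows[r][n] - sum(rows[r][c] * alphas[c] for c in range(r + 1, n))
        alphas[r] = acc / rows[r][r]
    return alphas


xs = [1.5, 2.4, 3.2, 4.8, 5, 7, 8.43]
ys = [3.5, 5.3, 7.7, 6.2, 11, 9.5, 10.27]
print(polyfit_normal_equations(xs, ys, 1))  # close to [3.46212, 0.904273]
```

Forming the power sums directly keeps the sketch short, but the matrix becomes badly conditioned as $k$ grows, which is one practical reason to prefer the orthogonal polynomials of the next section.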

For $k = 1$, the function $f$ only depends on two parameters

$$f_{\alpha_0, \alpha_1}(x) = \alpha_0 + \alpha_1 x$$

and, in this case, the method is called straight line regression. The previous linear system becomes

$$\begin{bmatrix} N & \sum_{i=1}^N x_i \\ \sum_{i=1}^N x_i & \sum_{i=1}^N x_i^2 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^N y_i \\ \sum_{i=1}^N x_i y_i \end{bmatrix}$$

whose solution gives the optimal parameters

$$\alpha_1 = \frac{\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})}{\sum_{i=1}^N (x_i - \overline{x})^2}, \qquad \alpha_0 = \overline{y} - \alpha_1 \overline{x},$$

where $\overline{x}$ and $\overline{y}$ are the averages:

$$\overline{x} := \frac{1}{N} \sum_{i=1}^N x_i, \qquad \overline{y} := \frac{1}{N} \sum_{i=1}^N y_i.$$
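The closed-form solution translates directly into code. This is a minimal sketch (the name `fit_line` is illustrative, not the interface of Regression/Linear) that computes the averages first and then uses the squared deviations $\sum_i (x_i - \overline{x})^2$ in the denominator of $\alpha_1$. It is applied here to the seven example points listed in this section.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares straight line y = alpha_0 + alpha_1 x; returns (alpha_0, alpha_1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points and len(xs) == len(ys)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    alpha_1 = sxy / sxx
    alpha_0 = y_bar - alpha_1 * x_bar
    return alpha_0, alpha_1


xs = [1.5, 2.4, 3.2, 4.8, 5, 7, 8.43]
ys = [3.5, 5.3, 7.7, 6.2, 11, 9.5, 10.27]
alpha_0, alpha_1 = fit_line(xs, ys)
print(f"f(x) = {alpha_0:.5f} + {alpha_1:.6f} x")  # f(x) = 3.46212 + 0.904273 x
```

Centering the data before summing, rather than using the raw sums $\sum x_i^2$ and $\sum x_i y_i$, avoids cancellation when the $x_i$ are large compared to their spread.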

For example, consider the set of 7 points:

$$(1.5, 3.5),\ (2.4, 5.3),\ (3.2, 7.7),\ (4.8, 6.2),\ (5, 11),\ (7, 9.5),\ (8.43, 10.27)$$

Then the optimal values of the parameters are

$$\alpha_0 = 3.46212, \quad \alpha_1 = 0.904273,$$

which gives the fitting line $f(x) = 3.46212 + 0.904273\, x$. The following graph shows the data points and the regression line.
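As a check on these values, the intermediate quantities for the seven points work out as follows (rounded), using the identities $\sum_{i=1}^N (x_i - \overline{x})^2 = \sum_{i=1}^N x_i^2 - N\overline{x}^2$ and $\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y}) = \sum_{i=1}^N x_i y_i - N\overline{x}\,\overline{y}$:

$$\sum_{i=1}^7 x_i = 32.33, \quad \sum_{i=1}^7 y_i = 53.47, \quad \overline{x} \approx 4.61857, \quad \overline{y} \approx 7.63857,$$

$$\sum_{i=1}^7 (x_i - \overline{x})^2 = 186.3549 - \frac{32.33^2}{7} \approx 37.03649, \qquad \sum_{i=1}^7 (x_i - \overline{x})(y_i - \overline{y}) = 280.4461 - \frac{32.33 \cdot 53.47}{7} \approx 33.49109,$$

$$\alpha_1 \approx \frac{33.49109}{37.03649} \approx 0.904273, \qquad \alpha_0 \approx 7.63857 - 0.904273 \cdot 4.61857 \approx 3.46212.$$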

Straight line regression is also implemented as the component Regression/Linear.

The optimal parameters for the case $k = 2$ can be found in a similar manner. Rather than going into further technical details, consider the following graph, which shows the quadratic regression polynomial in red for a series of data points, shown in blue, sampled from a sine-like function.

Notice that the fitting curve is a parabola and does not follow the shape of the sine-like data. Either we have chosen the wrong family of fitting functions (polynomials), or the degree of the regression polynomial is too small. Parabolic regression is available as Regression/Parabolic.

For the general case of regression polynomials of arbitrary degree, the component Regression/Discrete is available.

## Orthogonal Polynomial Regression

This is an extension of polynomial regression: instead of using the monomials $x^0, x^1, x^2, \ldots, x^k$ as the basis functions of the fitting function $f$, special functions $\phi_0(x), \phi_1(x), \ldots, \phi_k(x)$ are used, so the expression for $f$ becomes:

$$f_{\alpha_0, \alpha_1, \ldots, \alpha_k}(x) = \alpha_0 \phi_0(x) + \alpha_1 \phi_1(x) + \ldots + \alpha_k \phi_k(x).$$

More precisely, the functions $\phi_0, \phi_1, \ldots$ should be polynomials, listed in increasing order of degree, with the property that

$$\int_{x_1}^{x_N} \phi_i(x)\, \phi_j(x)\, w(x)\, {\rm d}x = 0$$

for any $i \neq j$, where $w : [x_1, x_N] \to \mathbb{R}$ is a positive weight function. In other words, the sequence of polynomials $\phi_0, \phi_1, \ldots$ is orthogonal with respect to the weight $w$.
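For a concrete instance, the Legendre polynomials are orthogonal on $[-1, 1]$ with the constant weight $w(x) = 1$, where $[-1, 1]$ plays the role of the interval of integration. For the first three of them the integrals can be checked directly:

$$P_0(x) = 1, \qquad P_1(x) = x, \qquad P_2(x) = \tfrac{1}{2}\left(3x^2 - 1\right),$$

$$\int_{-1}^{1} P_0 P_1\, {\rm d}x = \left[\tfrac{x^2}{2}\right]_{-1}^{1} = 0, \qquad \int_{-1}^{1} P_0 P_2\, {\rm d}x = \tfrac{1}{2}\left[x^3 - x\right]_{-1}^{1} = 0, \qquad \int_{-1}^{1} P_1 P_2\, {\rm d}x = \tfrac{1}{2}\left[\tfrac{3x^4}{4} - \tfrac{x^2}{2}\right]_{-1}^{1} = 0.$$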

Apart from this choice of basis, regression with orthogonal polynomials minimizes the same sum of squared residuals. It is implemented as the component Regression/Orthogonal.
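The practical payoff can be sketched under one assumption that differs slightly from the integral condition stated earlier: suppose the $\phi_j$ are orthogonal with respect to the discrete inner product over the data points, i.e. $\sum_{i=1}^N \phi_j(x_i)\,\phi_l(x_i) = 0$ for $j \neq l$. The normal equations for the basis $\phi_0, \ldots, \phi_k$ are

$$\sum_{l=0}^{k} \alpha_l \sum_{i=1}^{N} \phi_j(x_i)\,\phi_l(x_i) = \sum_{i=1}^{N} \phi_j(x_i)\,y_i, \qquad j = 0, 1, \ldots, k,$$

and under that assumption every off-diagonal term vanishes, so each coefficient decouples:

$$\alpha_j = \frac{\sum_{i=1}^{N} \phi_j(x_i)\,y_i}{\sum_{i=1}^{N} \phi_j(x_i)^2}, \qquad j = 0, 1, \ldots, k.$$

No matrix has to be solved, and raising the degree from $k$ to $k + 1$ leaves $\alpha_0, \ldots, \alpha_k$ unchanged.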

Below is a graph of a series of data points, shown in blue, and their corresponding fitting function, determined by the method of orthogonal polynomial regression.

We may also use specific families of orthogonal polynomials instead of general ones as in the above method. For instance, the Forsythe orthogonal polynomials lead to the method of Forsythe polynomial regression, available as Regression/Forsythe.
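A minimal sketch of the idea, assuming Forsythe's construction in its usual unweighted form: monic polynomials orthogonal over the data points themselves, generated by the three-term recurrence $p_{j+1}(x) = (x - a_j)\,p_j(x) - b_j\,p_{j-1}(x)$ with $a_j = \sum_i x_i p_j(x_i)^2 / \sum_i p_j(x_i)^2$ and $b_j = \sum_i p_j(x_i)^2 / \sum_i p_{j-1}(x_i)^2$. The function name `forsythe_fit` and the returned callable are illustrative; this is not the interface of Regression/Forsythe.

```python
from typing import Callable, List, Sequence


def forsythe_fit(xs: Sequence[float], ys: Sequence[float], k: int) -> Callable[[float], float]:
    """Least-squares polynomial of degree k built from polynomials orthogonal over the x_i."""
    n = len(xs)
    if n != len(ys) or n < k + 1:
        raise ValueError("need len(xs) == len(ys) >= k + 1 (with distinct x_i)")
    a: List[float] = []  # recurrence coefficients a_j
    b: List[float] = []  # recurrence coefficients b_j (b_0 = 0)
    c: List[float] = []  # regression coefficients c_j
    p_prev = [0.0] * n   # values p_{-1}(x_i)
    p_curr = [1.0] * n   # values p_0(x_i)
    norm_prev = 1.0
    for j in range(k + 1):
        norm = sum(p * p for p in p_curr)  # sum_i p_j(x_i)^2
        c.append(sum(p * y for p, y in zip(p_curr, ys)) / norm)
        a.append(sum(x * p * p for x, p in zip(xs, p_curr)) / norm)
        b.append(norm / norm_prev if j > 0 else 0.0)
        # p_{j+1}(x_i) = (x_i - a_j) p_j(x_i) - b_j p_{j-1}(x_i)
        p_next = [(x - a[j]) * p - b[j] * q for x, p, q in zip(xs, p_curr, p_prev)]
        p_prev, p_curr, norm_prev = p_curr, p_next, norm

    def f(x: float) -> float:
        # Evaluate sum_j c_j p_j(x) by running the same recurrence at x.
        total, q_prev, q_curr = 0.0, 0.0, 1.0
        for j in range(k + 1):
            total += c[j] * q_curr
            q_prev, q_curr = q_curr, (x - a[j]) * q_curr - b[j] * q_prev
        return total

    return f


xs = [1.5, 2.4, 3.2, 4.8, 5, 7, 8.43]
ys = [3.5, 5.3, 7.7, 6.2, 11, 9.5, 10.27]
line = forsythe_fit(xs, ys, 1)
print(line(0.0), line(1.0) - line(0.0))  # close to 3.46212 and 0.904273
```

For degree 1 this reproduces the straight line regression result, since $p_0 = 1$ and $p_1(x) = x - \overline{x}$; each coefficient $c_j$ is the decoupled ratio that discrete orthogonality allows.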

In the image below, we have used Forsythe regression on the same series of data points as in the previous graph.

## Nonlinear Regression

As mentioned at the beginning of this reference page, nonlinear regression refers to the case when the fitting function depends nonlinearly on its parameters, so $f$ can be virtually any function with this property. Unlike in the linear case, the optimal parameters generally cannot be obtained by solving a single linear system; they are usually computed iteratively, starting from an initial guess. Depending on the experiment from which the data points were obtained and the statistical properties of the underlying phenomenon, we may choose between various families of nonlinear regression functions.

Among nonlinear regression models, logistic regression uses the fitting function

$$f_{\alpha_1, \alpha_2}(x) = \frac{1}{1 + e^{-\alpha_1 - \alpha_2 x}}$$

and is also implemented as Regression/Logistic. In the following image you can see the logistic regression curve in red, for the series of data points shown in blue.
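The page does not say which algorithm Regression/Logistic uses. The sketch below swaps in Gauss–Newton with step halving, one standard way to minimize the sum of squared residuals when $f$ is nonlinear in its parameters: at each step it linearizes $f$ using $\partial f / \partial \alpha_1 = f(1 - f)$ and $\partial f / \partial \alpha_2 = x f(1 - f)$, solves the resulting $2 \times 2$ linear least squares problem, and shortens the step until the residual sum decreases. The function names, the starting point $(0, 0)$ and the tolerances are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def logistic(x: float, a1: float, a2: float) -> float:
    """f(x) = 1 / (1 + exp(-a1 - a2 x)), evaluated without overflow."""
    z = a1 + a2 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(
    xs: Sequence[float],
    ys: Sequence[float],
    a1: float = 0.0,
    a2: float = 0.0,
    max_iter: int = 100,
) -> Tuple[float, float]:
    """Minimize sum (f(x_i) - y_i)^2 over (a1, a2) by Gauss-Newton with step halving."""

    def ssr(p1: float, p2: float) -> float:
        return sum((logistic(x, p1, p2) - y) ** 2 for x, y in zip(xs, ys))

    current = ssr(a1, a2)
    for _ in range(max_iter):
        # Build J^T J and J^T r, using df/da1 = f(1 - f) and df/da2 = x f(1 - f).
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            f = logistic(x, a1, a2)
            d = f * (1.0 - f)
            r = y - f
            h11 += d * d
            h12 += d * d * x
            h22 += d * d * x * x
            g1 += d * r
            g2 += d * x * r
        det = h11 * h22 - h12 * h12
        if det <= 0.0:
            break  # J^T J is singular, e.g. all x_i equal or f saturated
        # Gauss-Newton step: solve the 2x2 system (J^T J) s = J^T r by Cramer's rule.
        s1 = (h22 * g1 - h12 * g2) / det
        s2 = (h11 * g2 - h12 * g1) / det
        if abs(s1) + abs(s2) < 1e-12:
            break  # step negligible: converged
        # Halve the step until the sum of squared residuals decreases.
        t = 1.0
        while True:
            trial = ssr(a1 + t * s1, a2 + t * s2)
            if trial < current:
                break
            t /= 2.0
            if t < 1e-10:
                return a1, a2  # no decrease along this direction
        a1, a2, current = a1 + t * s1, a2 + t * s2, trial
    return a1, a2


# Noiseless data sampled from the logistic curve with alpha_1 = -2, alpha_2 = 0.5.
xs = [float(x) for x in range(9)]
ys = [logistic(x, -2.0, 0.5) for x in xs]
print(fit_logistic(xs, ys))  # close to (-2.0, 0.5)
```

On real, noisy data the result depends on the starting point, and a damped variant such as Levenberg–Marquardt is a common, more robust alternative.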

Other nonlinear regression models include the Gompertz model, given by the fitting function

$$f_{\alpha_1, \alpha_2, \alpha_3}(x) = \alpha_1 e^{-e^{-\alpha_2 - \alpha_3 x}},$$

or Richards' model, given by:

$$f_{\alpha_1, \alpha_2, \alpha_3, \alpha_4}(x) = \frac{\alpha_1}{\left[1 + e^{-\alpha_2 - \alpha_3 x}\right]^{\alpha_4}}.$$

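For comparison, the three growth curves can be written side by side. The sketch below (plain Python; function names are illustrative) checks two properties that follow directly from the formulas: Richards' model with $\alpha_4 = 1$ equals $\alpha_1$ times the logistic function with parameters $(\alpha_2, \alpha_3)$, and for $\alpha_3 > 0$ both the Gompertz and Richards curves approach the upper asymptote $\alpha_1$ as $x$ grows. Any of them can be fitted with the same iterative least squares approach as the logistic model, only with more partial derivatives.

```python
import math


def logistic(x: float, a1: float, a2: float) -> float:
    """1 / (1 + exp(-a1 - a2 x))."""
    return 1.0 / (1.0 + math.exp(-a1 - a2 * x))


def gompertz(x: float, a1: float, a2: float, a3: float) -> float:
    """a1 * exp(-exp(-a2 - a3 x))."""
    return a1 * math.exp(-math.exp(-a2 - a3 * x))


def richards(x: float, a1: float, a2: float, a3: float, a4: float) -> float:
    """a1 / (1 + exp(-a2 - a3 x))^a4."""
    return a1 / (1.0 + math.exp(-a2 - a3 * x)) ** a4


x, a1, a2, a3 = 0.7, 5.0, -1.0, 2.0
# With a4 = 1, Richards' model is a scaled logistic curve (equal up to rounding).
print(richards(x, a1, a2, a3, 1.0), a1 * logistic(x, a2, a3))
# For a3 > 0 and large x, both curves approach the upper asymptote a1 = 5.0.
print(gompertz(50.0, a1, a2, a3), richards(50.0, a1, a2, a3, 0.5))
```
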
Lucian Bentea (September 2008)