Nonlinear Regression

See Also: Linear & Polynomial Regression, Multiple Linear Regression, Data Table, Regression Report


Nonlinear regression involves a general mathematical function (model) of the form:

y = f (x1, x2, …, xn, a0, a1, a2, …, am)

where a0, a1, …, am are the regression parameters fitted to a set of N tabulated values of the independent variables x1, x2, …, xn versus the dependent variable y. Because the model has m + 1 parameters, the number of data points must be at least m + 1 (N >= m + 1), and it should exceed m + 1 (N > m + 1) so that degrees of freedom remain for estimating the variance and the confidence intervals.

The Nonlinear Regression capability is reached from the Polymath Data Table by first clicking on the lower tab marked "Regression" and then clicking on the upper tab marked "Nonlinear". The window for entering a nonlinear model is shown below.


 

Model:
Type in a new regression model equation or edit an existing model equation.

Enter Initial Guess for Model Parameters:
After entering the model, an initial guess for all the parameters should be provided in this area.

Graph: 
If this option is marked, a graph showing the calculated points and the data points is prepared and displayed.

Residuals: 
If this option is marked, a residual plot is displayed showing the deviation between each data point and the corresponding value calculated from the nonlinear model, that is, the difference between the actual and the calculated dependent variable over the entire data set.

Store Model in Column: 
This option creates a new column in the Data Table using the converged nonlinear regression model and the calculated parameters.

Report: 
If this option is marked, a report is displayed showing the regression model, the numerical values and confidence intervals of the parameters, and additional statistical and other information.

Solve:
Clicking on the solve arrow carries out the regression and generates the requested outputs.

 

Solution Details: 

The nonlinear expression must be entered in the entry box. The left-hand side of the model equation must be the name of the dependent variable, and this name must be one of the names of an existing data column in the Data Table. After the "=" sign, an explicit expression must be entered that uses column names as variables. Any additional variables used in the expression become parameters for the regression. The model (nonlinear expression) can use all the intrinsic functions available within Polymath. For a description of valid names and expressions, see Variables and Expressions.

Once the model has been entered with the appropriate syntax, the parameters window becomes active. Initial estimates are required for all the model parameters. For moderately nonlinear models, the program will find the best parameter values even when the initial estimates are poor. For highly nonlinear models, good initial estimates are required. Initial estimates should always be realistic for the physical phenomena being described. It is very important that the model values calculated for the data set with the initial estimates are reasonable and of the same order of magnitude as the dependent variable values in the data set.

For difficult situations, it may be useful to linearize or simplify the model and use the multiple linear regression option to determine parameter values that can then serve as initial estimates for the nonlinear regression. See the following section for a detailed discussion of techniques for linearizing nonlinear models.
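As a rough illustration of the order-of-magnitude check described above, the Python sketch below evaluates the Antoine model used in the example of this section (logP = A + B/(TC + C)) at the tabulated temperatures with the initial estimates A = 6, B = -1000, C = 200, and reports the largest deviation from the data. The helper name `check_initial_guess` and the extra test for a pole of the model (TC + C changing sign inside the data range) are illustrative assumptions, not Polymath features.

```python
# Example data (TC in deg C, logP) from the Antoine example in this section
TC = [-36.7, -19.6, -11.5, -2.6, 7.6, 15.4, 26.1, 42.2, 60.6, 80.1]
LOGP = [0.0, 0.69897, 1.0, 1.30103, 1.60206, 1.7781513, 2.0, 2.30103, 2.60206, 2.8808136]


def check_initial_guess(a, b, c):
    """Hypothetical helper: compare model values at the initial guess with the data.

    Returns (pole_free, max_abs_deviation). pole_free is False when TC + C is zero
    or changes sign over the data, i.e. the model B/(TC + C) has a singularity
    among or between the data points.
    """
    denominators = [t + c for t in TC]
    pole_free = all(d > 0 for d in denominators) or all(d < 0 for d in denominators)
    if not pole_free:
        return False, float("inf")
    calc = [a + b / d for d in denominators]
    max_dev = max(abs(y - yc) for y, yc in zip(LOGP, calc))
    return True, max_dev


ok, dev = check_initial_guess(6.0, -1000.0, 200.0)
print(ok, round(dev, 3))  # prints: True 0.451
```

The deviations stay below about 0.45 in logP, so the initial model values are of the same order as the data. With C = 30, for instance, TC + C changes sign inside the data range and the helper reports a pole.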

The program uses the Levenberg-Marquardt (LM) algorithm as the default for finding the parameter values. The objective function that is minimized is the sum of squares of the errors, where the error is the difference between the actual value of the dependent variable and the value calculated from the model expression. A detailed explanation of this method can be found, for example, in the book by Press et al. Two different implementations of the LM method are included. LM is an iterative method that usually converges very rapidly, except when the Hessian matrix becomes nearly singular. In such cases, the algorithm switches to the steepest descent method, which can converge very slowly. A nearly singular Hessian matrix often indicates that the model has more parameters than the data justify. In case of slow convergence, it is recommended that you stop the iterations and check the statistical analysis display to verify that the number of model parameters is appropriate.
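The Python sketch below illustrates the Levenberg-Marquardt idea on the Antoine model and data of the example in this section. Each iteration solves (JᵀJ + λ·diag(JᵀJ))δ = Jᵀr, where J holds the derivatives of the model with respect to A, B and C and r = logP − logP calc. λ is decreased after a step that lowers the sum of squares and increased after one that does not. This is a generic textbook-style LM written under these assumptions, not either of Polymath's two implementations; the starting λ = 10⁻³, the ×10/÷10 damping schedule and the stopping rules are arbitrary choices.

```python
"""Minimal Levenberg-Marquardt sketch for logP = A + B/(TC + C)."""

TC = [-36.7, -19.6, -11.5, -2.6, 7.6, 15.4, 26.1, 42.2, 60.6, 80.1]
LOGP = [0.0, 0.69897, 1.0, 1.30103, 1.60206, 1.7781513, 2.0, 2.30103, 2.60206, 2.8808136]


def model(p, t):
    """Antoine model value for parameters p = [A, B, C] at temperature t (deg C)."""
    return p[0] + p[1] / (t + p[2])


def sse(p):
    """Sum of squared errors (data minus model); inf if a data point hits the pole."""
    if any(t + p[2] == 0.0 for t in TC):
        return float("inf")
    return sum((y - model(p, t)) ** 2 for t, y in zip(TC, LOGP))


def jacobian_row(p, t):
    """Analytic derivatives of the model with respect to A, B and C."""
    d = t + p[2]
    return [1.0, 1.0 / d, -p[1] / (d * d)]


def solve(m, v):
    """Solve the small linear system m x = v by Gaussian elimination with pivoting."""
    n = len(v)
    a = [m[i][:] + [v[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def levenberg_marquardt(p, lam=1e-3, max_iter=200, tol=1e-10):
    """Return (parameters, SSE, iterations) of a damped Gauss-Newton iteration."""
    p = list(p)
    cur = sse(p)
    for it in range(1, max_iter + 1):
        # Normal-equation pieces J^T J and J^T r, with residual r = y - f(t)
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for t, y in zip(TC, LOGP):
            row = jacobian_row(p, t)
            r = y - model(p, t)
            for i in range(3):
                jtr[i] += row[i] * r
                for j in range(3):
                    jtj[i][j] += row[i] * row[j]
        # Increase the damping until the trial step lowers the SSE
        while True:
            damped = [[jtj[i][j] * (1.0 + lam) if i == j else jtj[i][j]
                       for j in range(3)] for i in range(3)]
            step = solve(damped, jtr)
            trial = [pi + si for pi, si in zip(p, step)]
            new = sse(trial)
            if new < cur:
                break
            lam *= 10.0
            if lam > 1e15:  # no downhill step left: treat p as converged
                return p, cur, it
        rel_change = max(abs(s) / (abs(q) + 1e-12) for s, q in zip(step, p))
        p, cur = trial, new
        lam = max(lam / 10.0, 1e-12)
        if rel_change < tol:
            break
    return p, cur, it


params, sse_min, iterations = levenberg_marquardt([6.0, -1000.0, 200.0])
print(params, sse_min, iterations)
```

Starting from A = 6, B = -1000, C = 200, the sketch should approach the values in the Example Report (A ≈ 5.767, B ≈ -677.1, C ≈ 153.9). When λ is large, the step tends toward a scaled steepest-descent direction. This is the slow regime described above.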

If the model has more parameters than needed, the 95% confidence intervals of most of the parameters tend to be much larger than the absolute values of the parameters themselves. For converged results, the graphical and statistical information provided can be used to assess the quality of the fit. For further details see Assessing the Quality of Regression Models.
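The rule of thumb above can be written as a one-line check. In the hypothetical Python helper below, each parameter maps to (value, 95% confidence), and the 95% confidence column of the report is treated as a ± half-width (an assumption). The numbers are those of the Example Report in this section.

```python
def suspicious_parameters(params):
    """Return names of parameters whose 95% confidence half-width exceeds |value|.

    params maps name -> (value, ci_half_width). A hit suggests, but does not
    prove, that the model has more parameters than the data justify.
    """
    return [name for name, (value, ci) in params.items() if ci > abs(value)]


report = {"A": (5.767347, 0.1520845), "B": (-677.094, 48.15908), "C": (153.8854, 5.687091)}
print(suspicious_parameters(report))  # prints: []
```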

Example:  Calculation of the Parameters of the Antoine Equation

Consider the data that appear as Example 2 in the Examples drop-down menu in the Polymath "Data Table" window. The parameters of the Antoine equation are to be determined from these data points. The Antoine equation is of the form

logP = A + B/(TC + C)

where logP = log(P), the base-10 logarithm of the vapor pressure P, and TC, the temperature in °C, are columns in the Data Table, and A, B, and C are the model parameters.

Initial estimates for the parameters given by A = 6, B = -1000, and C = 200 are used in this example.

From the Data Table window, clicking on the bottom "Regression" tab and then on the upper "Nonlinear" tab displays the nonlinear regression window. This window is shown below with the model and the initial parameter guesses for this example.

Note that the requested outputs "Graph", "Residuals", and "Report" must be checked on the right side of this window. A mouse click on the solve arrow then starts the solution of this nonlinear regression problem.

EXAMPLE REPORT

POLYMATH Report Fitting Polynomials to vapor pressure data
Nonlinear Regression (L-M)  


Model: logP = A+B/(TC+C)

Variable   Initial guess   Value       95% confidence
A          6.              5.767347    0.1520845
B          -1000.          -677.094    48.15908
C          200.            153.8854    5.687091

Nonlinear regression settings
Max # iterations = 64

Precision

R^2        0.9996879
R^2adj     0.9995987
Rmsd       0.0047228
Variance   0.0003186

General

Sample size   10
Model vars    3
Indep vars    1
Iterations    11

Source data points and calculated data points

     TC (°C)   logP        logP calc    Delta logP
1    -36.7     0           -0.0106273   0.0106273
2    -19.6     0.69897     0.7251442    -0.0261742
3    -11.5     1           1.011984     -0.0119843
4    -2.6      1.30103     1.291739     0.0092914
5    7.6       1.60206     1.574434     0.0276258
6    15.4      1.7781513   1.767627     0.0105243
7    26.1      2           2.005407     -0.0054074
8    42.2      2.30103     2.314289     -0.0132593
9    60.6      2.60206     2.610516     -0.0084558
10   80.1      2.8808136   2.873601     0.0072121
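The Precision entries of the report can be recomputed from the tabulated logP and logP calc columns. The Python sketch below uses formulas that, when checked against this example, reproduce the reported values. With p = 3 model parameters and N = 10 points, they are R^2 = 1 - SSE/SST, R^2adj = 1 - (1 - R^2)(N - 1)/(N - p), Rmsd = sqrt(SSE)/N and Variance = SSE/(N - p). These definitions are inferred from the numbers rather than quoted from Polymath; Assessing the Quality of Regression Models remains the authoritative description.

```python
import math

LOGP = [0.0, 0.69897, 1.0, 1.30103, 1.60206, 1.7781513, 2.0, 2.30103, 2.60206, 2.8808136]
LOGP_CALC = [-0.0106273, 0.7251442, 1.011984, 1.291739, 1.574434,
             1.767627, 2.005407, 2.314289, 2.610516, 2.873601]


def fit_statistics(y, y_calc, n_params):
    """Goodness-of-fit measures, using definitions inferred from the report above."""
    n = len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_calc))   # sum of squared errors
    mean = sum(y) / n
    sst = sum((a - mean) ** 2 for a in y)                # total sum of squares
    r2 = 1.0 - sse / sst
    return {
        "R^2": r2,
        "R^2adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_params),
        "Rmsd": math.sqrt(sse) / n,          # sqrt(SSE)/N, not sqrt(SSE/N)
        "Variance": sse / (n - n_params),    # SSE divided by degrees of freedom
    }


stats = fit_statistics(LOGP, LOGP_CALC, n_params=3)
for name, value in stats.items():
    print(f"{name:9s} {value:.7g}")
```

Note that Rmsd as reported here is sqrt(SSE)/N. This differs from the common root-mean-square deviation sqrt(SSE/N).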

[Figure: Example nonlinear regression graph]

[Figure: Example residuals plot]

DETERMINATION OF INITIAL GUESSES

Successful convergence of a nonlinear regression model often depends on good initial guesses for the model parameters. Good initial guesses can typically be obtained by linearizing the nonlinear expression so that linear regression or multiple linear regression can be used. In the above example, the denominator of the nonlinear model can be approximated by setting the parameter C equal to 273 (273.15 to be exact), which is equivalent to using the absolute temperature in kelvin. With a new column Trec = 1/(TC + 273.15), the model becomes linear in the parameters, and linear regression yields
 

Model: logP = a0 + a1*Trec

Variable   Value       95% confidence
a0         8.752017    0.5423357
a1         -2035.333   153.6285

Using the initial guesses A = 8.75, B = -2035, and C = 273 (from a0, a1, and the assumed value of C) in the nonlinear regression gives results identical to the Example Report above.
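The linearization can be sketched in a few lines of Python. The sketch builds the column Trec = 1/(TC + 273.15) and fits logP = a0 + a1*Trec by ordinary least squares, so that a0 and a1 can serve as initial guesses for A and B. The offset 273.15 is an assumption: in this sketch it reproduces the a0 and a1 listed above, whereas an offset of exactly 273 gives slightly different values (a1 closer to -2033). The function name `linear_fit` is illustrative.

```python
TC = [-36.7, -19.6, -11.5, -2.6, 7.6, 15.4, 26.1, 42.2, 60.6, 80.1]
LOGP = [0.0, 0.69897, 1.0, 1.30103, 1.60206, 1.7781513, 2.0, 2.30103, 2.60206, 2.8808136]


def linear_fit(x, y):
    """Ordinary least squares for y = a0 + a1*x (closed form)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    return a0, a1


# Assumption: Trec = 1/(TC + 273.15), i.e. C fixed at the kelvin offset
trec = [1.0 / (t + 273.15) for t in TC]
a0, a1 = linear_fit(trec, LOGP)
initial_guess = {"A": a0, "B": a1, "C": 273.15}
print(initial_guess)
```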

INTERPRETATION OF RESULTS

For further details see Assessing the Quality of Regression Models.