Regression Report, Graph and Residual Plot
See Also: Linear & Polynomial Regression, Multiple Linear Regression
Example 1, which uses logP as the dependent variable and Trec, logT and T2 as the independent variables in a multiple linear regression, is used here to show the information presented in the regression report, graph and residual plot windows. The first part of the report is shown below.
POLYMATH Report
Fitting Polynomials to vapor pressure data
Nonlinear Regression Model: logP = a0 + a1*Trec + a2*logT + a3*T2

Variable   Initial guess   Value        95% confidence
a0         200.            216.7214     156.4115
a1         -9000.          -9318.66     4857.135
a2         -100.           -75.74818    58.42643
a3         0               4.445E-05    5.001E-05
This section of the report presents the title of the problem as entered by the user, the date when the solution was obtained, the regression model and its parameter values (including 95% confidence intervals). For the regression results to be statistically valid, each confidence interval must be much smaller (or at least smaller) than the absolute value of the respective parameter. This subject is further discussed in the Assessing the quality of regression models section. The regression report also shows the following additional information:
Number of independent variables = 3
Regression including free parameter
Number of observations = 10
R^2 = 0.9997514
R^2adj = 0.9996271
Rmsd = 0.0042151
Variance = 2.961E-04
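Returning briefly to the parameter table, the confidence-interval criterion can be checked mechanically. The following is a minimal sketch rather than a POLYMATH feature: the dictionary name params and the helper interval_ratio are hypothetical, and the numbers are copied from the Value and 95% confidence columns of the table above.

```python
# Parameter values and 95% confidence values copied from the report table.
# The names `params` and `interval_ratio` are illustrative, not part of POLYMATH.
params = {
    "a0": (216.7214, 156.4115),
    "a1": (-9318.66, 4857.135),
    "a2": (-75.74818, 58.42643),
    "a3": (4.445e-05, 5.001e-05),
}


def interval_ratio(value, conf):
    """Return conf / |value|; a ratio below 1 meets the stated criterion."""
    return abs(conf) / abs(value)


ratios = {name: interval_ratio(v, c) for name, (v, c) in params.items()}

for name, ratio in ratios.items():
    status = "ok" if ratio < 1.0 else "interval exceeds value"
    print(f"{name}: ratio = {ratio:.3f} ({status})")
```

For the values in this report, a0, a1 and a2 have ratios below one, while the confidence value of a3 is larger than the parameter value itself.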
The statistical indicators presented in this part of the report are R^2, the squared correlation coefficient (also called the coefficient of multiple determination); R^2adj, the adjusted coefficient of multiple determination; Rmsd, the root mean square deviation; and the variance. These indicators measure the deviation between the calculated values and the data for the dependent variable, and they can be used to compare different models that represent the same dependent variable. Model A, for example, represents the data better than model B if its values of R^2 and R^2adj are closer to one and its values of Rmsd and the variance are smaller. Note that models with different dependent variables (P versus log(P), for example) cannot be compared directly. This subject is further discussed in the Assessing the quality of regression models section.
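As an illustration, the four indicators can be computed from observed and calculated values as sketched below. The formulas are assumptions, not taken from the POLYMATH documentation: they were chosen because they agree with the figures in this report (10 observations, 4 fitted parameters including the free parameter). In particular, Rmsd is taken here as sqrt(SSE)/n, which differs from the more common sqrt(SSE/n). The function name fit_indicators is hypothetical.

```python
import math


def fit_indicators(y_obs, y_calc, n_params):
    """Compute R^2, adjusted R^2, Rmsd and variance (assumed definitions)."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    # Sum of squared residuals and total sum of squares about the mean.
    sse = sum((o - c) ** 2 for o, c in zip(y_obs, y_calc))
    sst = sum((o - mean_obs) ** 2 for o in y_obs)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    rmsd = math.sqrt(sse) / n          # assumed definition, see lead-in
    variance = sse / (n - n_params)    # SSE per degree of freedom
    return {"R2": r2, "R2adj": r2_adj, "Rmsd": rmsd, "Variance": variance}


# Consistency check against the report: R^2adj recomputed from R^2.
n_obs, n_par = 10, 4
r2_reported = 0.9997514
r2_adj_check = 1.0 - (1.0 - r2_reported) * (n_obs - 1) / (n_obs - n_par)
print(f"R^2adj recomputed from R^2: {r2_adj_check:.7f}")

# Rmsd and variance should imply the same sum of squared residuals.
sse_from_variance = 2.961e-04 * (n_obs - n_par)
rmsd_check = math.sqrt(sse_from_variance) / n_obs
print(f"Rmsd recomputed from variance: {rmsd_check:.7f}")
```

Both recomputed values agree with the reported R^2adj and Rmsd to within the rounding of the printed figures, which is the only basis for the definitions assumed here.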
If the Graph option is turned on in the Regression solver dialog box, the following graph is obtained for the Example problem:

This plot shows the values of the dependent variable (log(P)); the data values are marked by small squares and the calculated values by small circles. In this particular case the differences between the calculated and experimental values are very small, almost indiscernible at the graph scaling being used. The data points are connected to help reveal the differences, but to really appreciate the quality of the fit the residual plot that follows should be consulted. Note that the text Title in the graph can be erased or replaced by a real title, and a subtitle can also be added. These and additional graph editing options are discussed in the Graph editing options section.
The residual plot for the Example is shown below.

The residual plot shows the difference between the calculated and measured values of the dependent variable as a function of the measured values. If the regression model represents the data correctly, the residuals are randomly distributed around the line err=0 with zero mean. In the residual plot shown here no clear trend can be identified. The number of data points is, however, insufficient for verifying the randomness of the error distribution.
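A rough numerical companion to the residual plot is sketched below. It assumes the residual is calculated minus measured, orders the points by measured value as in the plot, and reports the mean residual and the number of sign changes along that ordering. Few sign changes may hint at a trend, but this is only an informal indicator, not a statistical test. The function name residual_summary and the sample data are made up for illustration and are not the data of the Example.

```python
def residual_summary(measured, calculated):
    """Return residuals (ordered by measured value), their mean, and sign changes."""
    # Order the points by measured value, matching the x-axis of the residual plot.
    pairs = sorted(zip(measured, calculated))
    residuals = [calc - meas for meas, calc in pairs]
    mean = sum(residuals) / len(residuals)
    # Count how often consecutive nonzero residuals switch sign.
    signs = [r > 0 for r in residuals if r != 0]
    sign_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return residuals, mean, sign_changes


# Made-up illustrative data, not taken from the report.
measured = [1.0, 2.0, 3.0, 4.0]
calculated = [1.1, 1.9, 3.1, 3.9]

residuals, mean, sign_changes = residual_summary(measured, calculated)
print("mean residual:", round(mean, 6))
print("sign changes:", sign_changes)
```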