Data Table

See Also: Linear & Polynomial Regression  Multiple Linear Regression  Nonlinear Regression


Expand All | Collapse All

Overview

All of the Polymath regression programs begin with the data table that is used for input, manipulation and storage of numerical data. The data is stored in a column-wise fashion where every column is associated with a name (variable) and can be addressed separately. The stored data can be regressed (fitting a straight line, various curves and equations to the data using linear, multiple linear, polynomial, and nonlinear regression techniques), analyzed ( interpolated, differentiated, integrated along with various statistical calculations) and plotted. The data table can be accessed directly by selecting the "REG" option when creating a new data file. Tabular results of solution of a set of ordinary differential equations are also stored  in the data table. A data table with two columns of data is shown below:


Entering and Manipulating Data

The Polymath Data Table for data entry and manipulation is shown above. It is much like a spreadsheet. The top row of cells contains the column names (the default names being C01 to C30). The rest are data cells, which have 300 rows by 30 columns (default values, the total number of rows and columns can be changed in the setup dialog window). 

The address of the current active cell is shown in the upper left corner of the data table. The active cell is also highlighted. You can move the highlight by using the arrow keys and the mouse. 

To enter or modify a value in a data cell, move the highlight to the appropriate cell and start typing in a number. When you are done typing in the number, you may press the left or right arrow key to move to the next cell in the current row.  Also the up or down arrow keys may be used to move to the next cell above or below. 

When entering the data it is important to place data in all rows. Leaving rows empty (in particular the first row)  may prevent  POLYMATH from  identifying the columns in regression and/or analysis.

Column names can be changed by using the change “Column Name” in the drop-down "Column" menu which is shown on the left. A name must begin with a letter, and may contain letters and digits. See the section Variables and Expressions for additional column naming rules.

Contents of a column can be defined as function of previously defined columns. Column definitions can be very useful when, for example, transformation functions are used to linearize a nonlinear regression model . To define such a column, select “Column definition” from the drop-down “Column” menu. Columns must be previously named before they can be used in an expression.

You may alternatively set or change the column name in the left side of the “=” sign and enter the column definition in the form of an expression where the expression may contain constants and names of previously defined columns.

For example, consider the following transformation of variables:  logTempK=log(Temp+273.15) The values in a column defined in this manner are computed whenever the values in any of the columns involved in the expression are modified. You can not modify numbers in data cells of a defined column directly, unless the definition is deleted. Furthermore, the options 'clean', 'cut' and 'paste' are not active for the data tale cells that are generated by a column definition.

Note that the column manipulation options can be reached (in addition to using the dropdown menu) by marking a column and pressing the right mouse key. Plotting or regression of the data from the Data Table is by selecting the Regression tab at the bottom of the Data Table and then selecting the option at the top (Linear & Polynomial, Multiple Linear, or Nonlinear). The Linear & Polynomial Tab is shown in the Data Table given below:


Example

The following table presents vapor pressure versus temperature for benzene. (This is Example 2 - "Vapor pressure" in the Polymath REG Regression Program)

The appropriate columns (variables) must be created in order to fit the equation below:


        log(P) = A + B/T + C log(T) + D T2 

or      log(P) = A + B*(1/T) +C*(log(T)) +D*(T2)

which has the general form for multiple linear regression that is given by

    y(x1, x2, ..., xn) = a0 + a1*x1 + a2*x2 + ... + an*xn 

Note that T is the temperature in Kelvin and A, B, C and D are the parameters.  For this problem, log(P) corresponds to  y(x1, x2, ..., xn), A corresponds to a0, B corresponds to a1, (1/T) corresponds to x1, C corresponds to a2, (log(T)) corresponds to x2, D corresponds to a3, and (T2) corresponds to x3.

The Data Table must be prepared for multiple linear regression by creating columns containing the variables (1/T), (log(T)), and (T2). 

Step 1

Enter the above data into the Data Table with descriptive column names and without the row of unit information.  This is shown in the Data Table below as columns "TC" and "P". 

Step 2

Create a column named "TK" for the temperature in Kelvin using column "TC".  The is indicated in the Data Table below where the desired column has been selected and both the descriptive name and the column definition have both been entered in their respective textboxes.  Remember that

Column formula definition must start with an equals '=' sign, and calculation is carried out for the entire column overriding previous values.

Press the button or press enter to finish entering the TK column name and column definition.

Step 3

The column “logP” which contains the values of log(P) can be named and column defined in a similar manner.  Note that the log() function is used here.

Additional Steps

Three additional columns can be created where Trec = 1/TK, logT = log(TK) and T2 = TK^2.

The completed Data Table is shown below, after all the necessary columns have been defined.

Note that the names of the columns which were defined as functions of other columns appear in blue bold color.

Shortcuts


Transformation Functions

Often it is more convenient to fit a curve, a straight line or a linear function to a transformed form of the original data than to carry out nonlinear regression with the original nonlinear regression model.

For example, consider a data set that satisfies the relationship y = ax^b .

When logarithms to the base 10 are taken of both sides of the above equation, the result is log(y) = log(a) + b*log(x).  If transformation functions xt = log(x) and yt = log(y) are defined, then the transformed data points will give a straight line when yt is plotted versus xt, with slope = b and intercept = log(a). The column definition option in Polymath can be conveniently used to create the columns of transformed variables. The Polymath "Linear and Polynomial Regression Option" can be used to determine b and log(a).

Some additional useful transformations:


General Notes
  1. The total number of rows and columns can be modified from the Polymath setup window. The changes will take place for new Table windows only.
  2. Table formats (Fonts, Colors, and Alignments) are not saved when saving the problem into data file. Only the contents of the cells and the formulas are saved.