Data considerations for Nonlinear Regression

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The predictors can be continuous or categorical

A continuous variable can be measured and ordered, and has an infinite number of values between any two values. For example, the diameter of the tires in a sample is a continuous variable.

Categorical variables contain a finite, countable number of categories or distinct groups. Categorical data might not have a logical order. For example, categorical predictors include gender, material type, and payment method.

If you have a discrete variable, you can decide whether to treat it as a continuous or categorical predictor. A discrete variable can be measured and ordered, but it has a countable number of values. For example, the number of people who live in a household is a discrete variable. The decision to treat a discrete variable as continuous or categorical depends on the number of levels and on the purpose of the analysis.

If you have any categorical predictors, convert them to indicator variables before you perform this analysis. To convert them, use Make Indicator Variables.
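
To illustrate what indicator variables look like, here is a minimal Python sketch. It is not the Make Indicator Variables tool itself; the function name `make_indicators`, the `drop_first` option, and the sample material labels are assumptions made for this example. Whether to drop a reference level depends on how the indicators enter your expectation function.

```python
def make_indicators(values, drop_first=True):
    """Convert a list of category labels into 0/1 indicator columns.

    Returns a dict that maps each retained level to its indicator column.
    With drop_first=True, the first level in sorted order is treated as
    the reference level and gets no column of its own.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return {level: [1 if v == level else 0 for v in values] for level in kept}


# Hypothetical categorical predictor, for illustration only.
material = ["steel", "alloy", "steel", "plastic", "alloy"]
indicators = make_indicators(material)
for level, column in indicators.items():
    print(level, column)
```

Each retained category becomes one column that is 1 where the observation belongs to that category and 0 elsewhere.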

The response variable should be continuous

If the response variable is categorical, your model is less likely to meet the assumptions of the analysis, to accurately describe your data, or to make useful predictions.

If your response variable is not continuous and your data do not require a nonlinear function, consider the following alternative analyses.

  • If your response variable has two categories, such as pass and fail, use Fit Binary Logistic Model.
  • If your response variable contains three or more categories that have a natural order, such as strongly disagree, disagree, neutral, agree, and strongly agree, use Ordinal Logistic Regression.
  • If your response variable contains three or more categories that do not have a natural order, such as scratch, dent, and tear, use Nominal Logistic Regression.
  • If your response variable counts occurrences, such as the number of defects, use Fit Poisson Model.

The expectation function must accurately describe the relationship between response and predictor variables

Your choice of expectation function often depends on prior knowledge about the shape of the response curve or about the behavior of physical and chemical properties in the system. Potential nonlinear shapes include concave, convex, exponential growth or decay, sigmoidal (S), and asymptotic curves. Specify a function that both agrees with your prior knowledge and produces residual plots that show no systematic patterns.
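
As a rough illustration of some of the shapes listed above, the following Python sketch defines three commonly used expectation functions. The function names, the parameter names (`theta1`, `theta2`, `theta3`), and the particular parameterizations are assumptions chosen for this example; other parameterizations of the same curves are equally valid.

```python
import math


def exponential_decay(x, theta1, theta2):
    """Exponential decay: equals theta1 at x = 0 and decays toward 0
    at rate theta2 (for theta2 > 0)."""
    return theta1 * math.exp(-theta2 * x)


def logistic(x, theta1, theta2, theta3):
    """Sigmoidal (S-shaped) curve: approaches theta1 for large x,
    has its midpoint at x = theta2, and theta3 > 0 sets the spread."""
    return theta1 / (1.0 + math.exp(-(x - theta2) / theta3))


def michaelis_menten(x, theta1, theta2):
    """Concave asymptotic curve: rises toward theta1, and theta2 is
    the x value at which the curve reaches half of theta1."""
    return theta1 * x / (theta2 + x)


# Evaluate each curve on a small grid to compare the shapes.
for x in [0.0, 1.0, 2.0, 4.0, 8.0]:
    print(x,
          exponential_decay(x, 5.0, 0.3),
          logistic(x, 10.0, 2.0, 0.5),
          michaelis_menten(x, 8.0, 4.0))
```

Plotting or tabulating candidate functions against the data in this way can help you judge whether a shape is plausible before you fit the model.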

You must specify acceptable starting values

An iterative algorithm estimates the parameters by systematically adjusting the estimates to reduce the sum of squared errors (SSE). For some expectation functions and data sets, the starting values can significantly affect the results: poor starting values can cause the algorithm to fail to converge or to converge to a local, rather than the global, SSE minimum.
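
The idea of iteratively reducing the SSE can be sketched in Python as follows. This is a simplified Gauss-Newton iteration with step halving, which is one common approach and not necessarily the exact algorithm your software uses. The exponential decay model, the function names, the synthetic noise-free data, and the two starting points are all assumptions made for this example.

```python
import math


def model(x, t1, t2):
    """Hypothetical expectation function: exponential decay."""
    return t1 * math.exp(-t2 * x)


def sse(theta, xs, ys):
    """Sum of squared errors; returns inf if the model overflows."""
    try:
        return sum((y - model(x, *theta)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return math.inf


def gauss_newton(xs, ys, start, max_iter=100, tol=1e-8):
    """Reduce the SSE from the given starting values.

    Returns (theta1, theta2, final_sse, converged).
    """
    t1, t2 = start
    current = sse((t1, t2), xs, ys)
    if not math.isfinite(current):
        return t1, t2, current, False
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J'J) d = J'r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-t2 * x)
            r = y - t1 * e      # residual at the current estimates
            j1 = e              # derivative of the model w.r.t. theta1
            j2 = -t1 * x * e    # derivative of the model w.r.t. theta2
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            return t1, t2, current, False   # singular: no unique step
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        if abs(d1) < tol and abs(d2) < tol:
            return t1, t2, current, True    # proposed adjustment is negligible
        # Halve the step until the SSE decreases.
        step = 1.0
        while step > 1e-10:
            cand = (t1 + step * d1, t2 + step * d2)
            new = sse(cand, xs, ys)
            if new < current:
                break
            step /= 2.0
        else:
            return t1, t2, current, False   # no step reduced the SSE
        (t1, t2), current = cand, new
    return t1, t2, current, False


# Synthetic, noise-free data generated from theta1 = 5 and theta2 = 0.3.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [model(x, 5.0, 0.3) for x in xs]

# Compare the outcome from two different starting points.
for start in [(4.0, 0.5), (1.0, 5.0)]:
    t1, t2, final_sse, converged = gauss_newton(xs, ys, start)
    print(start, round(t1, 4), round(t2, 4), final_sse, converged)
```

Running the fit from several starting points and comparing the final SSE values is one simple way to check whether the estimates are sensitive to the starting values.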

Collect data using best practices

To ensure that your results are valid, consider the following guidelines:
  • Make certain that the data represent the population of interest.
  • Collect enough data to provide the necessary precision.
  • Measure variables as accurately and precisely as possible.
  • Record the data in the order in which they are collected.

The model should provide a good fit to the data

If the model does not fit the data, the results can be misleading. In the output, use residual plots and model summary statistics to determine how well the model fits the data.
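
As a hedged illustration of the quantities behind residual plots and summary statistics, the Python sketch below computes the residuals, the SSE, and S, the standard error of the regression, as sqrt(SSE / (n - p)). The function name `fit_summary` and the observed and fitted values are hypothetical and exist only for this example.

```python
import math


def fit_summary(observed, fitted, n_params):
    """Return (residuals, sse, s), where s = sqrt(SSE / (n - p))."""
    residuals = [y - f for y, f in zip(observed, fitted)]
    sse = sum(r * r for r in residuals)
    dof = len(observed) - n_params   # error degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    return residuals, sse, math.sqrt(sse / dof)


# Hypothetical observed and fitted values, for illustration only.
observed = [5.0, 3.5, 3.0, 2.0, 1.5, 1.0]
fitted = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0]
residuals, sse, s = fit_summary(observed, fitted, n_params=2)
print(residuals, sse, s)
```

Residuals that scatter randomly around zero are consistent with an adequate fit. Long runs of residuals with the same sign, or a curved pattern when the residuals are plotted against the fitted values, suggest that the expectation function does not describe the data well.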