Interpret all statistics for Scatterplot

Find definitions and interpretation guidance for every statistic that is provided with the scatterplot.

N

The sample size (N) is the number of plotted points for a pair of variables. N does not include rows for which either the X-value or the Y-value is missing.
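The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name count_plotted_points, the sample values, and the choice to treat both None and NaN as "missing" are assumptions, not part of the software described here.

```python
import math


def count_plotted_points(x_values, y_values):
    """Count rows where both the X-value and the Y-value are present."""

    def is_missing(value):
        # Assumption: a missing cell is represented as None or as NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return sum(
        1
        for x, y in zip(x_values, y_values)
        if not is_missing(x) and not is_missing(y)
    )


# Hypothetical worksheet columns: rows 2 and 3 each have one missing value.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.1, float("nan"), 3.3, 4.8, 5.2]

n = count_plotted_points(x, y)
print(n)
```

Only three of the five rows have both values, so only those three would count toward N.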

Mean

The mean is the average of the data, which is the sum of all the observations divided by the number of observations.

For example, the wait times (in minutes) of five customers in a bank are: 3, 2, 4, 1, and 2. The mean wait time is calculated as follows: (3 + 2 + 4 + 1 + 2) / 5 = 12 / 5 = 2.4. On average, a customer waits 2.4 minutes for service at the bank.
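The same calculation can be reproduced in Python as a quick check. The variable names are illustrative only.

```python
# Wait times (in minutes) of the five bank customers from the example.
wait_times = [3, 2, 4, 1, 2]

# Mean = sum of all observations divided by the number of observations.
mean_wait = sum(wait_times) / len(wait_times)
print(mean_wait)
```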

Interpretation

Use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data.

StDev

The standard deviation is the most common measure of dispersion, or how spread out the data are about the mean.
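As a rough sketch, the standard deviation of the bank wait times from the Mean section can be computed as follows. The sketch assumes the sample standard deviation (dividing by N - 1), which the text above does not state explicitly; the variable names are illustrative.

```python
import math
import statistics

# Wait times (in minutes) from the bank example in the Mean section.
wait_times = [3, 2, 4, 1, 2]

n = len(wait_times)
mean_wait = sum(wait_times) / n

# Sum of squared deviations of each observation from the mean.
sum_sq_dev = sum((t - mean_wait) ** 2 for t in wait_times)

# Assumption: sample standard deviation, so the divisor is n - 1.
sample_stdev = math.sqrt(sum_sq_dev / (n - 1))

# The standard library computes the same quantity directly.
library_stdev = statistics.stdev(wait_times)

print(round(sample_stdev, 3), round(library_stdev, 3))
```

The manual calculation and statistics.stdev agree, at roughly 1.14 minutes for these data.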

Interpretation

For a normal distribution, approximately 68% of the values fall within one standard deviation of the mean, 95% of the values fall within two standard deviations, and 99.7% of the values fall within three standard deviations.
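The percentages above can be checked against the standard normal distribution using Python's standard library. This is a small verification sketch, not part of the scatterplot output; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The computed shares are about 68.3%, 95.4%, and 99.7%, which the text rounds to 68%, 95%, and 99.7%.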

Minimum

The minimum is the smallest data value.

In these data, the minimum is 7.

 13 17 18 19 12 10 7 9 14

Interpretation

Use the minimum to identify a possible outlier or a data-entry error. One of the simplest ways to assess the spread of your data is to compare the minimum and maximum. If the minimum value is very low, even when you consider the center, the spread, and the shape of the data, investigate the cause of the extreme value.

Maximum

The maximum is the largest data value.

In these data, the maximum is 19.

 13 17 18 19 12 10 7 9 14

Interpretation

Use the maximum to identify a possible outlier or a data-entry error. One of the simplest ways to assess the spread of your data is to compare the minimum and maximum. If the maximum value is very high, even when you consider the center, the spread, and the shape of the data, investigate the cause of the extreme value.
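For the sample data shown in the Minimum and Maximum sections, the two values and the distance between them can be found as in this short sketch. The variable names are illustrative, and the range is shown only as one simple way to compare the minimum and maximum.

```python
# Sample data from the Minimum and Maximum examples.
data = [13, 17, 18, 19, 12, 10, 7, 9, 14]

minimum = min(data)
maximum = max(data)

# Comparing the two extremes gives a quick sense of the spread.
data_range = maximum - minimum

print(minimum, maximum, data_range)
```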

R-sq

R² is the percentage of variation in the response that is explained by the model.

Interpretation

Use R² to determine how well the model fits your data. The higher the R² value, the better the model fits your data. R² is always between 0% and 100%.

The first plot illustrates a simple regression model that explains 85.5% of the variance in the response. The second plot illustrates a model that explains 22.6% of the variance in the response. The more variance that is explained by the model, the closer the data points fall to the fitted regression line.
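To make the definition concrete, the following sketch fits a simple least-squares line and computes R-sq as 1 minus the ratio of unexplained variation to total variation. The data are hypothetical and unrelated to the plots described above, and the function name r_squared_percent is an assumption for illustration.

```python
def r_squared_percent(x_values, y_values):
    """Fit y = b0 + b1 * x by least squares and return R-sq as a percentage."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n

    # Least-squares slope and intercept for a simple linear model.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    s_xx = sum((x - mean_x) ** 2 for x in x_values)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x

    # Variation left unexplained by the fitted line (error sum of squares).
    ss_error = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(x_values, y_values)
    )
    # Total variation in the response around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in y_values)

    return 100.0 * (1.0 - ss_error / ss_total)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_sq = r_squared_percent(x, y)
print(f"R-sq = {r_sq:.1f}%")
```

For these made-up values the fitted line explains 60% of the variation in the response.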

Equation

The equation describes the relationship between the Y variable and the X variable.

For example, a company determines that scores on a job skills test can be predicted using a linear model with the following equation: Score = 130 + 4.3 × Training Hours. The intercept is 130, which indicates the average score for an employee with zero hours of training. The coefficient is 4.3, which indicates that, for each additional hour of training, the employee's score increases, on average, by 4.3 points.
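The example equation can be written as a small function to show how the intercept and coefficient are used for prediction. The function name predicted_score and the chosen input values are illustrative assumptions.

```python
INTERCEPT = 130.0        # average score with zero hours of training
POINTS_PER_HOUR = 4.3    # average increase in score per additional hour


def predicted_score(training_hours):
    """Predicted test score from the example equation in the text."""
    return INTERCEPT + POINTS_PER_HOUR * training_hours


score_untrained = predicted_score(0)
score_ten_hours = predicted_score(10)

print(score_untrained, score_ten_hours)
```

An employee with 10 hours of training is predicted to score about 173, which is 43 points (10 × 4.3) above the intercept.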
