Eigenvalues (also called characteristic values or latent roots) are the variances of the principal components.
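You can use the size of the eigenvalues to determine the number of principal components to retain. Retain the principal components with the largest eigenvalues. For example, using the Kaiser criterion, you retain only the principal components with eigenvalues that are greater than 1. This cutoff is meaningful when the analysis uses the correlation matrix, because each standardized variable then contributes a variance of 1.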
To visually compare the size of the eigenvalues, use the scree plot. The scree plot can help you determine the number of components based on the size of the eigenvalues.
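The Kaiser criterion described above can be sketched in a few lines of Python. This is only an illustration: the eigenvalues are hypothetical numbers rather than output from a real analysis, and the function name `kaiser_retain` is made up for this example.

```python
# Hypothetical eigenvalues from a correlation-matrix analysis of 5 variables.
# They sum to 5, the number of variables, as correlation-matrix eigenvalues do.
eigenvalues = [2.5, 1.3, 0.7, 0.3, 0.2]

def kaiser_retain(eigenvalues, threshold=1.0):
    """Return the 1-based numbers of the components whose eigenvalue exceeds the threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    return [number for number, value in enumerate(ordered, start=1) if value > threshold]

print(kaiser_retain(eigenvalues))  # [1, 2]
```

Here only the first two components have eigenvalues greater than 1, so the criterion retains two components.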
Proportion is the proportion of the variability in the data that each principal component explains.
You can use the proportion to determine which principal components explain most of the variability in the data. The higher the proportion, the more variability that the principal component explains. The size of the proportion can help you decide whether the principal component is important enough to retain.
For example, a principal component with a proportion of 0.621 explains 62.1% of the variability in the data. Therefore, this component is important to include. Another component has a proportion of 0.005, and thus explains only 0.5% of the variability in the data. This component may not be important enough to include.
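The proportion is each eigenvalue divided by the sum of all eigenvalues. The following sketch shows the arithmetic with hypothetical eigenvalues; the function name `proportions` is an assumption made for this illustration.

```python
# Hypothetical eigenvalues, for illustration only.
eigenvalues = [2.5, 1.3, 0.7, 0.3, 0.2]

def proportions(eigenvalues):
    """Return the share of the total variance that each component explains."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

for number, share in enumerate(proportions(eigenvalues), start=1):
    print(f"PC{number}: {share:.3f}")
```

With these values, the first component explains half of the variability, while the last explains only a small fraction.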
Cumulative is the cumulative proportion of the sample variability explained by consecutive principal components.
Use the cumulative proportion to assess the total amount of variance that the consecutive principal components explain. The cumulative proportion can help you determine the number of principal components to use. Retain the principal components that explain an acceptable level of variance. The acceptable level depends on your application.
For example, if you use the principal components only for descriptive purposes, explaining 80% of the variance may be enough. However, if you want to perform other analyses on the data, you may want the principal components to explain at least 90% of the variance.
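As a rough sketch, the cumulative proportion is a running sum of the eigenvalues divided by their total, and the number of components to retain is the first count at which that running share reaches your target. The eigenvalues and the function names below are hypothetical.

```python
from itertools import accumulate

# Hypothetical eigenvalues, for illustration only.
eigenvalues = [2.5, 1.3, 0.7, 0.3, 0.2]

def cumulative_proportions(eigenvalues):
    """Return the running share of total variance, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def components_needed(eigenvalues, target):
    """Return the smallest number of components whose cumulative proportion reaches target."""
    for count, share in enumerate(cumulative_proportions(eigenvalues), start=1):
        if share >= target - 1e-12:  # small tolerance for floating-point rounding
            return count
    return len(eigenvalues)

print(components_needed(eigenvalues, 0.80))
print(components_needed(eigenvalues, 0.95))
```

A stricter target requires more components, which is why the acceptable level should be chosen for the application before you count components.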
Principal components (PC)
The principal components are the linear combinations of the original variables that account for the variance in the data. The maximum number of components extracted always equals the number of variables. The eigenvectors, which are composed of coefficients corresponding to each variable, are used to calculate the principal component scores. The coefficients indicate the relative weight of each variable in the component.
If you use the correlation matrix, you must standardize the variables to obtain the correct component score.
To interpret each principal component, examine the magnitude and the direction of the coefficients of the original variables. The larger the absolute value of the coefficient, the more important the corresponding variable is in calculating the component. How large the absolute value of a coefficient must be to deem it important is subjective. Use your specialized knowledge to determine at what level the coefficient is important.
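One simple way to examine the coefficients is to sort the variables by the absolute value of their coefficient while keeping the sign visible. The variable names and coefficients below are hypothetical, and `rank_variables` is a name chosen for this sketch.

```python
# Hypothetical coefficients of one principal component, for illustration only.
pc1_coefficients = {"Age": 0.62, "Income": 0.55, "Debt": -0.48, "Savings": 0.29}

def rank_variables(coefficients):
    """Return (variable, coefficient) pairs, largest absolute coefficient first."""
    return sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)

for name, coefficient in rank_variables(pc1_coefficients):
    direction = "positive" if coefficient >= 0 else "negative"
    print(f"{name}: {coefficient:+.2f} ({direction})")
```

The ranking shows only relative weight. Deciding where importance stops remains a judgment that depends on your knowledge of the variables.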
Scores are linear combinations of the data that are determined by the coefficients for each principal component. To obtain the score for an observation, substitute its values in the linear equation for the principal component. If you use the correlation matrix, you must standardize the variables to obtain the correct component score when using the linear equation.
To obtain the calculated score for each observation, click Storage and enter a column to store the scores in the worksheet when you perform the analysis. To visually display the scores for the first and second components on a graph, click Graphs and select the score plot when you perform the analysis.
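The score calculation for a correlation-matrix analysis can be sketched as follows: standardize each variable, then substitute the standardized values into the linear equation for the component. The data, the coefficients, and the function names are hypothetical, and the sketch assumes that standardization uses the sample standard deviation.

```python
from statistics import mean, stdev

def standardize(column):
    """Center a variable at 0 and scale it to a sample standard deviation of 1."""
    center, spread = mean(column), stdev(column)
    return [(value - center) / spread for value in column]

def scores_from_correlation_pca(columns, coefficients):
    """Score of each observation = sum of coefficient * standardized value."""
    standardized = [standardize(column) for column in columns]
    return [
        sum(c * z for c, z in zip(coefficients, row))
        for row in zip(*standardized)  # one row per observation
    ]

# Hypothetical data: two variables on very different scales, three observations.
height_cm = [150.0, 160.0, 170.0]
income = [20000.0, 30000.0, 40000.0]
# Hypothetical coefficients for the first component (assumption, not real output).
pc1 = [0.7071, 0.7071]

print(scores_from_correlation_pca([height_cm, income], pc1))
```

Without the standardization step, the income values would dominate the score simply because they are measured in larger units.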
Mahalanobis distance is the distance between a data point and the centroid of the multivariate space (the overall mean).
To calculate the distance for each observation, click Storage and enter a column in the worksheet to store the distances when you perform the analysis. To display the distances on a graph, click Graphs and select the outlier plot when you perform the analysis.
Use the Mahalanobis distance to identify outliers. Examining the Mahalanobis distance is a more powerful multivariate method for detecting outliers than examining one variable at a time because the distance considers the different scales between variables and the correlations between them.
To assess whether a distance value is large enough for the observation to be considered an outlier, use the outlier plot.
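To make the idea concrete, the sketch below computes the textbook Mahalanobis distance for two variables using the sample covariance matrix. It is an illustration under stated assumptions, not a description of how Minitab calculates the stored distances. The data and the function name `mahalanobis_2d` are hypothetical.

```python
from math import sqrt
from statistics import mean

def mahalanobis_2d(xs, ys):
    """Return each observation's Mahalanobis distance from the centroid (two variables)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)  # centroid of the data
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(squared, 0.0)))
    return distances

# Hypothetical data: the last observation breaks the positive correlation pattern.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 0.5]
for row, distance in enumerate(mahalanobis_2d(xs, ys), start=1):
    print(f"Observation {row}: {distance:.2f}")
```

Because the covariance term enters the calculation, an observation can have a large distance even when each of its values looks ordinary on its own.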
The scree plot displays each eigenvalue versus the number of its principal component. The scree plot orders the eigenvalues from largest to smallest. The eigenvalues of the correlation matrix equal the variances of the principal components.
To display the scree plot, click Graphs and select the scree plot when you perform the analysis.
Use the scree plot to select the number of components to use based on the size of the eigenvalues. The ideal pattern is a steep curve, followed by a bend, and then a straight line. Use the components in the steep curve before the first point that starts the line trend.
The score plot graphs the scores of the second principal component versus the scores of the first principal component.
To display the score plot, click Graphs and select the score plot when you perform the analysis.
If the first two components account for most of the variance in the data, you can use the score plot to assess the data structure and detect clusters, outliers, and trends. Groupings of data on the plot may indicate two or more separate distributions in the data. If the data follow a normal distribution and no outliers are present, the points are randomly distributed around zero.
To see the calculated score for each observation, hold your pointer over a data point on the graph. To create score plots for other components, store the scores and use Graph > Scatterplot.
The loading plot graphs the coefficients of each variable for the second component versus the coefficients for the first component.
To display the loading plot, click Graphs and select the loading plot when you perform the analysis.
Use the loading plot to identify which variables have the largest effect on each component. Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component. Evaluating the loadings can also help you characterize each component in terms of the variables.
The biplot overlays the score plot and the loading plot.
To display the biplot, click Graphs and select the biplot when you perform the analysis.
Use the biplot to assess the data structure and the loadings of the first two components on one graph. Minitab plots the second principal component scores versus the first principal component scores, as well as the loadings for both components.
The outlier plot displays the Mahalanobis distance for each observation and a reference line to identify outliers. The Mahalanobis distance is the distance between each data point and the centroid of multivariate space (the overall mean). Examining Mahalanobis distances is a more powerful method for detecting outliers than looking at one variable at a time because it considers the different scales between variables and the correlations between them.
To display the outlier plot, click Graphs and select the outlier plot when you perform the analysis.
Use the outlier plot to identify outliers. Any point that is above the reference line is an outlier.
Outliers can significantly affect the results of your analysis. Therefore, if you identify an outlier in your data, you should examine the observation to understand why it is unusual. Correct any measurement or data entry errors. Consider removing data that are associated with special causes and repeating the analysis.
Hold your pointer over any point on an outlier plot to identify the observation. Use Editor > Brush to brush multiple outliers on the plot and flag the observations in the worksheet.