Use N to determine how many nonmissing observations are in your sample. Minitab does not include missing values in this count.
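The following Python sketch illustrates how N is counted. It tallies nonmissing and missing values separately. The sketch assumes that missing values are stored as `None` or as a floating-point NaN. That representation is an assumption made for illustration, not a description of how Minitab stores data.

```python
import math

def count_n(values):
    """Return (N, missing): the nonmissing count and the missing count.

    Assumption for this sketch: a value is missing if it is None or a float NaN.
    """
    n = 0
    missing = 0
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing += 1
        else:
            n += 1
    return n, missing

# Hypothetical sample with two missing entries.
sample = [4.2, 5.1, None, 3.9, float("nan"), 4.8]
n, n_missing = count_n(sample)
print(n, n_missing)  # 4 2
```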
You should collect a medium to large sample of data. Samples that have at least 20 observations are often adequate to represent the distribution of your data. However, to better represent the distribution with a histogram, some practitioners recommend that you have at least 50 observations. Larger samples also provide more precise estimates of the process parameters, such as the mean and standard deviation.
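Larger samples give more precise estimates of the mean, and the standard error of the mean, s/√n, shows why. The Python sketch below computes it for a few sample sizes. The standard deviation of 2.0 is a hypothetical value. The formula applies only to the mean. Estimates of other parameters, such as the standard deviation, also become more precise as n grows, but through different formulas.

```python
import math

def standard_error_of_mean(std_dev, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Hypothetical standard deviation of 2.0, chosen only for illustration.
for n in (20, 50, 200):
    print(n, round(standard_error_of_mean(2.0, n), 3))
# 20 0.447
# 50 0.283
# 200 0.141
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.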
Step 2: Describe the center of your data
Use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data.
The median is another measure of the center of the distribution of the data. When the data are sorted, half the data values lie above the median and half lie below it. With an even number of values, the median is the average of the two middle values.
The median and the mean both measure central tendency, but unusual values, called outliers, affect them differently. A single extreme value can pull the mean far toward it, while the median usually changes very little. If your data are symmetric, the mean and median are similar.
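The following Python sketch uses the standard `statistics` module and hypothetical data. It shows that one extreme value, such as a data-entry error, moves the mean much more than the median. The second sample has an even number of values, so its median is the average of the two middle values.

```python
import statistics

# Hypothetical measurements, then the same data with one extreme value added.
data = [12, 14, 15, 15, 16, 17, 18]
with_outlier = data + [95]  # e.g., 9.5 mistyped as 95

print(round(statistics.mean(data), 2), statistics.median(data))                  # 15.29 15
print(round(statistics.mean(with_outlier), 2), statistics.median(with_outlier))  # 25.25 15.5
```

The outlier raises the mean by about 10 units. The median moves by only half a unit.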
Step 3: Describe the spread of your data
Use the standard deviation to determine how spread out the data are from the mean. A higher standard deviation value indicates greater spread in the data.
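The following Python sketch shows how the standard deviation measures spread around the mean. It assumes the usual sample standard deviation, which divides the sum of squared deviations by n − 1. Both hypothetical samples have a mean of 10, but the second is much more spread out.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical samples with the same mean (10.0) but different spread.
narrow = [9.8, 10.0, 10.1, 9.9, 10.2]
wide = [7.5, 12.0, 9.0, 13.1, 8.4]
print(round(sample_std_dev(narrow), 3), round(sample_std_dev(wide), 3))
```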
Step 4: Assess the shape and spread of your data distribution
Use the histogram, the individual value plot, and the boxplot to assess the shape and spread of the data, and to identify any potential outliers.
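A histogram groups the data into equal-width bins and shows how many values fall in each bin. The rough Python sketch below prints a text histogram so you can see the idea without a graphics library. The bin start and width are hypothetical choices, and the sketch is not how Minitab selects bins.

```python
def text_histogram(values, start, bin_width):
    """Print a text histogram with equal-width bins beginning at `start`.

    Returns the counts as {bin_index: count} so the result can be inspected.
    """
    counts = {}
    for x in values:
        index = int((x - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    for i in range(min(counts), max(counts) + 1):
        low = start + i * bin_width
        high = low + bin_width
        print(f"[{low:5.1f}, {high:5.1f})  {'*' * counts.get(i, 0)}")
    return counts

# Hypothetical measurements; with real data, choose bins that suit the scale.
data = [4.1, 4.8, 5.2, 5.5, 5.9, 6.1, 6.3, 6.8, 7.4, 8.9]
counts = text_histogram(data, start=4.0, bin_width=1.0)
```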
Examine the shape of your data to determine whether your data appear to be skewed
When data are skewed, most of the data are located on the high or low side of the graph, and a long tail extends toward the other side. Often, skewness is easiest to detect with a histogram or boxplot.
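A numerical check can support what the graphs show. The Python sketch below computes the moment-based sample skewness, g1 = m3 / m2^1.5. A positive value indicates a longer right tail, meaning the data are bunched on the low side. A negative value indicates a longer left tail. Some software applies a small-sample adjustment to this statistic, so reported skewness values can differ slightly from this unadjusted version. The data are hypothetical.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples: most values low with a few high ones, and a symmetric set.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]
symmetric = [1, 2, 3, 4, 5, 6, 7]
print(round(skewness(right_skewed), 2), round(skewness(symmetric), 2))
```

For the right-skewed sample, the mean (3.4) is also larger than the median (2.5), which is another common sign of a long right tail.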
Determine how much your data vary
Assess the spread of the points to determine how much your sample varies. The greater the variation in the sample, the more the points will be spread out from the center of the data.
Look for multi-modal data
Multi-modal data have multiple peaks, also called modes. Multi-modal data often indicate that important variables are not yet accounted for.
If you have additional information that allows you to classify the observations into groups, you can create a group variable with this information. Then, you can create the graph with groups to determine whether the group variable accounts for the peaks in the data.
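The following Python sketch illustrates the grouping check. It bins the pooled data and each group separately, then counts peaks. A bin counts as a peak when it is strictly higher than both neighbors. This crude rule is an assumption of the sketch, and it ignores flat-topped peaks. The machine names, values, and bin settings are all hypothetical.

```python
def bin_counts(values, start, bin_width, n_bins):
    """Counts per equal-width bin; values outside the binned range are ignored."""
    counts = [0] * n_bins
    for x in values:
        i = int((x - start) // bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def count_peaks(counts):
    """Count bins strictly higher than both neighbors (the edges compare against 0)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical measurements; the group variable is the machine that made each part.
groups = {
    "Machine A": [8.5, 9.2, 9.8, 10.1, 10.4, 10.9, 11.5, 12.3],
    "Machine B": [17.6, 18.4, 19.0, 19.5, 19.8, 20.2, 20.8, 22.7],
}
pooled = [x for values in groups.values() for x in values]

print("pooled peaks:", count_peaks(bin_counts(pooled, 6, 2, 10)))  # pooled peaks: 2
for name, values in groups.items():
    print(name, "peaks:", count_peaks(bin_counts(values, 6, 2, 10)))  # 1 for each machine
```

The pooled data have two peaks, while each machine's data have one, so in this example the group variable accounts for the peaks.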
Identify outliers
Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.
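Boxplots commonly flag a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The Python sketch below applies that common rule. It uses `statistics.quantiles` with its default "exclusive" method. Quartile conventions vary between programs, so the fences can differ slightly from those in a particular software package. The data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR (the usual boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical measurements with one suspicious value.
data = [21.0, 22.4, 22.9, 23.1, 23.5, 24.0, 24.2, 31.8]
print(iqr_outliers(data))  # [31.8]
```

A flagged value is only a candidate for investigation. The rule does not tell you why the value is unusual.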
Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.
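The following Python sketch shows one way to repeat an analysis after excluding a special cause. It keeps every original record and drops only values with a documented cause, so the exclusion can be traced later. The records and the "sensor fault" note are hypothetical.

```python
import statistics

# Hypothetical records: (value, note). A non-empty note documents a special cause.
records = [
    (23.1, ""), (22.4, ""), (24.0, ""), (31.8, "sensor fault"),
    (22.9, ""), (23.5, ""), (24.2, ""), (21.0, ""),
]

def summarize(values):
    """A few descriptive statistics, rounded for display."""
    return {
        "N": len(values),
        "Mean": round(statistics.mean(values), 3),
        "StDev": round(statistics.stdev(values), 3),
    }

all_values = [v for v, _ in records]
kept = [v for v, note in records if not note]  # exclude only documented special causes

print("all :", summarize(all_values))
print("kept:", summarize(kept))
```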
Step 5: Compare data from different groups
If you have a By variable that identifies groups in your data, you can use it to analyze your data by group or by group level.
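The following Python sketch illustrates a By-variable analysis. It splits the observations by group level and computes the same descriptive statistics for each level. The "Shift" levels and values are hypothetical, and the function is an illustration rather than Minitab's implementation.

```python
import statistics
from collections import defaultdict

# Hypothetical data: each observation paired with its By-variable level.
observations = [
    ("Shift 1", 10.2), ("Shift 2", 11.8), ("Shift 1", 9.7),
    ("Shift 2", 12.4), ("Shift 1", 10.5), ("Shift 2", 11.1),
]

def describe_by_group(pairs):
    """Return {level: {N, Mean, Median, StDev}} for each By-variable level."""
    groups = defaultdict(list)
    for level, value in pairs:
        groups[level].append(value)
    summary = {}
    for level in sorted(groups):
        values = groups[level]
        summary[level] = {
            "N": len(values),
            "Mean": statistics.mean(values),
            "Median": statistics.median(values),
            # The sample standard deviation needs at least two values.
            "StDev": statistics.stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary

for level, stats in describe_by_group(observations).items():
    print(level, {name: round(value, 3) for name, value in stats.items()})
```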