Ways to assess a data distribution

A collection of data can be distributed, or spread out, in many different ways. For example, data from rolling a die are random integer values from 1 through 6. Data from a manufacturing process may be centered on a target value, or may include values that are very far from the center.

You can assess a data distribution through graphs, descriptive statistics, or comparison to a theoretical distribution.
Graphs
Graphs like histograms give instant insight into the distribution of a data set. A histogram can help you observe:
  • Whether the data cluster around a single value or whether the data have multiple peaks or modes.
  • Whether the data are spread thinly over a large range or whether the data are within a small range.
  • Whether the data are skewed or symmetrical.
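As a rough illustration of what a histogram summarizes, the following Python sketch counts values in equal-width bins and prints a text bar for each bin. It is not how Minitab draws histograms; the function name `bin_counts`, the bin width, and the simulated two-cluster data are all assumptions made for the example.

```python
import random
from collections import Counter

def bin_counts(values, bin_width):
    """Return {bin_start: count} using equal-width bins (hypothetical helper)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

random.seed(0)
# Simulated data (an assumption): two clusters, so the histogram shows two peaks.
data = ([random.gauss(10, 1) for _ in range(100)]
        + [random.gauss(20, 1) for _ in range(100)])

for start, count in bin_counts(data, 1.0).items():
    # One '#' per observation in the bin.
    print(f"{start:5.1f} | {'#' * count}")
```

Reading the printed bars from top to bottom shows the features listed above: the number of peaks, the range the data cover, and whether the bars tail off more on one side than the other.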
Descriptive statistics
Statistics that describe the central tendency (mean, median) and spread (variance, standard deviation) of numeric data add a layer of detail and let you compare one data set with another.
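The statistics named above can be computed with Python's standard `statistics` module, as in this minimal sketch. The helper name `summarize` and the sample values are made up for illustration, and the variance and standard deviation shown are the sample versions (divisor n - 1).

```python
import statistics

def summarize(values):
    """Return central tendency and spread for numeric data (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance, divisor n - 1
        "stdev": statistics.stdev(values),        # square root of the sample variance
    }

# Made-up measurements, used only to show the calls.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(sample).items():
    print(f"{name:>8}: {value:.3f}")
```

Running `summarize` on two data sets and comparing the results side by side is one simple way to compare their centers and spreads.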
Theoretical distributions
Finally, some common distributions are well known and referred to by name, such as the normal, Weibull, and exponential distributions. The normal distribution, for example, is always bell-shaped and symmetric about its mean.
Your real data will likely only approximate these perfect distributions. If there is a close fit, you can say that the data are well-modeled by a given distribution. Use Stat > Quality Tools > Individual Distribution Identification to identify the distribution that best fits your data.
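To make the idea of a "close fit" concrete, here is a minimal Python sketch that fits a normal distribution to a sample and measures the largest gap between the sample's empirical cumulative distribution and the fitted one (a Kolmogorov-Smirnov-style distance). This is an illustrative stand-in, not the procedure behind Individual Distribution Identification; the function name `max_cdf_gap` and the simulated data are assumptions.

```python
import random
from statistics import NormalDist

def max_cdf_gap(values, dist):
    """Largest gap between the empirical CDF of values and dist.cdf."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap

random.seed(1)
# Simulated samples (assumptions): one roughly normal, one strongly right-skewed.
normal_data = [random.gauss(50, 5) for _ in range(500)]
skewed_data = [random.expovariate(1 / 50) for _ in range(500)]

for name, sample in [("normal-like", normal_data), ("skewed", skewed_data)]:
    fitted = NormalDist.from_samples(sample)  # normal with the sample's mean and stdev
    print(f"{name:>12}: max CDF gap = {max_cdf_gap(sample, fitted):.3f}")
```

A smaller gap means the fitted normal curve tracks the data more closely, so the normal-like sample should score lower than the skewed one. The same comparison could be repeated with other candidate distributions to see which one the data approximate best.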