A collection of data can be distributed, or spread out, in many different ways. For example, data from rolling a die take random integer values from 1 through 6. Data from a manufacturing process may cluster around a target value or may include values that lie far from the center.
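To make the two examples concrete, here is a minimal Python sketch that simulates both kinds of data using only the standard library. The sample sizes, the target value of 10.0, the spread of 0.1, and the handful of far-out values are all assumptions chosen for illustration, not figures from any real process.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Die rolls: discrete integers from 1 through 6, each equally likely.
die_rolls = [rng.randint(1, 6) for _ in range(600)]

# Hypothetical manufacturing process: most measurements sit close to a
# target of 10.0, with a few values far from the center mixed in.
TARGET = 10.0
measurements = [rng.gauss(TARGET, 0.1) for _ in range(595)]
measurements += [rng.uniform(8.0, 12.0) for _ in range(5)]

# Count how often each face came up.
roll_counts = Counter(die_rolls)
for face in range(1, 7):
    print(f"face {face}: {roll_counts[face]} rolls")

# The range of the measurements shows how far the extreme values reach.
print(f"measurements range from {min(measurements):.2f} to {max(measurements):.2f}")
```

The die rolls spread roughly evenly over six values, while the measurements pile up near the target with a few stragglers on either side.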
You can assess a data distribution through graphs, descriptive statistics, or comparison to a theoretical distribution:
- Graphs
  - Graphs like histograms can give instant insight into the distribution of a data set. Histograms can help you observe:
    - Whether the data cluster around a single value or have multiple peaks (modes).
    - Whether the data are spread thinly over a large range or fall within a small range.
    - Whether the data are skewed or symmetrical.
- Descriptive statistics
  - Statistics that summarize the central tendency (mean, median) and spread (variance, standard deviation) of numeric data add a layer of detail and can be used to make comparisons with other data sets.
- Theoretical distributions
  - Some common distributions can be identified and are referred to by name, like the normal, Weibull, and exponential distributions. The normal distribution, for example, is always bell-shaped and symmetric about its mean.
  - Your real data will likely only approximate these idealized distributions. If there is a close fit, you can say that the data are well-modeled by a given distribution. Distribution-fitting tools can help identify the distribution that best fits your data.
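The three approaches above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than the workflow of any particular statistics package: the sample is synthetic (1,000 values drawn with an assumed mean of 50 and standard deviation of 5), the bin width is an arbitrary choice, and the "fit" is a simple comparison against a normal distribution built from the sample mean and standard deviation.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable
data = [rng.gauss(50.0, 5.0) for _ in range(1000)]  # synthetic sample

# 1. Graph: a crude text histogram with bins of width 5.
BIN_WIDTH = 5
bins = Counter(int(x // BIN_WIDTH) * BIN_WIDTH for x in data)
for left in sorted(bins):
    bar = "#" * (bins[left] // 10)  # one '#' per 10 observations
    print(f"{left:>4} to {left + BIN_WIDTH:<4} {bar}")

# 2. Descriptive statistics: central tendency and spread.
mean = statistics.mean(data)
median = statistics.median(data)
variance = statistics.variance(data)  # sample variance
stdev = statistics.stdev(data)        # sample standard deviation
print(f"mean={mean:.2f} median={median:.2f} "
      f"variance={variance:.2f} stdev={stdev:.2f}")

# 3. Theoretical distribution: build a normal distribution from the sample
#    and compare the share of data within one standard deviation of the
#    mean against the share the normal distribution predicts.
fitted = statistics.NormalDist.from_samples(data)
expected_within = (fitted.cdf(fitted.mean + fitted.stdev)
                   - fitted.cdf(fitted.mean - fitted.stdev))
observed_within = sum(
    abs(x - fitted.mean) <= fitted.stdev for x in data
) / len(data)
print(f"within one stdev: observed={observed_within:.3f} "
      f"expected={expected_within:.3f}")
```

When the observed and expected shares are close, and the histogram looks single-peaked and symmetric with the mean near the median, a normal distribution is a plausible model for the data. A formal goodness-of-fit test would be the next step, which this sketch does not attempt.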