Descriptive Statistics
Welcome to the Purdue OWL
This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.
Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.
The mean, the mode, the median, the range, and the standard deviation are all examples of descriptive statistics. Descriptive statistics are used because in most cases, it isn't possible to present all of your data in any form that your reader will be able to quickly interpret.
Generally, when writing descriptive statistics, you want to present at least one form of central tendency (or average), that is, either the mean, median, or mode. In addition, you should present one form of variability, usually the standard deviation.
Measures of Central Tendency and Other Commonly Used Descriptive Statistics
The mean, median, and the mode are all measures of central tendency. They attempt to describe what the typical data point might look like. In essence, they are all different forms of 'the average.' When writing statistics, you never want to say 'average' because it is difficult, if not impossible, for your reader to understand if you are referring to the mean, the median, or the mode.
The Mean
The mean is the most common form of central tendency, and is what most people usually are referring to when the say average. It is simply the total sum of all the numbers in a data set, divided by the total number of data points. For example, the following data set has a mean of 4: {-1, 0, 1, 16}. That is, 16 divided by 4 is 4. If there isn't a good reason to use one of the other forms of central tendency, then you should use the mean to describe the central tendency.
The Median
The median is simply the middle value of a data set. In order to calculate the median, all values in the data set need to be ordered, from either highest to lowest, or vice versa. If there are an odd number of values in a data set, then the median is easy to calculate. If there is an even number of values in a data set, then the calculation becomes more difficult. Statisticians still debate how to properly calculate a median when there is an even number of values, but for most purposes, it is appropriate to simply take the mean of the two middle values. The median is useful when describing data sets that are skewed or have extreme values. Incomes of baseballs players, for example, are commonly reported using a median because a small minority of baseball players makes a lot of money, while most players make more modest amounts. The median is less influenced by extreme scores than the mean.
The Mode
The mode is the most commonly occurring number in the data set. The mode is best used when you want to indicate the most common response or item in a data set. For example, if you wanted to predict the score of the next football game, you may want to know what the most common score is for the visiting team, but having an average score of 15.3 won't help you if it is impossible to score 15.3 points. Likewise, a median score may not be very informative either, if you are interested in what score is most likely.
Standard Deviation
The standard deviation is a measure of variability (it is not a measure of central tendency). Conceptually it is best viewed as the 'average distance that individual data points are from the mean.' Data sets that are highly clustered around the mean have lower standard deviations than data sets that are spread out.
For example, the first data set would have a higher standard deviation than the second data set:
Notice that both groups have the same mean (5) and median (also 5), but the two groups contain different numbers and are organized much differently. This organization of a data set is often referred to as a distribution. Because the two data sets above have the same mean and median, but different standard deviation, we know that they also have different distributions. Understanding the distribution of a data set helps us understand how the data behave.