Biostatistics: Why Is Standard Deviation Important?

Why Is Standard Deviation Important?

Standard deviation is a useful measure because it tells you right away whether data are widely dispersed or tightly clustered around the mean. As you have just seen, the standard deviation also provides an easy way to describe the distance between any particular point in the data and the mean of the data. For most data sets, the majority of observations will fall within one standard deviation of the mean. As a rule of thumb, observations that lie more than two standard deviations away from the mean are considered "far" from the mean.

Knowing this allows statisticians to understand their data better. For example, if someone told you that German cyclist Max Walscheid weighs 202 lbs, you might not have a good sense of how his weight compares to other cyclists unless you know a lot about cycling. However, knowing that his weight is 3.4σ above the mean gives you an immediate understanding that Walscheid's weight is far above the average weight of his competitors. He is an outlier.

Standard Deviation and the Empirical Rule (68-95-99.7)

Standard deviations are particularly useful for describing data that are normally distributed. A normal distribution, if you are not already familiar with it, is a bell-shaped distribution that is unimodal and symmetric about the mean. Because the normal distribution is symmetric about the mean, we can make precise statements about the proportion of observations that lie within certain segments of the distribution. We do this using the empirical rule.
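The idea of describing a point by its distance from the mean in standard deviations, as in the cyclist example, can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `z_score` and `is_far_from_mean` and the list of weights are hypothetical, and the weights are made-up numbers, not real rider data.

```python
import statistics


def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


def is_far_from_mean(value, mean, sd, threshold=2.0):
    """Apply the rule of thumb: more than `threshold` SDs away counts as far."""
    return abs(z_score(value, mean, sd)) > threshold


# Hypothetical weights in lbs, invented purely for illustration.
weights = [150, 160, 170, 180, 190]
mean = statistics.mean(weights)
sd = statistics.stdev(weights)  # sample standard deviation (n - 1 denominator)

z = z_score(202, mean, sd)
print(f"mean = {mean}, sd = {sd:.2f}, z = {z:.2f}")
print("far from the mean:", is_far_from_mean(202, mean, sd))
```

The two-standard-deviation threshold is the rule of thumb from the text, exposed as a parameter so that a stricter cutoff can be tried.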
The empirical rule (or the 68-95-99.7 rule) states that:

68% of all observations in a normal distribution lie within one standard deviation of the mean (μ ± σ)
95% of all observations lie within two standard deviations of the mean (μ ± 2σ)
99.7% of all observations lie within three standard deviations of the mean (μ ± 3σ)

In other words, most observations in a normal distribution lie within one standard deviation of the mean, and hardly any of the observations lie beyond three standard deviations of the mean. Observations that fall outside μ ± 3σ make up just 0.3% of the observations (100% - 99.7%). The empirical rule can also be applied when data are only approximately normal. Because a normal distribution can approximate so many different types of data, the empirical rule comes in quite handy!

Using the empirical rule, we know that 68% of observations in a normal distribution lie within 1σ of the mean, 95% of observations are within 2σ of the mean, and 99.7% of observations are within 3σ of the mean. Because a normal distribution is symmetric about the mean, we can further divide the areas under a normal distribution to find probabilities for smaller segments of the distribution (see the figure below). For example, we can further state that 34% of observations are between the mean and one standard deviation above the mean (half of 68%), or that just 2.35% of data are between (μ - 3σ) and (μ - 2σ) (half of 99.7% - 95%).
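The proportions in the empirical rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper names `proportion_within` and `proportion_between` are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within mu ± k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


def proportion_between(a, b, mu=0.0, sigma=1.0):
    """Proportion lying between mu + a*sigma and mu + b*sigma (a < b)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + b * sigma) - dist.cdf(mu + a * sigma)


# Whole-interval proportions for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {proportion_within(k):.2%}")

# Smaller segments, using the symmetry of the distribution.
print(f"mean to +1 sd:  {proportion_between(0, 1):.2%}")
print(f"-3 sd to -2 sd: {proportion_between(-3, -2):.2%}")
```

The exact normal proportions are approximately 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7. Because of that rounding, segment values derived from the rule differ slightly from the exact ones: the rule implies (99.7% - 95%) / 2 = 2.35% between (μ - 3σ) and (μ - 2σ), while the exact proportion is closer to 2.14%.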
