# Normal distribution

## Overview

The fundamental importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due to the central limit theorem. A variety of psychological test scores and physical phenomena like photon counts can be well approximated by a normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified if one assumes many small (independent) effects contribute to each observation in an additive fashion. This is also the reason the normal distribution is widely used in engineering applications to model noise.

The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal, even if the distribution of the population the sample is taken from is not normal.
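As an illustration (a minimal simulation sketch; the sample size, seed, and exponential population are arbitrary choices, not from the source), sample means drawn from a clearly non-normal population still cluster the way a normal distribution predicts:

```python
import random
import statistics

random.seed(0)

# Draw many samples from a decidedly non-normal population
# (exponential with rate 1, so population mean = 1 and variance = 1)
# and record each sample's mean.
n = 50             # observations per sample
num_samples = 20000

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# The sample means should cluster around the population mean 1 with
# standard deviation about 1/sqrt(n) ~= 0.1414, even though the
# population itself is strongly skewed.
print(statistics.fmean(sample_means))
print(statistics.stdev(sample_means))
```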

In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance.
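Concretely, the claim is that among all densities with a given mean and variance, the normal density attains the largest differential entropy, whose value works out to:

```latex
% Among all densities f with fixed mean \mu and variance \sigma^2,
% the differential entropy
%   H(f) = -\int f(x)\,\ln f(x)\,dx
% is maximized by the normal density, for which
H\bigl(\mathcal{N}(\mu,\sigma^2)\bigr) = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^2\right).
```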

The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several families of continuous and discrete distributions.

## History

The normal distribution was first introduced by Abraham de Moivre in an article in 1733, in the context of approximating certain binomial distributions for large n. His result was extended by Pierre-Simon Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre–Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Adrien-Marie Legendre in 1805. Carl Friedrich Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.

The name "bell curve" goes back to Jouffret, who first used the term "bell surface" in 1872 for a bivariate normal distribution with independent components. The name "normal distribution" was coined around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all other probability distributions are not "normal".

That the distribution is called the Gaussian distribution is an instance of Stigler's law of eponymy: "No scientific discovery is named after its original discoverer."

## Characterization of the normal distribution

There are various ways to characterize a probability distribution. The most visual is the probability density function, which (roughly) gives the relative probability of each value.

The cumulative distribution function is an alternative way to specify the same information.
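A minimal sketch of both characterizations for the normal distribution (the function names and default parameters here are illustrative, not a standard API):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2): the familiar bell-shaped curve."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), written via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# The density peaks at the mean (height 1/sqrt(2*pi) for the standard
# normal), and by symmetry the CDF at the mean is exactly one half.
print(normal_pdf(0.0))
print(normal_cdf(0.0))
```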

Some notable qualities of the normal distribution:

• The density function is symmetric about its mean value.
• The mean is also its mode and median.
• 68.26894921371% of the area under the curve is within one standard deviation of the mean.
• 95.44997361036% of the area is within two standard deviations.
• 99.73002039367% of the area is within three standard deviations.
• 99.99366575163% of the area is within four standard deviations.
• 99.99994266969% of the area is within five standard deviations.
• 99.99999980268% of the area is within six standard deviations.
• 99.99999999974% of the area is within seven standard deviations.
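These coverage figures can be checked numerically: for a normal distribution, the probability of landing within k standard deviations of the mean is erf(k/√2). A short check (the helper name is illustrative):

```python
import math

def coverage(k):
    """Probability mass of N(0, 1) within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

# Reproduces the percentages listed above to 11 decimal places.
for k in range(1, 8):
    print(f"{k} sd: {coverage(k) * 100:.11f}%")
```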

## The central limit theorem

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.
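A classic small demonstration (illustrative, not from the source): the sum of 12 independent Uniform(0, 1) variables has mean 6 and variance 1, and by the central limit theorem is already close to normal; shifting by 6 gives an approximate standard normal:

```python
import random
import statistics

random.seed(1)

def approx_normal():
    """Sum of 12 Uniform(0, 1) draws, shifted to mean 0 and variance 1.

    This was once used as a quick-and-dirty normal random generator.
    """
    return sum(random.random() for _ in range(12)) - 6.0

draws = [approx_normal() for _ in range(100000)]
print(statistics.fmean(draws))    # close to 0
print(statistics.stdev(draws))    # close to 1

# Roughly 68% of draws fall within one standard deviation, as the
# normal distribution predicts.
within_1sd = sum(abs(x) < 1.0 for x in draws) / len(draws)
print(within_1sd)
```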

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

• A binomial distribution with parameters n and p is approximately normal for large n and for p not too close to 0 or 1. (Some books recommend using this approximation only if $np$ and $n(1 - p)$ are both at least 5; in that case, a continuity correction should be applied.)

The approximating normal distribution has mean $\mu = n p$ and variance $\sigma^2 = n p (1 - p)$.

• A Poisson distribution with parameter $\lambda$ is approximately normal for large $\lambda$.

The approximating normal distribution has mean $\mu = \lambda$ and variance $\sigma^2 = \lambda$.
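A quick numerical sketch of the binomial approximation with continuity correction (the parameter choices n = 100, p = 0.4 are illustrative, not from the source):

```python
import math
from math import comb

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

n, p = 100, 0.4                  # n p = 40 and n (1 - p) = 60, both >= 5
mu = n * p                       # mean of the approximating normal
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(45, n, p)
# Continuity correction: approximate the discrete P(X <= 45)
# by the normal CDF evaluated at 45.5 rather than 45.
approx = normal_cdf(45.5, mu, sigma)
print(exact, approx)
```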

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.