# Moments

## Moments Topics


### Polykay

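The symmetric statistic $k_{r,s,\ldots}$ defined such that $\langle k_{r,s,\ldots}\rangle = \kappa_r \kappa_s \cdots$ (1), where $\kappa_r$ is a cumulant. These statistics generalize the k-statistics and were originally called "generalized $k$-statistics" (Dressel 1940). The term "polykay" was introduced by Tukey (1956; Rose and Smith 2002, p. 255). Polykays are commonly defined in terms of power sums $S_r = \sum_{i=1}^{n} x_i^r$, for example $k_{1,1} = \frac{S_1^2 - S_2}{n(n-1)}$ (2) and $k_{1,2} = \frac{-S_1^3 + (n+1) S_1 S_2 - n S_3}{n(n-1)(n-2)}$ (3). Polykays can be computed using PolyK[r, s, ...] in the Mathematica application package mathStatica.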

### Polyache

The statistics $h_{r,s,\ldots}$ defined such that $\langle h_{r,s,\ldots}\rangle = \mu_r \mu_s \cdots$, where $\mu_r$ is a central moment. These statistics generalize h-statistics and were originally called "generalized $h$-statistics" (Tracy and Gupta 1974). The term "polyache" was introduced by Rose and Smith (2002, p. 255) by way of analogy with the polykay statistic. Polyaches are commonly defined in terms of power sums. Polyaches can be computed using PolyH[r, s, ...] in the Mathematica application package mathStatica.

### Sample variance

The sample variance $m_2$ (commonly written $s^2$ or sometimes $s_N^2$) is the second sample central moment and is defined by $m_2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - m)^2$ (1), where $m = \bar{x}$ is the sample mean and $N$ is the sample size. To estimate the population variance $\mu_2 = \sigma^2$ from a sample of $N$ elements with a priori unknown mean (i.e., the mean is estimated from the sample itself), we need an unbiased estimator $\hat{\mu}_2$ for $\mu_2$. This estimator is given by k-statistic $k_2$, which is defined by $k_2 = \frac{N}{N-1}\,m_2$ (2) (Kenney and Keeping 1951, p. 189). Similarly, if $N$ samples are taken from a distribution with underlying central moments $\mu_n$, then the expected value of the observed sample variance $m_2$ is $\langle m_2\rangle = \frac{N-1}{N}\,\mu_2$ (3). Note that some authors (e.g., Zwillinger 1995, p. 603) prefer the definition $s_{N-1}^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$ (4), since this makes the sample variance an unbiased estimator for the population variance. The distinction between $s_N^2$ and $s_{N-1}^2$ is a common source of confusion, and extreme care should be exercised when consulting the literature to determine which convention is in use, especially since the uninformative..
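The two conventions for the sample variance differ only in the divisor. The following is a minimal Python sketch of that difference, assuming the $1/N$ and $1/(N-1)$ definitions; the function name `sample_variances` and the data are illustrative only.

```python
def sample_variances(xs):
    """Return (m2, k2): the 1/N sample variance and the 1/(N-1) estimator."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / n, ss / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m2, k2 = sample_variances(data)
print(m2, k2)
```

For this data the sum of squared deviations is 32, so the two values are 32/8 and 32/7, and the second is larger than the first by the factor $N/(N-1)$.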

### Sample mean

The sample mean of a set of $N$ observations $x_1, \ldots, x_N$ from a given distribution is defined by $m = \bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$. It is an unbiased estimator for the population mean $\mu$. The notation $\hat{\mu}$ is therefore sometimes used, with the hat indicating that this quantity is an estimator for $\mu$. The sample mean of a list of data is implemented directly as Mean[list]. An interesting empirical relationship between the sample mean, statistical median, and mode which appears to hold for unimodal curves of moderate asymmetry is given by $\text{mean} - \text{mode} \approx 3\,(\text{mean} - \text{median})$ (Kenney and Keeping 1962, p. 53), which is the basis for the definition of the Pearson mode skewness.

### Sample central moment

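The $r$th sample central moment $m_r$ of a sample with sample size $n$ is defined as $m_r = \frac{1}{n}\sum_{k=1}^{n}(x_k - m)^r$ (1), where $m = m_1'$ is the sample mean. The first few sample central moments are related to power sums $S_r = \sum_{k=1}^{n} x_k^r$ by $m_1 = 0$ (2), $m_2 = -\frac{S_1^2}{n^2} + \frac{S_2}{n}$ (3), $m_3 = \frac{2S_1^3}{n^3} - \frac{3S_1S_2}{n^2} + \frac{S_3}{n}$ (4), $m_4 = -\frac{3S_1^4}{n^4} + \frac{6S_1^2S_2}{n^3} - \frac{4S_1S_3}{n^2} + \frac{S_4}{n}$ (5), $m_5 = \frac{4S_1^5}{n^5} - \frac{10S_1^3S_2}{n^4} + \frac{10S_1^2S_3}{n^3} - \frac{5S_1S_4}{n^2} + \frac{S_5}{n}$ (6). These relations can be given by SampleCentralToPowerSum[r] in the Mathematica application package mathStatica. In terms of the population central moments $\mu_r$, the expectation values of the first few sample central moments are $\langle m_1\rangle = 0$ (7), $\langle m_2\rangle = \frac{n-1}{n}\,\mu_2$ (8), $\langle m_3\rangle = \frac{(n-1)(n-2)}{n^2}\,\mu_3$ (9), $\langle m_4\rangle = \frac{(n-1)\left[(n^2-3n+3)\,\mu_4 + 3(2n-3)\,\mu_2^2\right]}{n^3}$ (10).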

### Variance

For a single variate $X$ having a distribution $P(x)$ with known population mean $\mu$, the population variance $\operatorname{var}(X)$, commonly also written $\sigma^2$, is defined as $\sigma^2 \equiv \langle (X-\mu)^2\rangle$ (1), where $\mu$ is the population mean and $\langle X\rangle$ denotes the expectation value of $X$. For a discrete distribution with $N$ possible values of $x_i$, the population variance is therefore $\sigma^2 = \sum_{i=1}^{N} P(x_i)\,(x_i-\mu)^2$ (2), whereas for a continuous distribution, it is given by $\sigma^2 = \int P(x)\,(x-\mu)^2\,dx$ (3). The variance is therefore equal to the second central moment $\mu_2$. Note that some care is needed in interpreting $\sigma^2$ as a variance, since the symbol $\sigma$ is also commonly used as a parameter related to but not equivalent to the square root of the variance, for example in the log normal distribution, Maxwell distribution, and Rayleigh distribution. If the underlying distribution is not known, then the sample variance may be computed as $s_N^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$ (4), where $\bar{x}$ is the sample mean. Note that the sample variance $s_N^2$ defined above is not an unbiased estimator for the population variance $\sigma^2$. In order to obtain an unbiased estimator for..

### Kurtosis

Kurtosis is defined as a normalized form of the fourth central moment $\mu_4$ of a distribution. There are several flavors of kurtosis, the most commonly encountered variety of which is normally termed simply "the" kurtosis and is denoted $\beta_2$ (Pearson's notation; Abramowitz and Stegun 1972, p. 928) or $\alpha_4$ (Kenney and Keeping 1951, p. 27; Kenney and Keeping 1961, pp. 99-102). The kurtosis of a theoretical distribution is defined by $\beta_2 \equiv \frac{\mu_4}{\mu_2^2}$ (1), where $\mu_i$ denotes the $i$th central moment (and in particular, $\mu_2$ is the variance). This form is implemented in the Wolfram Language as Kurtosis[dist]. The "kurtosis excess" (Kenney and Keeping 1951, p. 27) is defined by $\gamma_2 \equiv \beta_2 - 3$ (2) $= \frac{\mu_4}{\mu_2^2} - 3$ (3) and is commonly denoted $\gamma_2$ (Abramowitz and Stegun 1972, p. 928) or $b_2$. Kurtosis excess is commonly used because $\gamma_2$ of a normal distribution is equal to 0, while the kurtosis proper is equal to 3. Unfortunately, Abramowitz and Stegun (1972) confusingly refer to as..
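As a rough illustration of the moment-ratio definition, the Python sketch below computes $\mu_4/\mu_2^2$ and the corresponding excess for a finite list of values, treating the list as the whole distribution (an assumption made here for simplicity; it is not a bias-corrected sample estimator). The function name `kurtosis` is hypothetical.

```python
def kurtosis(xs):
    """Return (kurtosis, kurtosis excess) using central moments of xs."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    mu4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3.0


# A symmetric two-point distribution on {-1, +1} has mu2 = mu4 = 1.
beta2, excess = kurtosis([-1.0, 1.0, -1.0, 1.0])
print(beta2, excess)
```

The two-point example attains the smallest possible kurtosis, 1, and hence an excess of -2.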

### Gamma statistic

$\gamma_r \equiv \frac{\kappa_{r+2}}{\sigma^{r+2}}$, where $\kappa_r$ are cumulants and $\sigma$ is the standard deviation.

### Robbin's inequality

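If the fourth moment $\mu_4 \neq 0$, then $P(|\bar{x} - \mu| \geq \lambda) \leq \frac{\mu_4 + 3(N-1)\,\sigma^4}{N^3 \lambda^4}$, where $\sigma^2$ is the variance.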

### Relative deviation

Let denote the mean of a set of quantities , then the relative deviation is defined by

### Absolute deviation

Let $\bar{u}$ denote the mean of a set of quantities $u_i$, then the absolute deviation is defined by $\Delta u_i \equiv |u_i - \bar{u}|$.

### Sheppard's correction

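A correction which must be applied to the measured moments $m_k$ obtained from normally distributed data which have been binned in order to obtain correct estimators $\hat{\mu}_k$ for the population moments $\mu_k$. The corrected versions of the second, third, and fourth moments are then $\hat{\mu}_2 = m_2 - \frac{1}{12}c^2$ (1), $\hat{\mu}_3 = m_3$ (2), $\hat{\mu}_4 = m_4 - \frac{1}{2}m_2c^2 + \frac{7}{240}c^4$ (3), where $c$ is the class interval. If $\kappa_r'$ is the $r$th cumulant of an ungrouped distribution and $\kappa_r$ the $r$th cumulant of the grouped distribution with class interval $c$, the corrected cumulants (under rather restrictive conditions) are $\kappa_r' = \begin{cases}\kappa_r & \text{for } r \text{ odd}\\ \kappa_r - \frac{B_r}{r}\,c^r & \text{for } r \text{ even}\end{cases}$ (4), where $B_r$ is the $r$th Bernoulli number, giving $\kappa_1' = \kappa_1$ (5), $\kappa_2' = \kappa_2 - \frac{1}{12}c^2$ (6), $\kappa_3' = \kappa_3$ (7), $\kappa_4' = \kappa_4 + \frac{1}{120}c^4$ (8), $\kappa_5' = \kappa_5$ (9), $\kappa_6' = \kappa_6 - \frac{1}{252}c^6$ (10). For a proof, see Kendall et al. (1998).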
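The moment corrections can be applied mechanically once the grouped moments and the class interval are known. The sketch below assumes the standard form of Sheppard's corrections for the second, third, and fourth central moments ($m_2 - c^2/12$, $m_3$, and $m_4 - m_2c^2/2 + 7c^4/240$); the function name `sheppard_moments` and the input values are illustrative, not taken from any data set.

```python
from fractions import Fraction


def sheppard_moments(m2, m3, m4, c):
    """Apply Sheppard's corrections to grouped central moments m2, m3, m4.

    c is the class interval (bin width). Exact rationals are used so the
    correction terms are not blurred by floating-point rounding.
    """
    mu2 = m2 - Fraction(1, 12) * c ** 2
    mu3 = m3  # the third moment needs no correction
    mu4 = m4 - Fraction(1, 2) * m2 * c ** 2 + Fraction(7, 240) * c ** 4
    return mu2, mu3, mu4


# Hypothetical grouped moments with class interval c = 2.
mu2, mu3, mu4 = sheppard_moments(Fraction(10), Fraction(0), Fraction(300), Fraction(2))
print(mu2, mu3, mu4)
```

With a class interval of 2 the second moment is reduced by 4/12, while the fourth moment picks up both a negative term proportional to $m_2$ and a small positive term in $c^4$.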

### Cumulant

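Let $\phi(t)$ be the characteristic function, defined as the Fourier transform of the probability density function $P(x)$ using Fourier transform parameters $a = b = 1$, $\phi(t) = \mathcal{F}_x[P(x)](t)$ (1) $= \int_{-\infty}^{\infty} e^{itx}P(x)\,dx$ (2). The cumulants $\kappa_n$ are then defined by $\ln\phi(t) \equiv \sum_{n=1}^{\infty}\kappa_n\frac{(it)^n}{n!}$ (3) (Abramowitz and Stegun 1972, p. 928). Taking the Maclaurin series gives $\ln\phi(t) = (it)\,\mu_1' + \frac{1}{2}(it)^2(\mu_2' - \mu_1'^2) + \frac{1}{3!}(it)^3(2\mu_1'^3 - 3\mu_1'\mu_2' + \mu_3') + \ldots$ (4), where $\mu_n'$ are raw moments, so $\kappa_1 = \mu_1'$ (5), $\kappa_2 = \mu_2' - \mu_1'^2$ (6), $\kappa_3 = 2\mu_1'^3 - 3\mu_1'\mu_2' + \mu_3'$ (7), $\kappa_4 = -6\mu_1'^4 + 12\mu_1'^2\mu_2' - 3\mu_2'^2 - 4\mu_1'\mu_3' + \mu_4'$ (8), $\kappa_5 = 24\mu_1'^5 - 60\mu_1'^3\mu_2' + 20\mu_1'^2\mu_3' - 10\mu_2'\mu_3' + 5\mu_1'(6\mu_2'^2 - \mu_4') + \mu_5'$ (9). These transformations can be given by CumulantToRaw[n] in the Mathematica application package mathStatica. In terms of the central moments $\mu_n$, $\kappa_1 = \mu$ (10), $\kappa_2 = \mu_2 = \sigma^2$ (11), $\kappa_3 = \mu_3$ (12), $\kappa_4 = \mu_4 - 3\mu_2^2$ (13), $\kappa_5 = \mu_5 - 10\mu_2\mu_3$ (14), where $\mu$ is the mean and $\sigma^2 \equiv \mu_2$ is the variance. These transformations can be given by CumulantToCentral[n]. Multivariate cumulants can be expressed in terms of raw moments, e.g., $\kappa_{1,1} = -\mu_{0,1}'\mu_{1,0}' + \mu_{1,1}'$ (15), $\kappa_{2,1} = 2\mu_{0,1}'\mu_{1,0}'^2 - 2\mu_{1,0}'\mu_{1,1}' - \mu_{0,1}'\mu_{2,0}' + \mu_{2,1}'$ (16), and central moments, e.g., $\kappa_{1,1} = \mu_{1,1}$ (17), $\kappa_{2,1} = \mu_{2,1}$ (18), $\kappa_{3,1} = \mu_{3,1} - 3\mu_{1,1}\mu_{2,0}$ (19), $\kappa_{4,1} = \mu_{4,1} - 4\mu_{1,1}\mu_{3,0} - 6\mu_{2,0}\mu_{2,1}$ (20), $\kappa_{2,2} = \mu_{2,2} - \mu_{2,0}\mu_{0,2} - 2\mu_{1,1}^2$ (21), using CumulantToRaw[m, n, ...] and CumulantToCentral[m, n, ...], respectively. The k-statistics are unbiased estimators of the cumulants...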
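The raw-moment expressions for the cumulants can also be generated numerically. The Python sketch below uses the standard recursion $\kappa_n = \mu_n' - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k\,\mu_{n-k}'$, which is a swapped-in technique rather than the series expansion described above; the function name `cumulants_from_raw` is hypothetical.

```python
from math import comb


def cumulants_from_raw(raw):
    """Convert raw moments [mu'_1, mu'_2, ...] to cumulants [kappa_1, ...]."""
    kappa = []
    for n in range(1, len(raw) + 1):
        value = raw[n - 1]
        # Subtract the contributions of lower-order cumulants.
        for k in range(1, n):
            value -= comb(n - 1, k - 1) * kappa[k - 1] * raw[n - k - 1]
        kappa.append(value)
    return kappa


# Raw moments of a Poisson distribution with mean 2: 2, 6, 22, 94.
poisson_cumulants = cumulants_from_raw([2, 6, 22, 94])
print(poisson_cumulants)
```

A Poisson distribution is a convenient check because all of its cumulants equal its mean.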

### Sample variance distribution

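Let $N$ samples be taken from a population with central moments $\mu_n$. The sample variance $m_2$ is then given by $m_2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - m)^2$ (1), where $m = \bar{x}$ is the sample mean. The expected value of $m_2$ for a sample size $N$ is then given by $\langle m_2\rangle = \frac{N-1}{N}\,\mu_2$ (2). Similarly, the expected variance of the sample variance is given by $\operatorname{var}(m_2) = \frac{(N-1)\left[(N-1)\,\mu_4 - (N-3)\,\mu_2^2\right]}{N^3}$ (3) $= \frac{(N-1)^2}{N^3}\,\mu_4 - \frac{(N-1)(N-3)}{N^3}\,\mu_2^2$ (4) (Kenney and Keeping 1951, p. 164; Rose and Smith 2002, p. 264). The algebra of deriving equation (4) by hand is rather tedious, but can be performed as follows. Begin by noting that $(m_2 - \langle m_2\rangle)^2 = m_2^2 - 2m_2\langle m_2\rangle + \langle m_2\rangle^2$ (5), so $\operatorname{var}(m_2) = \langle m_2^2\rangle - \langle m_2\rangle^2$ (6). The value of $\langle m_2\rangle$ is already known from equation (2), so it remains only to find $\langle m_2^2\rangle$. The algebra is simplified considerably by immediately transforming variables to $x_i' \equiv x_i - \mu$ and performing computations with respect to these central variables. Since the variance does not depend on the mean $\mu$ of the underlying distribution, the result obtained using the transformed variables will give an identical result while immediately eliminating expectation values of sums of terms containing odd powers of $x_i'$ (which..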

### Moment problem

The moment problem, also called "Hausdorff's moment problem" or the "little moment problem," may be stated as follows. Given a sequence of numbers $\{\mu_n\}_{n=0}^{\infty}$, under what conditions is it possible to determine a function $\alpha(t)$ of bounded variation in the interval $(0, 1)$ such that $\mu_n = \int_0^1 t^n\,d\alpha(t)$ for $n = 0$, 1, .... Such a sequence is called a moment sequence, and Hausdorff (1921ab) was the first to obtain necessary and sufficient conditions for a sequence to be a moment sequence.

### Covariance

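Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates $X$ and $Y$, each with sample size $N$, is defined by the expectation value $\operatorname{cov}(X,Y) = \langle (X-\mu_X)(Y-\mu_Y)\rangle$ (1) $= \langle XY\rangle - \mu_X\mu_Y$ (2), where $\mu_X = \langle X\rangle$ and $\mu_Y = \langle Y\rangle$ are the respective means, which can be written out explicitly as $\operatorname{cov}(X,Y) = \sum_{i=1}^{N}\frac{(x_i-\bar{x})(y_i-\bar{y})}{N}$ (3). For uncorrelated variates, $\operatorname{cov}(X,Y) = \langle XY\rangle - \mu_X\mu_Y = \langle X\rangle\langle Y\rangle - \mu_X\mu_Y = 0$ (4), so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if $\operatorname{cov}(X,Y) > 0$, then $Y$ tends to increase as $X$ increases, and if $\operatorname{cov}(X,Y) < 0$, then $Y$ tends to decrease as $X$ increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true. In the special case of $Y = X$, $\operatorname{cov}(X,X) = \langle X^2\rangle - \langle X\rangle^2$ (5) $= \sigma_X^2$ (6), so the covariance reduces to the usual variance $\sigma_X^2 = \operatorname{var}(X)$. This motivates the use of the symbol $\sigma_{XY} = \operatorname{cov}(X,Y)$, which then provides a consistent way of denoting the variance as $\sigma_{XX} = \sigma_X^2$, where $\sigma_X$ is the standard deviation. The derived quantity $\operatorname{cor}(X,Y) \equiv \frac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}$ (7) $= \frac{\sigma_{XY}}{\sqrt{\sigma_{XX}\sigma_{YY}}}$ (8) is called statistical correlation of $X$ and $Y$. The covariance..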
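A short Python sketch of the covariance and the derived correlation for paired samples follows. It assumes the $1/N$ normalization and uses hypothetical function names `covariance` and `correlation`; the special case of a variate paired with itself gives the variance.

```python
from math import sqrt


def covariance(xs, ys):
    """Covariance of paired samples with 1/N normalization."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly linear in xs
cov_xy = covariance(xs, ys)
cor_xy = correlation(xs, ys)
print(cov_xy, cor_xy)
```

Because `ys` is an exact positive multiple of `xs`, the covariance is positive and the correlation is 1 up to rounding.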

### Sample variance computation

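When computing the sample variance $s$ numerically, the mean must be computed before $s$ can be determined. This requires storing the set of sample values. However, it is possible to calculate $s$ using a recursion relationship involving only the last sample as follows. This means $\mu$ itself need not be precomputed, and only a running set of values need be stored at each step. In the following, use the somewhat less than optimal notation $\mu_j$ to denote $\mu$ calculated from the first $j$ samples (i.e., not the $j$th moment), $\mu_j \equiv \frac{1}{j}\sum_{i=1}^{j} x_i$ (1), and let $s_j^2$ denote the value for the bias-corrected sample variance $s_{N-1}^2$ calculated from the first $j$ samples. The first few values calculated for the mean are $\mu_1 = x_1$ (2), $\mu_2 = \frac{1\cdot\mu_1 + x_2}{2}$ (3), $\mu_3 = \frac{2\mu_2 + x_3}{3}$ (4). Therefore, for $j = 2$, 3 it is true that $\mu_j = \frac{(j-1)\,\mu_{j-1} + x_j}{j}$ (5). Therefore, by induction, $\mu_{j+1} = \frac{[(j+1)-1]\,\mu_{(j+1)-1} + x_{j+1}}{j+1}$ (6) $= \frac{j\mu_j + x_{j+1}}{j+1}$ (7) $= \frac{(j+1)\,\mu_j + x_{j+1} - \mu_j}{j+1}$ (8) $= \mu_j + \frac{x_{j+1} - \mu_j}{j+1}$ (9). By the definition of the sample variance, $s_j^2 = \frac{1}{j-1}\sum_{i=1}^{j}(x_i - \mu_j)^2$ (10) for $j \geq 2$. Defining $s_1 = 0$, $s_j$ can then be computed using the recurrence equation $j s_{j+1}^2 = \sum_{i=1}^{j+1}(x_i - \mu_{j+1})^2$ (11) $= \sum_{i=1}^{j+1}[(x_i - \mu_j) - (\mu_{j+1} - \mu_j)]^2$ (12) $= \sum_{i=1}^{j+1}[(x_i - \mu_j)^2 + (\mu_{j+1} - \mu_j)^2 - 2(x_i - \mu_j)(\mu_{j+1} - \mu_j)]$ (13) $= \sum_{i=1}^{j+1}(x_i - \mu_j)^2 + \sum_{i=1}^{j+1}(\mu_{j+1} - \mu_j)^2 - 2\sum_{i=1}^{j+1}(x_i - \mu_j)(\mu_{j+1} - \mu_j)$ (14). Working on the first term, $\sum_{i=1}^{j+1}(x_i - \mu_j)^2 = \sum_{i=1}^{j}(x_i - \mu_j)^2 + (x_{j+1} - \mu_j)^2$ (15) $= (j-1)\,s_j^2 + (x_{j+1} - \mu_j)^2$ (16). Use (9) to write $x_{j+1} - \mu_j = (j+1)(\mu_{j+1} - \mu_j)$ (17), so $\sum_{i=1}^{j+1}(x_i - \mu_j)^2 = (j-1)\,s_j^2 + (j+1)^2(\mu_{j+1} - \mu_j)^2$ (18). Now work on the second..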
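A single-pass update of this kind can be sketched in Python as follows. The code assumes the end result of the recursion takes the form $\mu_{j+1} = \mu_j + (x_{j+1}-\mu_j)/(j+1)$ and $s_{j+1}^2 = (1 - 1/j)\,s_j^2 + (j+1)(\mu_{j+1}-\mu_j)^2$ with $s_1 = 0$, which follows from the definitions of the running mean and the bias-corrected variance; the function name `running_mean_variance` is hypothetical.

```python
def running_mean_variance(samples):
    """One-pass running mean and bias-corrected (1/(N-1)) variance."""
    mean = 0.0
    var = 0.0
    for j, x in enumerate(samples):  # j = number of samples already seen
        new_mean = mean + (x - mean) / (j + 1)
        if j >= 1:
            # Update s^2 from j samples to j + 1 samples.
            var = (1.0 - 1.0 / j) * var + (j + 1) * (new_mean - mean) ** 2
        mean = new_mean
    return mean, var


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
run_mean, run_var = running_mean_variance(data)
print(run_mean, run_var)
```

Only the current mean, the current variance, and the sample count are kept between steps, so the full set of samples never needs to be stored.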

### Charlier's check

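A check which can be used to verify correct computations in a table of grouped classes. For example, consider the following table with specified class limits and frequencies $f_i$. The class marks $x_i$ are then computed as well as the rescaled class marks $u_i$, which are given by $u_i = \frac{x_i - x_0}{c}$ (1), where the reference class mark is taken as $x_0 = 74.5$ and the class interval is $c = 10$. The remaining quantities are then computed as follows.

| class limits | $x_i$ | $f_i$ | $u_i$ | $f_i u_i$ | $f_i u_i^2$ | $f_i (u_i+1)^2$ |
|---|---|---|---|---|---|---|
| 30-39 | 34.5 | 2 | $-4$ | $-8$ | 32 | 18 |
| 40-49 | 44.5 | 3 | $-3$ | $-9$ | 27 | 12 |
| 50-59 | 54.5 | 11 | $-2$ | $-22$ | 44 | 11 |
| 60-69 | 64.5 | 20 | $-1$ | $-20$ | 20 | 0 |
| 70-79 | 74.5 | 32 | 0 | 0 | 0 | 32 |
| 80-89 | 84.5 | 25 | 1 | 25 | 25 | 100 |
| 90-99 | 94.5 | 7 | 2 | 14 | 28 | 63 |
| total | | 100 | | $-20$ | 176 | 236 |

In order to compute the variance, note that $s_u^2 = \frac{\sum_i f_i u_i^2}{\sum_i f_i} - \left(\frac{\sum_i f_i u_i}{\sum_i f_i}\right)^2$ (2) $= \frac{176}{100} - \left(\frac{-20}{100}\right)^2$ (3) $= 1.72$ (4), so the variance of the original data is $s_x^2 = c^2 s_u^2 = 172$ (5). Charlier's check makes use of the additional column $f_i(u_i+1)^2$ added to the right side of the table. By noting that the identity $\sum_i f_i(u_i+1)^2 = \sum_i f_i(u_i^2 + 2u_i + 1)$ (6) $= \sum_i f_i u_i^2 + 2\sum_i f_i u_i + \sum_i f_i$ (7) connects columns five through seven, it can be checked that the computations have been done correctly. In the example above, $236 = 176 + 2(-20) + 100$ (8), so the computations pass Charlier's check...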
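The check is easy to automate. The Python sketch below recomputes the column sums for the example's class marks and frequencies, assuming the reference class mark 74.5 and class interval 10, and then tests the identity $\sum f_i(u_i+1)^2 = \sum f_i u_i^2 + 2\sum f_i u_i + \sum f_i$. Variable names are illustrative.

```python
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]  # class marks x_i
freqs = [2, 3, 11, 20, 32, 25, 7]                   # frequencies f_i
x0, c = 74.5, 10.0                                  # reference mark, class interval

# Rescaled class marks u_i = (x_i - x0) / c, which are integers here.
u = [round((x - x0) / c) for x in marks]

sum_f = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))
sum_fu_plus_1_sq = sum(f * (ui + 1) ** 2 for f, ui in zip(freqs, u))

# Charlier's check: the extra column must match the other column sums.
check_passes = sum_fu_plus_1_sq == sum_fu2 + 2 * sum_fu + sum_f

# Variance of the original data, rescaled back by c^2.
variance = c ** 2 * (sum_fu2 / sum_f - (sum_fu / sum_f) ** 2)
print(sum_f, sum_fu, sum_fu2, sum_fu_plus_1_sq, check_passes, variance)
```

Since the identity is purely algebraic, a failed check signals an arithmetic slip in one of the tabulated columns rather than a property of the data.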

### Moment

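The $n$th raw moment $\mu_n'$ (i.e., moment about zero) of a distribution $P(x)$ is defined by $\mu_n' = \langle x^n\rangle$ (1), where $\langle f(x)\rangle = \begin{cases}\sum f(x)\,P(x) & \text{discrete distribution}\\ \int f(x)\,P(x)\,dx & \text{continuous distribution}\end{cases}$ (2). $\mu_1'$, the mean, is usually simply denoted $\mu = \mu_1'$. If the moment is instead taken about a point $a$, $\mu_n(a) = \langle (x-a)^n\rangle$ (3). A statistical distribution is not uniquely specified by its moments, although it is by its characteristic function. The moments are most commonly taken about the mean. These so-called central moments are denoted $\mu_n$ and are defined by $\mu_n \equiv \langle (x-\mu)^n\rangle$ (4) $= \int (x-\mu)^n P(x)\,dx$ (5), with $\mu_1 = 0$. The second moment about the mean is equal to the variance $\mu_2 = \sigma^2$ (6), where $\sigma = \sqrt{\mu_2}$ is called the standard deviation. The related characteristic function is defined by $\phi^{(n)}(0) \equiv \left[\frac{d^n\phi}{dt^n}\right]_{t=0}$ (7) $= i^n\mu_n'$ (8). The moments may be simply computed using the moment-generating function, $\mu_n' = M^{(n)}(0)$ (9).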

### Sample raw moment

The $r$th sample raw moment $m_r'$ of a sample $x_1, \ldots, x_n$ with sample size $n$ is defined as $m_r' = \frac{1}{n}\sum_{k=1}^{n} x_k^r$ (1). The sample raw moments are unbiased estimators of the population raw moments, $\langle m_r'\rangle = \mu_r'$ (2) (Rose and Smith 2002, p. 253). The sample raw moment is related to power sums $S_r = \sum_{k=1}^{n} x_k^r$ by $m_r' = \frac{1}{n}S_r$ (3). This relationship can be given by SampleRawToPowerSum[r] in the Mathematica application package mathStatica.

### Central moment

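A moment $\mu_n$ of a univariate probability density function $P(x)$ taken about the mean $\mu = \mu_1'$, $\mu_n \equiv \langle (x - \langle x\rangle)^n\rangle$ (1) $= \int (x-\mu)^n P(x)\,dx$ (2), where $\langle X\rangle$ denotes the expectation value. The central moments $\mu_n$ can be expressed as terms of the raw moments $\mu_n'$ (i.e., those taken about zero) using the binomial transform $\mu_n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}\mu_k'\,\mu_1'^{\,n-k}$ (3), with $\mu_0' = 1$ (Papoulis 1984, p. 146). The first few central moments expressed in terms of the raw moments are therefore $\mu_1 = 0$ (4), $\mu_2 = -\mu_1'^2 + \mu_2'$ (5), $\mu_3 = 2\mu_1'^3 - 3\mu_1'\mu_2' + \mu_3'$ (6), $\mu_4 = -3\mu_1'^4 + 6\mu_1'^2\mu_2' - 4\mu_1'\mu_3' + \mu_4'$ (7), $\mu_5 = 4\mu_1'^5 - 10\mu_1'^3\mu_2' + 10\mu_1'^2\mu_3' - 5\mu_1'\mu_4' + \mu_5'$ (8). These transformations can be obtained using CentralToRaw[n] in the Mathematica application package mathStatica. The central moments $\mu_n$ can also be expressed in terms of the cumulants $\kappa_n$, with the first few cases given by $\mu_2 = \kappa_2$ (9), $\mu_3 = \kappa_3$ (10), $\mu_4 = \kappa_4 + 3\kappa_2^2$ (11), $\mu_5 = \kappa_5 + 10\kappa_2\kappa_3$ (12). These transformations can be obtained using CentralToCumulant[n] in the Mathematica application package mathStatica. The central moment of a multivariate probability density function $P(x_1, \ldots, x_k)$ can be similarly defined as $\mu_{m,\ldots,n} = \langle (x_1 - \langle x_1\rangle)^m \cdots (x_k - \langle x_k\rangle)^n\rangle$ (13). Therefore, $\mu_{m,\ldots,n} = \int\cdots\int (x_1-\langle x_1\rangle)^m\cdots(x_k-\langle x_k\rangle)^n P(x_1,\ldots,x_k)\,dx_1\cdots dx_k$ (14). For example, $\mu_{1,1} = \langle (x_1-\langle x_1\rangle)(x_2-\langle x_2\rangle)\rangle$ (15), $\mu_{2,1} = \langle (x_1-\langle x_1\rangle)^2(x_2-\langle x_2\rangle)\rangle$ (16). Similarly, the multivariate central moments can be expressed in terms..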
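The binomial transform from raw to central moments is straightforward to evaluate directly. The following Python sketch assumes the form $\mu_n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}\mu_k'\,\mu_1'^{\,n-k}$ with $\mu_0' = 1$; the function name `central_from_raw` is hypothetical.

```python
from math import comb


def central_from_raw(raw):
    """Convert raw moments [mu'_0 = 1, mu'_1, ..., mu'_n] to central moments."""
    mean = raw[1]
    central = []
    for n in range(len(raw)):
        # Binomial transform about the mean.
        total = sum(
            comb(n, k) * (-1) ** (n - k) * raw[k] * mean ** (n - k)
            for k in range(n + 1)
        )
        central.append(total)
    return central


# Raw moments of a Poisson distribution with mean 2 (orders 0 through 4).
poisson_central = central_from_raw([1, 2, 6, 22, 94])
print(poisson_central)
```

For a Poisson distribution with mean $\lambda$ the central moments of orders 2, 3, and 4 are $\lambda$, $\lambda$, and $\lambda + 3\lambda^2$, which gives a quick sanity check on the output.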

### Bessel's correction

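Bessel's correction is the factor $(N-1)/N$ in the relationship between the variance $\sigma^2$ and the expectation values of the sample variance, $\langle s^2\rangle = \frac{N-1}{N}\,\sigma^2$ (1), where $s^2 \equiv \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$ (2). As noted by Kenney and Keeping (1951, p. 161), the correction factor is probably more properly attributed to Gauss, who used it in this connection as early as 1823 (Gauss 1823). For two samples, $\hat{\sigma}^2 = \frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}$ (3) (Kenney and Keeping 1951, p. 162).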

### Raw moment

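A moment $\mu_n'$ of a probability function $P(x)$ taken about 0, $\mu_n' = \langle x^n\rangle$ (1) $= \int x^n P(x)\,dx$ (2). The raw moments $\mu_n'$ (sometimes also called "crude moments") can be expressed as terms of the central moments $\mu_n$ (i.e., those taken about the mean $\mu$) using the inverse binomial transform $\mu_n' = \sum_{k=0}^{n}\binom{n}{k}\mu_k\,\mu_1'^{\,n-k}$ (3), with $\mu_0 = 1$ and $\mu_1 = 0$ (Papoulis 1984, p. 146). The first few values are therefore $\mu_2' = \mu_2 + \mu_1'^2$ (4), $\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3$ (5), $\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4$ (6), $\mu_5' = \mu_5 + 5\mu_4\mu_1' + 10\mu_3\mu_1'^2 + 10\mu_2\mu_1'^3 + \mu_1'^5$ (7). The raw moments $\mu_n'$ can also be expressed in terms of the cumulants $\kappa_n$ by exponentiating both sides of the series $\ln\phi(t) = \sum_{n=1}^{\infty}\kappa_n\frac{(it)^n}{n!}$ (8), where $\phi(t)$ is the characteristic function, to obtain $\phi(t) = \exp\left[\sum_{n=1}^{\infty}\kappa_n\frac{(it)^n}{n!}\right]$ (9). The first few terms are then given by $\mu_1' = \kappa_1$ (10), $\mu_2' = \kappa_1^2 + \kappa_2$ (11), $\mu_3' = \kappa_1^3 + 3\kappa_1\kappa_2 + \kappa_3$ (12), $\mu_4' = \kappa_1^4 + 6\kappa_1^2\kappa_2 + 3\kappa_2^2 + 4\kappa_1\kappa_3 + \kappa_4$ (13), $\mu_5' = \kappa_1^5 + 10\kappa_1^3\kappa_2 + 15\kappa_1\kappa_2^2 + 10\kappa_1^2\kappa_3 + 10\kappa_2\kappa_3 + 5\kappa_1\kappa_4 + \kappa_5$ (14). These transformations can be obtained using RawToCumulant[n] in the Mathematica application package mathStatica. The raw moment of a multivariate probability function $P(x_1, \ldots, x_k)$ can be similarly defined as $\mu_{m,\ldots,n}' = \langle x_1^m\cdots x_k^n\rangle$ (15). Therefore, $\mu_{m,\ldots,n}' = \int\cdots\int x_1^m\cdots x_k^n\,P(x_1,\ldots,x_k)\,dx_1\cdots dx_k$ (16). The multivariate raw moments can be expressed in terms of the multivariate cumulants. For example, $\mu_{1,1}' = \kappa_{0,1}\kappa_{1,0} + \kappa_{1,1}$ (17), $\mu_{2,1}' = \kappa_{0,1}\kappa_{1,0}^2 + 2\kappa_{1,0}\kappa_{1,1} + \kappa_{0,1}\kappa_{2,0} + \kappa_{2,1}$ (18). These transformations can be obtained using RawToCumulant[m,..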

### Variation coefficient

If $s_x$ is the standard deviation of a set of samples $x_i$ and $\bar{x}$ their mean, then the variation coefficient is defined as $V \equiv \frac{s_x}{\bar{x}}$.

### Population mean

The mean of a distribution with probability density function $P(x)$ is the first raw moment $\mu_1'$, defined by $\mu \equiv \langle x\rangle$ (1), where $\langle x\rangle$ is the expectation value. For a continuous distribution function, the population mean is given by $\mu = \int x\,P(x)\,dx$ (2). Similarly, for a discrete distribution, $\mu = \sum_n x_n\,P(x_n)$ (3). The population mean of a distribution is implemented in the Wolfram Language as Mean[dist]. The sample mean is an unbiased estimator for the population mean.

### Standard error

There appear to be two different definitions of the standard error. The standard error of a sample of sample size $n$ is the sample's standard deviation divided by $\sqrt{n}$. It therefore estimates the standard deviation of the sample mean based on the population mean (Press et al. 1992, p. 465). Note that while this definition makes no reference to a normal distribution, many uses of this quantity implicitly assume such a distribution. The standard error of an estimate may also be defined as the square root of the estimated error variance $\hat{\sigma}^2$ of the quantity, $s_e = \sqrt{\hat{\sigma}^2}$ (Kenney and Keeping 1951, p. 187; Zwillinger 1995, p. 626).
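The first definition can be illustrated with a few lines of Python. The sketch assumes the bias-corrected ($1/(n-1)$) standard deviation is meant, which the text above does not specify; the function name `standard_error` is hypothetical.

```python
from math import sqrt


def standard_error(xs):
    """Standard deviation of xs divided by sqrt(n).

    Assumption: the 1/(n-1) standard deviation is used.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return s / sqrt(n)


se = standard_error([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(se)
```

For this data the bias-corrected variance is 32/7, so the result is $\sqrt{(32/7)/8} = \sqrt{4/7}$. Using the $1/n$ standard deviation instead would give a slightly smaller value.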

### Pearson's skewness coefficients

Given a statistical distribution with measured mean, statistical median, mode, and standard deviation $\sigma$, Pearson's first skewness coefficient, also known as the Pearson mode skewness, is defined by $\frac{\text{mean} - \text{mode}}{\sigma}$, which was incorrectly implemented (with a spurious multiplicative factor of 3) in versions of the Wolfram Language prior to 6 as PearsonSkewness1[data] after loading the package `` Statistics`DescriptiveStatistics` ``. Pearson's second coefficient is $\frac{3\,(\text{mean} - \text{median})}{\sigma}$, which was implemented in versions of the Wolfram Language prior to 6 as PearsonSkewness2[data].

### Standard deviation distribution

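Consider the sample standard deviation $s \equiv \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ (1) for $N$ samples taken from a population with a normal distribution. The distribution of $s$ is then given by $f_N(s) = 2\,\frac{\left(\frac{N}{2\sigma^2}\right)^{(N-1)/2}}{\Gamma\left(\frac{1}{2}(N-1)\right)}\,e^{-Ns^2/(2\sigma^2)}\,s^{N-2}$ (2), where $\Gamma(z)$ is a gamma function and $\sigma^2$ is the population variance (Kenney and Keeping 1951, pp. 161 and 171). The function $f_N(s)$ is plotted above for $N = 2$ (red), 4 (orange), ..., 10 (blue), and 12 (violet). The mean is given by $\langle s\rangle = \sqrt{\frac{2}{N}}\,\frac{\Gamma\left(\frac{1}{2}N\right)}{\Gamma\left(\frac{1}{2}(N-1)\right)}\,\sigma$ (4) $= b(N)\,\sigma$ (5), where $b(N) \equiv \sqrt{\frac{2}{N}}\,\frac{\Gamma\left(\frac{1}{2}N\right)}{\Gamma\left(\frac{1}{2}(N-1)\right)}$ (6) (Kenney and Keeping 1951, p. 171). The function $b(N)$ is known as $c_2$ in statistical process control (Duncan 1986, pp. 62 and 134). Romanovsky showed that $b(N) = 1 - \frac{3}{4N} - \frac{7}{32N^2} - \frac{9}{128N^3} + \ldots$ (7) (OEIS A088801 and A088802; Romanovsky 1925; Pearson 1935; Kenney and Keeping 1951, p. 171). The raw moments are given by $\mu_r' = \left(\frac{2}{N}\right)^{r/2}\frac{\Gamma\left(\frac{1}{2}(N-1+r)\right)}{\Gamma\left(\frac{1}{2}(N-1)\right)}\,\sigma^r$ (8), and the variance of $s$ is $\operatorname{var}(s) = \mu_2' - \mu_1'^2$ (9) $= \frac{1}{N}\left[N - 1 - \frac{2\,\Gamma^2\left(\frac{1}{2}N\right)}{\Gamma^2\left(\frac{1}{2}(N-1)\right)}\right]\sigma^2$ (10). $s/b(N)$ is an unbiased estimator of $\sigma$ (Kenney and Keeping 1951, p. 171).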

### Pearson mode skewness

Given a statistical distribution with measured mean, mode, and standard deviation $\sigma$, the Pearson mode skewness is $\frac{\text{mean} - \text{mode}}{\sigma}$. The function was incorrectly implemented (with a spurious multiplicative factor of 3) in versions of the Wolfram Language prior to 6 as PearsonSkewness1[data] after loading the package `` Statistics`DescriptiveStatistics` ``. This measure was suggested by Karl Pearson, and has the property that for a type III Pearson distribution, it is equal to $\frac{1}{2}\alpha_3$, where $\alpha_3$ is the third standardized moment (Kenney and Keeping 1962, p. 101; Kenney and Keeping 1951, p. 106).

### Excess

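The kurtosis excess of a distribution is sometimes called the excess, or excess coefficient. In graph theory, excess refers to the quantity $e = n - n_0(v, g)$ (1) for a $v$-regular graph on $n$ nodes with girth $g$, where $n_0(v, g) = \begin{cases}1 + v\sum_{i=0}^{(g-3)/2}(v-1)^i & \text{for } g \text{ odd}\\ 2\sum_{i=0}^{(g-2)/2}(v-1)^i & \text{for } g \text{ even}\end{cases}$ (2) (Biggs and Ito 1980, Wong 1982). A $(v, g)$-cage graph having $n_0(v, g)$ vertices (i.e., the minimal number, so that the excess is $e = 0$) is called a Moore graph.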

### Mean deviation

The mean deviation (also called the mean absolute deviation) is the mean of the absolute deviations of a set of data about the data's mean. For a sample size $N$, the mean deviation is defined by $\mathrm{MD} \equiv \frac{1}{N}\sum_{i=1}^{N}|x_i - \bar{x}|$ (1), where $\bar{x}$ is the mean of the distribution. The mean deviation of a list of numbers is implemented in the Wolfram Language as MeanDeviation[data]. The mean deviation for a discrete distribution $P_i$ defined for $i = 1$, 2, ..., $N$ is given by $\mathrm{MD} = \sum_{i=1}^{N} P_i\,|x_i - \bar{x}|$ (2). Mean deviation is an important descriptive statistic that is not frequently encountered in mathematical statistics. This is essentially because while mean deviation has a natural intuitive definition as the "mean deviation from the mean," the introduction of the absolute value makes analytical calculations using this statistic much more complicated than the standard deviation $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ (3). As a result, least squares fitting and other standard statistical techniques rely on minimizing the sum of square residuals instead of..
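For a list of numbers, the mean deviation can be computed in a few lines. This Python sketch assumes the $1/N$ normalization about the sample mean; the function name `mean_deviation` is hypothetical and is not the Wolfram Language function mentioned above.

```python
def mean_deviation(xs):
    """Mean of the absolute deviations of xs about their mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(abs(x - mean) for x in xs) / n


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
md = mean_deviation(data)
print(md)
```

For this data the absolute deviations about the mean 5 sum to 12, giving a mean deviation of 1.5, compared with a ($1/N$) standard deviation of 2.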

### Standard deviation

The standard deviation $\sigma$ of a probability distribution is defined as the square root of the variance $\sigma^2$, $\sigma \equiv \sqrt{\langle x^2\rangle - \langle x\rangle^2}$ (1) $= \sqrt{\mu_2' - \mu^2}$ (2), where $\mu = \langle x\rangle$ is the mean, $\mu_2' = \langle x^2\rangle$ is the second raw moment, and $\langle x\rangle$ denotes the expectation value of $x$. The variance $\sigma^2$ is therefore equal to the second central moment (i.e., moment about the mean), $\sigma^2 = \mu_2$ (3). The square root of the sample variance of a set of $N$ values is the sample standard deviation $s_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ (4). The sample standard deviation distribution is a slightly complicated, though well-studied and well-understood, function. However, consistent with widespread inconsistent and ambiguous terminology, the square root of the bias-corrected variance is sometimes also known as the standard deviation, $s_{N-1} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ (5). The standard deviation $s_{N-1}$ of a list of data is implemented as StandardDeviation[list]. Physical scientists often use the term root-mean-square as a synonym for standard deviation when they refer to the square root of the mean squared deviation of a quantity from a given..

### Skewness

Skewness is a measure of the degree of asymmetry of a distribution. If the left tail (tail at small end of the distribution) is more pronounced than the right tail (tail at the large end of the distribution), the function is said to have negative skewness. If the reverse is true, it has positive skewness. If the two are equal, it has zero skewness.Several types of skewness are defined, the terminology and notation of which are unfortunately rather confusing. "The" skewness of a distribution is defined to be(1)where is the th central moment. The notation is due to Karl Pearson, but the notations (Kenney and Keeping 1951, p. 27; Kenney and Keeping 1962, p. 99) and (due to R. A. Fisher) are also encountered (Kenney and Keeping 1951, p. 27; Kenney and Keeping 1962, p. 99; Abramowitz and Stegun 1972, p. 928). Abramowitz and Stegun (1972, p. 928) also confusingly refer to both and as "skewness."..