Probability and statistics



Probability and statistics Topics


Lyapunov condition

The Lyapunov condition, sometimes known as Lyapunov's central limit theorem, states that if the th moment (with ) exists for a statistical distribution of independent random variates (which need not necessarily be from the same distribution), the means and variances are finite, and (1), then if (2), where (3), the central limit theorem holds.

Lindeberg condition

A sufficient condition on the Lindeberg-Feller central limit theorem. Given random variates , , ..., let , the variance of , be finite, and let the variance of the distribution consisting of a sum of s (1) be (2). In the terminology of Zabell (1995), let (3), where denotes the expectation value of restricted to outcomes , then the Lindeberg condition is (4) for all (Zabell 1995). In the terminology of Feller (1971), the Lindeberg condition assumes that for each , (5), or equivalently (6). Then the distribution (7) tends to the normal distribution with zero expectation and unit variance (Feller 1971, p. 256). The Lindeberg condition (5) guarantees that the individual variances are small compared to their sum, in the sense that for given , for all sufficiently large , for , ..., (Feller 1971, p. 256).

Maximum likelihood

Maximum likelihood, also called the maximum likelihood method, is the procedure of finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum. The maximum likelihood estimate for a parameter is denoted . For a Bernoulli distribution, (1), so maximum likelihood occurs for . If is not known ahead of time, the likelihood function is (2) (3) (4), where or 1 and , ..., . (5) (6) Rearranging gives (7), so (8). For a normal distribution, (9) (10), so (11) and (12), giving (13). Similarly, (14) gives (15). Note that in this case, the maximum likelihood standard deviation is the sample standard deviation, which is a biased estimator for the population standard deviation. For a weighted normal distribution, (16) (17) (18) gives (19). The variance of the mean is then (20). But (21), so (22) (23) (24). For a Poisson distribution, (25) (26) (27) (28).
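Since the symbols were lost in this copy, here is a small illustrative sketch (function names are my own) of the two maximum likelihood results derived above: the Bernoulli MLE is the sample proportion, and the normal MLE is the sample mean together with the biased (divisor n) standard deviation.

```python
import math

def mle_bernoulli(xs):
    # MLE of the success probability for 0/1 data: the sample proportion.
    return sum(xs) / len(xs)

def mle_normal(xs):
    # MLE of (mu, sigma) for normal data: the sample mean and the *biased*
    # standard deviation (divisor n, not n - 1), as noted above.
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

mu, sigma = mle_normal([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mu = 5.0, sigma = 2.0 for this data set
```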

Estimator bias

The bias of an estimator is defined as (1). It is therefore true that (2) (3). An estimator for which is said to be an unbiased estimator.

Estimator

An estimator is a rule that tells how to calculate an estimate based on the measurements contained in a sample. For example, the sample mean is an estimator for the population mean . The mean square error of an estimator is defined by (1). Let be the estimator bias; then (2) (3) (4), where is the estimator variance.
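The decomposition of the mean square error into variance plus squared bias can be checked numerically. The sketch below (parameters and the shrunk estimator are invented for illustration) compares the unbiased sample mean with a deliberately biased estimator of a population mean; the decomposition is an algebraic identity for the empirical quantities, so it holds to rounding error.

```python
import random

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 5, 20000

for shrink in (1.0, 0.9):          # unbiased sample mean, then a biased 0.9*mean
    ests = []
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        ests.append(shrink * sum(sample) / n)
    mean_est = sum(ests) / trials
    bias = mean_est - mu                                    # estimator bias
    var = sum((e - mean_est) ** 2 for e in ests) / trials   # estimator variance
    mse = sum((e - mu) ** 2 for e in ests) / trials         # mean square error
    assert abs(mse - (var + bias ** 2)) < 1e-6              # MSE = var + bias^2
```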

Walsh index

The statistical index , where is the price per unit in period and is the quantity produced in period .

Paasche's index

The statistical index , where is the price per unit in period and is the quantity produced in period .

Mitchell index

The statistical index , where is the price per unit in period and is the quantity produced in period .

Laspeyres' index

The statistical index , where is the price per unit in period and is the quantity produced in the initial period.

Index number

A statistic which assigns a single number to several individual statistics in order to quantify trends. The best-known index in the United States is the consumer price index, which gives a sort of "average" value for inflation based on price changes for a group of selected products. The Dow Jones and NASDAQ indexes for the New York and American Stock Exchanges, respectively, are also index numbers. Let be the price per unit in period , be the quantity produced in period , and be the value of the units. Let be the estimated relative importance of a product. Several types of indices are defined, among them the Bowley index, Fisher index, geometric mean index, harmonic mean index, Laspeyres' index, Marshall-Edgeworth index, Mitchell index, Paasche's index, and Walsh index.
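Since the formulas in the original table did not survive extraction, here is a hedged sketch (variable names are my own) of three of these indices from their standard definitions: Laspeyres weights current prices by base-period quantities, Paasche weights them by current-period quantities, and the Fisher index is the geometric mean of the two.

```python
import math

def laspeyres(p0, pn, q0):
    # Laspeyres' index: current prices weighted by base-period quantities.
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, pn, qn):
    # Paasche's index: current prices weighted by current-period quantities.
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn))

def fisher(p0, pn, q0, qn):
    # Fisher index: geometric mean of the Laspeyres and Paasche indices.
    return math.sqrt(laspeyres(p0, pn, q0) * paasche(p0, pn, qn))

p0, pn = [1.0, 2.0], [1.5, 2.5]    # base and current prices for two goods
q0, qn = [10.0, 5.0], [8.0, 6.0]   # base and current quantities
```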

Harmonic mean index

The statistical index , where is the price per unit in period , is the quantity produced in period , and is the value of the units, and subscripts 0 indicate the reference year.

Geometric mean index

The statistical index , where is the price per unit in period , is the quantity produced in period , and is the value of the units.

Joint distribution function

A joint distribution function is a distribution function in two variables defined by(1)(2)(3)so that the joint probability function satisfies(4)(5)(6)(7)(8)Two random variables and are independent iff(9)for all and and(10)A multiple distribution function is of the form(11)

Distribution function

The distribution function , also called the cumulative distribution function (CDF) or cumulative frequency function, describes the probability that a variate takes on a value less than or equal to a number . The distribution function is sometimes also denoted (Evans et al. 2000, p. 6). The distribution function is therefore related to a continuous probability density function by (1) (2), so (when it exists) is simply the derivative of the distribution function (3). Similarly, the distribution function is related to a discrete probability by (4) (5). There exist distributions that are neither continuous nor discrete. A joint distribution function can be defined if outcomes are dependent on two parameters: (6) (7) (8). Similarly, a multivariate distribution function can be defined if outcomes depend on parameters: (9). The probability content of a closed region can be found much more efficiently than by direct integration of the probability density function.

Mean distribution

For an infinite population with mean , variance , skewness , and kurtosis excess , the corresponding quantities for the distribution of means are(1)(2)(3)(4)For a population of (Kenney and Keeping 1962, p. 181),(5)(6)

Discrete distribution

A statistical distribution whose variables can take on only discrete values. Abramowitz and Stegun (1972, p. 929) give a table of the parameters of most common discrete distributions. A discrete distribution with probability function defined over , 2, ... has distribution function and population mean .

Multinomial distribution

Let a set of random variates , , ..., have a probability function (1), where are nonnegative integers such that (2) and are constants with and (3). Then the joint distribution of , ..., is a multinomial distribution and is given by the corresponding coefficient of the multinomial series (4). In other words, if , , ..., are mutually exclusive events with , ..., , then the probability that occurs times, ..., occurs times is given by (5) (Papoulis 1984, p. 75). The mean and variance of are (6) (7). The covariance of and is (8).
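A direct computation of the multinomial probability in (5), with the trials split into counts k_1, ..., k_m; a small sketch in my own notation, not MathWorld's.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # P(X1 = k1, ..., Xm = km) = n!/(k1! ... km!) * p1^k1 * ... * pm^km
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)
    p = 1.0
    for k, pi in zip(counts, probs):
        p *= pi ** k
    return coef * p

# Three trials over outcomes with probabilities 1/2, 1/3, 1/6: the chance of
# seeing each outcome exactly once is 3! * (1/2)(1/3)(1/6) = 1/6.
```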

Normal ratio distribution

The ratio of independent normally distributed variates with zero mean is distributed with a Cauchy distribution. This can be seen as follows. Let and both have mean 0 and standard deviations of and , respectively; then the joint probability density function is the bivariate normal distribution with , (1). From the ratio distribution, the distribution of is (2) (3) (4). But (5), so (6) (7) (8), which is a Cauchy distribution. A more direct derivation proceeds from integration of (9) (10), where is a delta function.

Normal product distribution

The distribution of a product of two normally distributed variates and with zero means and variances and is given by (1) (2), where is a delta function and is a modified Bessel function of the second kind. This distribution is plotted above in red. The analogous expression for a product of three normal variates can be given in terms of Meijer G-functions as (3), plotted above in blue.

Normal distribution function

A normalized form of the cumulative normal distribution function giving the probability that a variate assumes a value in the range ,(1)It is related to the probability integral(2)by(3)Let so . Then(4)Here, erf is a function sometimes called the error function. The probability that a normal variate assumes a value in the range is therefore given by(5)Neither nor erf can be expressed in terms of finite additions, subtractions, multiplications, and root extractions, and so must be either computed numerically or otherwise approximated.Note that a function different from is sometimes defined as "the" normal distribution function(6)(7)(8)(9)(Feller 1968; Beyer 1987, p. 551), although this function is less widely encountered than the usual . The notation is due to Feller (1971).The value of for which falls within the interval with a given probability is a related quantity called the confidence interval.For small values..

Von Mises distribution

A continuous distribution defined on the range with probability density function(1)where is a modified Bessel function of the first kind of order 0, and distribution function(2)which cannot be done in closed form. Here, is the mean direction and is a concentration parameter. The von Mises distribution is the circular analog of the normal distribution on a line.The mean is(3)and the circular variance is(4)

Normal difference distribution

Amazingly, the distribution of a difference of two normally distributed variates and with means and variances and , respectively, is given by(1)(2)where is a delta function, which is another normal distribution having mean(3)and variance(4)

Uniform sum distribution

The distribution for the sum of uniform variates on the interval can be found directly as (1), where is a delta function. A more elegant approach uses the characteristic function to obtain (2), where the Fourier parameters are taken as . The first few values of are then given by (3) (4) (5) (6), illustrated above. Interestingly, the expected number of picks of a number from a uniform distribution on so that the sum exceeds 1 is e (Derbyshire 2004, pp. 366-367). This can be demonstrated by noting that the probability of the sum of variates being greater than 1 while the sum of variates is less than 1 is (7) (8) (9). The values for , 2, ... are 0, 1/2, 1/3, 1/8, 1/30, 1/144, 1/840, 1/5760, 1/45360, ... (OEIS A001048). The expected number of picks needed to first exceed 1 is then simply (10). It is more complicated to compute the expected number of picks needed for their sum to first exceed 2. In this case, (11) (12). The first few terms are therefore 0, 0, 1/6, ...
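The claim that on average e ≈ 2.718 uniform picks are needed before the running sum exceeds 1 is easy to check by simulation (a sketch; the trial count and seed are arbitrary):

```python
import math
import random

random.seed(1)

def picks_to_exceed_one():
    # Count uniform(0,1) draws until their running sum first exceeds 1.
    total, n = 0.0, 0
    while total <= 1.0:
        total += random.random()
        n += 1
    return n

trials = 200_000
avg = sum(picks_to_exceed_one() for _ in range(trials)) / trials
# avg comes out close to e = 2.71828...
```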

Uniform ratio distribution

The ratio of uniform variates and on the interval can be found directly as(1)(2)where is a delta function and is the Heaviside step function.The distribution is normalized, but its mean and moments diverge.

Uniform product distribution

The distribution of the product of uniform variates on the interval can be found directly as(1)(2)where is a delta function. The distributions are plotted above for (red), (yellow), and so on.

Uniform difference distribution

The difference of two uniform variates on the interval can be found as(1)(2)where is a delta function and is the Heaviside step function.

Error function distribution

A normal distribution with mean 0, (1). The characteristic function is (2). The mean, variance, skewness, and kurtosis excess are (3) (4) (5) (6). The cumulants are (7) (8) (9) for .

Erlang distribution

Given a Poisson distribution with a rate of change , the distribution function giving the waiting times until the th Poisson event is(1)(2)for , where is a complete gamma function, and an incomplete gamma function. With explicitly an integer, this distribution is known as the Erlang distribution, and has probability function(3)It is closely related to the gamma distribution, which is obtained by letting (not necessarily an integer) and defining . When , it simplifies to the exponential distribution.Evans et al. (2000, p. 71) write the distribution using the variables and .
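A sketch of the Erlang density of eq. (3) as a function (names are mine); with n = 1 it collapses to the exponential density, as stated above.

```python
import math

def erlang_pdf(x, n, lam):
    # Waiting-time density for the n-th event of a rate-lam Poisson process:
    # lam^n * x^(n-1) * exp(-lam*x) / (n-1)!
    return lam ** n * x ** (n - 1) * math.exp(-lam * x) / math.factorial(n - 1)

# n = 1 reduces to the exponential density lam * exp(-lam*x):
assert abs(erlang_pdf(0.7, 1, 2.0) - 2.0 * math.exp(-1.4)) < 1e-12
```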

Doob's theorem

A theorem proved by Doob (1942) which states that any random process which is both normal and Markov has the following forms for its correlation function , spectral density , and probability densities and :(1)(2)(3)(4)where is the mean, the standard deviation, and the relaxation time.

Difference of successes

If and are the observed proportions from standard normally distributed samples with proportion of success , then the probability that(1)will be as great as observed is(2)where(3)(4)(5)Here, is the unbiased estimator. The skewness and kurtosis excess of this distribution are(6)(7)

Standard normal distribution

A standard normal distribution is a normal distribution with zero mean () and unit variance (), given by the probability density function and distribution function(1)(2)over the domain .It has mean, variance, skewness,and kurtosis excess given by(3)(4)(5)(6)The first quartile of the standard normal distribution occurs when , which is(7)(8)(OEIS A092678; Kenney and Keeping 1962, p. 134), where is the inverse erf function. The absolute value of this is known as the probable error.
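Python's statistics.NormalDist gives direct numerical access to these quantities: the density at 0 is 1/sqrt(2 pi), and the first quartile reproduces the probable-error constant -0.6745 quoted above.

```python
import math
from statistics import NormalDist

Z = NormalDist()            # standard normal: mean 0, standard deviation 1
pdf0 = Z.pdf(0.0)           # 1/sqrt(2*pi), approximately 0.398942
q1 = Z.inv_cdf(0.25)        # first quartile, approximately -0.674490
```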

Logarithmic distribution

The logarithmic distribution is a continuous distribution for a variate with probability function(1)and distribution function(2)It therefore applies to a variable distributed as , and has appropriate normalization.Note that the log-series distribution is sometimes also known as the logarithmic distribution, and the distribution arising in Benford's law is also "a" logarithmic distribution.The raw moments are given by(3)The mean is therefore(4)The variance, skewness,and kurtosis excess are slightly complicated expressions.

S distribution

The distribution is defined in terms of its distribution function as the solution to the initial value problem , where (Savageau 1982, Aksenov and Savageau 2001). It has four free parameters: , , , and . The distribution is capable of approximating many central and noncentral unimodal univariate distributions rather well (Voit 1991) and also includes the exponential, logistic, uniform, and linear distributions as special cases. The S distribution derives its name from the fact that it is based on the theory of S-systems (Savageau 1976, Voit 1991, Aksenov and Savageau 2001).

Continuity correction

A correction to a discrete binomial distribution to approximate a continuous distribution, where is a continuous variate with a normal distribution and is a variate of a binomial distribution.

Probable error

The probability that a random sample from an infinite normally distributed universe will have a mean within a distance of the mean of the universe is(1)where is the normal distribution function and is the observed value of(2)The probable error is then defined as the value of such that , i.e.,(3)which is given by(4)(5)(OEIS A092678; Kenney and Keeping 1962, p. 134). Here, is the inverse erf function. The probability of a deviation from the true population value at least as great as the probable error is therefore 1/2.

Price's theorem

Consider a bivariate normal distribution in variables and with covariance(1)and an arbitrary function . Then the expected value of the random variable (2)satisfies(3)

Gibrat's distribution

Gibrat's distribution is a continuous distribution in which the logarithm of a variable has a normal distribution,(1)defined over the interval . It is a special case of the log normal distribution(2)with and , and so has distribution function(3)The mean, variance, skewness,and kurtosis excess are then given by(4)(5)(6)(7)

Pearson type III distribution

A skewed distribution which is similar to the binomial distribution when (Abramowitz and Stegun 1972, p. 930): (1) for , where (2) (3), is the gamma function, and is a standardized variate. Another form is (4). For this distribution, the characteristic function is (5), and the mean, variance, skewness, and kurtosis excess are (6) (7) (8) (9).

Pearson system

A system of equation types obtained by generalizing the differential equation for the normal distribution (1), which has solution (2), to (3), which has solution (4). Let , be the roots of . Then the possible types of curves are: 0. , . E.g., normal distribution. I. , . E.g., beta distribution. II. , , where . III. , , where . E.g., gamma distribution. This case is intermediate to cases I and VI. IV. , . V. , where . Intermediate to cases IV and VI. VI. , where is the larger root. E.g., beta prime distribution. VII. , , . E.g., Student's t-distribution. Classes IX-XII are discussed in Pearson (1916). See also Craig (in Kenney and Keeping 1951). If a Pearson curve possesses a mode, it will be at . Let at and , where these may be or . If also vanishes at , , then the th and th moments exist, (5), giving (6) (7). Now define the raw th moment by (8), so combining (7) with (8) gives (9). For , (10), so (11), and for , (12), so (13). Combining (11), (13), and the definitions (14) (15) obtained...

Gaussian joint variable theorem

The Gaussian joint variable theorem, also called the multivariate theorem, states that given an even number of variates from a normal distribution with means all 0,(1)etc. Given an odd number of variates,(2)(3)etc.

Beta prime distribution

A distribution with probability function , where is a beta function. The mode of a variate distributed as is . If is a variate, then is a variate. If is a variate, then and are and variates. If and are and variates, then is a variate. If and are variates, then is a variate.

Normal sum distribution

Amazingly, the distribution of a sum of two normally distributed independent variates and with means and variances and , respectively, is another normal distribution (1), which has mean (2) and variance (3). By induction, analogous results hold for the sum of normally distributed variates. An alternate derivation proceeds by noting that (4) (5), where is the characteristic function and is the inverse Fourier transform, taken with parameters . More generally, if is normally distributed with mean and variance , then a linear function of , (6), is also normally distributed. The new distribution has mean and variance , as can be derived using the moment-generating function (7) (8) (9) (10) (11), which is of the standard form with (12) (13). For a weighted sum of independent variables (14), the expectation is given by (15) (16) (17) (18) (19). Setting this equal to (20) gives (21) (22). Therefore, the mean and variance of the weighted sums of random variables are as given in (21) and (22).
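A quick simulation check of the additivity of means and variances (the seed, sample size, and parameters below are arbitrary):

```python
import random

random.seed(2)

mu1, s1, mu2, s2, n = 1.0, 2.0, -3.0, 1.5, 100_000
sums = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(n)]

mean = sum(sums) / n                           # expect mu1 + mu2 = -2.0
var = sum((x - mean) ** 2 for x in sums) / n   # expect s1^2 + s2^2 = 6.25
```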

Nested hypothesis

Let be the set of all possibilities that satisfy hypothesis , and let be the set of all possibilities that satisfy hypothesis . Then is a nested hypothesis within iff , where denotes the proper subset.

Bonferroni correction

The Bonferroni correction is a multiple-comparison correction used when several dependent or independent statistical tests are being performed simultaneously (since while a given alpha value may be appropriate for each individual comparison, it is not for the set of all comparisons). In order to avoid many spurious positives, the alpha value needs to be lowered to account for the number of comparisons being performed. The simplest and most conservative approach is the Bonferroni correction, which sets the alpha value for the entire set of comparisons equal to by taking the alpha value for each comparison equal to . Explicitly, given tests for hypotheses () under the assumption that all hypotheses are false, and if the individual test critical values are , then the experiment-wide critical value is . In equation form, if for , then , which follows from the Bonferroni inequalities.
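The correction itself is one line; the simulation below (a sketch, with invented numbers and seed) checks the union-bound guarantee that, when all nulls are true, the family-wise error rate at the corrected per-test level stays below alpha.

```python
import random

random.seed(4)

def bonferroni_alpha(alpha_total, n_tests):
    # Per-comparison level so the family-wide error rate is at most alpha_total.
    return alpha_total / n_tests

alpha, n_tests, trials = 0.05, 10, 100_000
per_test = bonferroni_alpha(alpha, n_tests)    # 0.005

# Under all-true null hypotheses, p-values are uniform(0,1); count how often
# *any* of the 10 tests falsely rejects at the corrected level.
false_hits = sum(
    any(random.random() < per_test for _ in range(n_tests)) for _ in range(trials)
)
fwer = false_hits / trials    # about 1 - (1 - 0.005)^10, roughly 0.049 <= alpha
```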

Bessel's statistical formula

Let and be the observed mean and variance of a sample of drawn from a normal universe with unknown mean and let and be the observed mean and variance of a sample of drawn from a normal universe with unknown mean . Assume the two universes have a common variance , and define(1)(2)(3)Then(4)is distributed as Student's t-distribution with .

Alpha value

Let . A value such that is considered "significant" (i.e., is not simply due to chance) is known as an alpha value. The probability that a variate would assume a value greater than or equal to the observed value strictly by chance, , is known as a P-value.Depending on the type of data and conventional practices of a given field of study, a variety of different alpha values may be used. One commonly used terminology takes as "not significant," , as "significant" (sometimes denoted *), and as "highly significant" (sometimes denoted **). Some authors use the term "almost significant" to refer to , although this practice is not recommended.

ANOVA

"Analysis of Variance." A statistical test for heterogeneity of means by analysis of group variances. ANOVA is implemented as ANOVA[data] in the Wolfram Language package ANOVA` . To apply the test, assume random sampling of a variate with equal variances, independent errors, and a normal distribution. Let be the number of replicates (sets of identical observations) within each of factor levels (treatment groups), and be the th observation within factor level . Also assume that the ANOVA is "balanced" by restricting to be the same for each factor level. Now define the sum of square terms (1) (2) (3) (4) (5), which are the total, treatment, and error sums of squares. Here, is the mean of observations within factor level , and is the "group" mean (i.e., mean of means). Compute the entries in the ANOVA table, obtaining the P-value corresponding to the calculated F-ratio of the mean squared values (6).

Residual vs. predictor plot

A plot of versus the estimator . Random scatter indicates the model is probably good. A pattern indicates a problem with the model. If the spread in increases as increases, the errors are called heteroscedastic.

Likelihood ratio

A quantity used to test nested hypotheses. Let be a nested hypothesis with degrees of freedom within (which has degrees of freedom), then calculate the maximum likelihood of a given outcome, first given , then given . Then comparison of to the critical value of the chi-squared distribution with degrees of freedom gives the significance of the increase in likelihood. The term likelihood ratio is also used (especially in medicine) to test nonnested complementary hypotheses.

Population comparison

Let and be the number of successes in variates taken from two populations. Define(1)(2)The estimator of the difference is then . Doing a so-called -transform,(3)where(4)The standard error is(5)(6)(7)

Weighted inversion statistic

A statistic on the symmetric group is called a weighted inversion statistic if there exists an upper triangular matrix such that , where is the characteristic function. The inversion count ( for ) defined by Cramer (1750) and the major index (; otherwise) defined by MacMahon (1913) are both weighted inversion statistics (Degenhardt and Milne).

Trimean

The trimean is defined to be , where are the hinges and is the statistical median. Press et al. (1992) call this Tukey's trimean. It is an L-estimate.
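With the hinges approximated by the first and third quartiles (a common reading; the exact hinge convention did not survive extraction here), the trimean (H1 + 2M + H2)/4 can be sketched as:

```python
from statistics import median, quantiles

def trimean(data):
    # Tukey's trimean (H1 + 2*median + H2) / 4, with the hinges approximated
    # by the first and third quartiles (inclusive method).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (q1 + 2 * median(data) + q3) / 4

trimean([1, 2, 3, 4, 5, 6, 7])
# symmetric data: the trimean equals the median, 4.0
```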

Smith's markov process theorem

Consider (1). If the probability distribution is governed by a Markov process, then (2) (3). Assuming no time dependence, so , (4).

Markov sequence

A sequence , , ... of random variates is called Markov (or Markoff) if, for any , , i.e., if the conditional distribution of assuming , , ..., equals the conditional distribution of assuming only (Papoulis 1984, pp. 528-529). The transitional densities of a Markov sequence satisfy the Chapman-Kolmogorov equation.

Markov process

A random process whose future probabilities are determined by its most recent values. A stochastic process is called Markov if, for every and , we have . This is equivalent to (Papoulis 1984, p. 535).

Markov chain

A Markov chain is a collection of random variables (where the index runs through 0, 1, ...) having the property that, given the present, the future is conditionally independent of the past. In other words, if a Markov sequence of random variates takes the discrete values , ..., , then , and the sequence is called a Markov chain (Papoulis 1984, p. 532). A simple random walk is an example of a Markov chain. The Season 1 episode "Man Hunt" (2005) of the television crime drama NUMB3RS features Markov chains.
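A minimal two-state example (the transition probabilities are invented for illustration): the distribution after each step depends only on the current distribution, and iterating the transition converges to the chain's stationary distribution.

```python
# Two-state Markov chain: today's state alone determines tomorrow's distribution.
P = [[0.9, 0.1],   # P[i][j] = probability of moving from state i to state j
     [0.5, 0.5]]

def step(dist, P):
    # One transition: new_dist[j] = sum_i dist[i] * P[i][j].
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

dist = [1.0, 0.0]       # start surely in state 0
for _ in range(200):    # iterate toward the stationary distribution
    dist = step(dist, P)
# the stationary distribution of this chain is (5/6, 1/6)
```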

Chernoff face

A way to display variables on a two-dimensional surface. For instance, let be eyebrow slant, be eye size, be nose length, etc. The above figures show faces produced using 10 characteristics (head eccentricity, eye size, eye spacing, eye eccentricity, pupil size, eyebrow slant, nose size, mouth shape, mouth size, and mouth opening), each assigned one of 10 possible values, generated using the Wolfram Language.

Wiener sausage

The Wiener sausage of radius is the random process defined by , where is the standard Brownian motion in for and denotes the open ball of radius centered at . Named after Norbert Wiener, the term also describes the object visually: indeed, for a given Brownian motion , is essentially a sausage-like tube of radius having as its central line.

Mean square displacement

The mean square displacement (MSD) of a set of displacements is given by . It arises particularly in Brownian motion and random walk problems. For two-dimensional random walks with unit steps taken in random directions, the MSD is given by .
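For a two-dimensional walk with unit steps in uniformly random directions, the MSD after n steps equals n; a simulation sketch (walk counts and seed are arbitrary):

```python
import math
import random

random.seed(3)

def msd_2d(steps, walks):
    # Mean square displacement of 2-D random walks with unit steps in
    # uniformly random directions; theory gives MSD = steps.
    total = 0.0
    for _ in range(walks):
        x = y = 0.0
        for _ in range(steps):
            theta = random.uniform(0.0, 2.0 * math.pi)
            x += math.cos(theta)
            y += math.sin(theta)
        total += x * x + y * y
    return total / walks

msd = msd_2d(steps=100, walks=4000)   # expect a value near 100
```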

Ito's lemma

Let be a Wiener process. Then , where for , and . Note that while Ito's lemma was proved by Kiyoshi Ito (also spelled Itô), Ito's theorem is due to Noboru Itô.

Brownian motion

A real-valued stochastic process is a Brownian motion which starts at if the following properties are satisfied: 1. . 2. For all times , the increments , , ..., are independent random variables. 3. For all , , the increments are normally distributed with expectation value zero and variance . 4. The function is almost surely continuous. The Brownian motion is said to be standard if . It is easily shown from the above criteria that a Brownian motion has a number of unique natural invariance properties, including scaling invariance and invariance under time inversion. Moreover, any Brownian motion satisfies a law of large numbers, so that almost surely. Despite looking ill-behaved at first glance, Brownian motions are almost surely Hölder continuous for every exponent . On the other hand, any Brownian motion is nowhere differentiable almost surely. The above definition extends naturally to higher-dimensional Brownian motion.

Pólya's random walk constants

Let be the probability that a random walk on a -D lattice returns to the origin. In 1921, Pólya proved that (1) but (2) for . Watson (1939), McCrea and Whipple (1940), Domb (1954), and Glasser and Zucker (1977) showed that (3) (OEIS A086230), where (4) (5) (6) (7) (8) (9) (OEIS A086231; Borwein and Bailey 2003, Ch. 2, Ex. 20) is the third of Watson's triple integrals modulo a multiplicative constant, is a complete elliptic integral of the first kind, is a Jacobi theta function, and is the gamma function. Closed forms for are not known, but Montroll (1956) showed that for , (10), where (11) (12) and is a modified Bessel function of the first kind. Numerical values of from Montroll (1956) and Flajolet (Finch 2003) are given in the following table.

d   OEIS      p(d)
3   A086230   0.340537
4   A086232   0.193206
5   A086233   0.135178
6   A086234   0.104715
7   A086235   0.0858449
8   A086236   0.0729126

Polykay

The symmetric statistic defined such that(1)where is a cumulant. These statistics generalize k-statistic and were originally called "generalized -statistics" (Dressel 1940). The term "polykay" was introduced by Tukey (1956; Rose and Smith 2002, p. 255). Polykays are commonly defined in terms of power sums, for example(2)(3)Polykays can be computed using PolyK[r, s, ...] in the Mathematica application package mathStatica.

Wald's equation

Let , ..., be a sequence of independent observations of a random variable , and let the number of observations itself be chosen at random. Then Wald's equation states that the expectation value of the sum is equal to the expectation value of times the expectation value of ,(Wald 1945, Blackwell 1946, Wolfowitz 1947).

Polyache

The statistics defined such that , where is a central moment. These statistics generalize h-statistics and were originally called "generalized -statistics" (Tracy and Gupta 1974). The term "polyache" was introduced by Rose and Smith (2002, p. 255) by way of analogy with the polykay statistic. Polyaches are commonly defined in terms of power sums, for example . Polyaches can be computed using PolyH[r, s, ...] in the Mathematica application package mathStatica.

Unbiased estimator

A quantity which does not exhibit estimator bias. An estimator is an unbiased estimator of if

Fisher's estimator inequality

Given an unbiased estimator of , so that . Then , where is the variance.

Sample variance

The sample variance (commonly written or sometimes ) is the second sample central moment and is defined by (1), where is the sample mean and is the sample size. To estimate the population variance from a sample of elements with a priori unknown mean (i.e., the mean is estimated from the sample itself), we need an unbiased estimator for . This estimator is given by the k-statistic , which is defined by (2) (Kenney and Keeping 1951, p. 189). Similarly, if samples are taken from a distribution with underlying central moments , then the expected value of the observed sample variance is (3). Note that some authors (e.g., Zwillinger 1995, p. 603) prefer the definition (4), since this makes the sample variance an unbiased estimator for the population variance. The distinction between and is a common source of confusion, and extreme care should be exercised when consulting the literature to determine which convention is in use, especially since the same uninformative notation is commonly used for both.
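The two divisor conventions side by side; Python's statistics module implements both (pvariance uses divisor n, variance uses n - 1), which makes the distinction easy to check numerically.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
xbar = sum(data) / n
s2_biased = sum((x - xbar) ** 2 for x in data) / n          # divisor n
s2_unbiased = sum((x - xbar) ** 2 for x in data) / (n - 1)  # divisor n - 1

# The standard library implements both conventions:
# pvariance(data) uses divisor n, variance(data) uses divisor n - 1.
```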

Sample mean

The sample mean of a set of observations from a given distribution is defined by . It is an unbiased estimator for the population mean . The notation is therefore sometimes used, with the hat indicating that this quantity is an estimator for . The sample mean of a list of data is implemented directly as Mean[list]. An interesting empirical relationship between the sample mean, statistical median, and mode, which appears to hold for unimodal curves of moderate asymmetry, is given by (Kenney and Keeping 1962, p. 53), which is the basis for the definition of the Pearson mode skewness.

Expectation value

The expectation value of a function in a variable is denoted or . For a single discrete variable, it is defined by(1)where is the probability density function.For a single continuous variable it is defined by,(2)The expectation value satisfies(3)(4)(5)For multiple discrete variables(6)For multiple continuous variables(7)The (multiple) expectation value satisfies(8)(9)(10)where is the mean for the variable .

Sample central moment

The th sample central moment of a sample with sample size is defined as(1)where is the sample mean. The first few sample central moments are related to power sums by(2)(3)(4)(5)(6)These relations can be given by SampleCentralToPowerSum[r] in the Mathematica application package mathStatica.In terms of the population central moments, the expectation values of the first few sample central moments are(7)(8)(9)(10)

Variance

For a single variate having a distribution with known population mean , the population variance , commonly also written , is defined as (1), where is the population mean and denotes the expectation value of . For a discrete distribution with possible values of , the population variance is therefore (2), whereas for a continuous distribution, it is given by (3). The variance is therefore equal to the second central moment . Note that some care is needed in interpreting as a variance, since the symbol is also commonly used as a parameter related to but not equivalent to the square root of the variance, for example in the log normal distribution, Maxwell distribution, and Rayleigh distribution. If the underlying distribution is not known, then the sample variance may be computed as (4), where is the sample mean. Note that the sample variance defined above is not an unbiased estimator for the population variance ; an unbiased estimator is obtained by instead dividing by (see sample variance).

Random variable

A random variable is a measurable function from a probability space into a measurable space known as the state space (Doob 1996). Papoulis (1984, p. 88) gives the slightly different definition of a random variable as a real function whose domain is the probability space and such that: 1. The set is an event for any real number . 2. The probability of the events and equals zero. The abbreviation "r.v." is sometimes used to denote a random variable.

Local discrepancy

Given a point set P of N points in the s-dimensional unit cube, the local discrepancy of a subinterval J is defined as D(J) = |A(J)/N - V(J)|, where A(J) is the number of points of P falling in J and V(J) is the content of J.

Random number

A random number is a number chosen as if by chance from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution. Almost always, such numbers are also required to be independent, so that there are no correlations between successive numbers. Computer-generated random numbers are sometimes called pseudorandom numbers, while the term "random" is reserved for the output of unpredictable physical processes. When used without qualification, the word "random" usually means "random with a uniform distribution." Other distributions are of course possible. For example, the Box-Muller transformation allows pairs of uniform random numbers to be transformed to corresponding random numbers having a two-dimensional normal distribution.It is impossible to produce an arbitrarily long string of random digits and prove it is random. Strangely, it..
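The Box-Muller transformation mentioned above can be sketched in a few lines: two independent uniform numbers on (0, 1] map to two independent standard normal numbers. The sample-size and seed choices below are illustrative.

```python
import math
import random

# Box-Muller transformation: (u1, u2) uniform -> (z1, z2) standard normal.
def box_muller(u1, u2):
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

random.seed(0)
zs = []
for _ in range(20000):
    z1, z2 = box_muller(1.0 - random.random(), random.random())  # 1-u keeps u1 > 0
    zs.append(z1)
    zs.append(z2)

mean = sum(zs) / len(zs)                # near 0
var = sum(z * z for z in zs) / len(zs)  # near 1
```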

Linear congruence method

A method for generating random (pseudorandom) numbers using the linear recurrence relation X_(n+1) = (a X_n + c) (mod m), where the multiplier a and increment c must assume certain fixed values, m is some chosen modulus, and X_0 is an initial number known as the seed.
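A minimal sketch of the method; the constants a = 16807, c = 0, m = 2^31 - 1 are the classic "minimal standard" choices, used here as an illustrative assumption rather than a requirement of the definition.

```python
# Linear congruential generator X_{n+1} = (a*X_n + c) mod m.
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=1)
raw = next(gen)          # 16807
u = raw / (2**31 - 1)    # normalized to [0, 1)
```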


A variate is a generalization of the concept of a random variable that is defined without reference to a particular type of probabilistic experiment. It is defined as the set of all random variables that obey a given probabilistic law. It is common practice to denote a variate with a capital letter (most commonly X). The set of all values that X can take is then called the range, denoted R_X (Evans et al. 2000, p. 5). Specific elements in the range of X are called quantiles and denoted x, and the probability that a variate X assumes the value x is denoted P(X = x).

Van der corput sequence

Van der Corput sequences are a means of generating sequences of points that are maximally self-avoiding (a.k.a. quasirandom sequences). In the one-dimensional case, the simplest approach to generate such a sequence is to simply divide the interval into a number of equal subintervals. Similarly, one can divide an n-dimensional volume by uniformly partitioning each of its dimensions. However, these approaches have a number of drawbacks for numerical integration, especially for high dimensions. Like quasirandom sequences, "permuted" van der Corput sequences are constrained by a low-discrepancy requirement, which has the net effect of generating points in a highly correlated manner (i.e., the next point "knows" where the previous points are). For example, the ordinary van der Corput sequence in base 3 is given by 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27, ...
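The ordinary (unpermuted) sequence is the base-b radical inverse: write n = (d_k ... d_1 d_0) in base b and reflect the digits about the radix point to get 0.d_0 d_1 ... d_k. A minimal sketch reproducing the base-3 values quoted above:

```python
from fractions import Fraction

# Base-b radical inverse: reflect the base-b digits of n about the radix point.
def radical_inverse(n, base=3):
    result, weight = Fraction(0), Fraction(1, base)
    while n > 0:
        n, digit = divmod(n, base)   # peel off least significant digit
        result += digit * weight
        weight /= base
    return result

seq = [radical_inverse(n, 3) for n in range(1, 10)]
# [1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27]
```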

Quasirandom sequence

A sequence of n-tuples that fills n-space more uniformly than uncorrelated random points, sometimes also called a low-discrepancy sequence. Although the ordinary uniform random numbers and quasirandom sequences both produce uniformly distributed sequences, there is a big difference between the two. A uniform random generator on [0, 1] will produce outputs so that each trial has the same probability of generating a point on equal subintervals, for example [0, 1/2) and [1/2, 1). Therefore, it is possible for n trials to coincidentally all lie in the first half of the interval, while the (n+1)st point still falls within the other of the two halves with probability 1/2. This is not the case with the quasirandom sequences, in which the outputs are constrained by a low-discrepancy requirement that has a net effect of points being generated in a highly correlated manner (i.e., the next point "knows" where the previous points are). Such a sequence is extremely useful..

Stochastic process

Doob (1996) defines a stochastic process as a family of random variables from some probability space into a state space . Here, is the index set of the process.Papoulis (1984, p. 312) describes a stochastic process as a family of functions.

Discrete discrepancy

Given a point set P of N points in the s-dimensional unit cube, the discrete discrepancy is defined as(1) D_N = sup_J |D(J)|, where the local discrepancy is defined as(2) D(J) = |A(J)/N - V(J)|, A(J) is the number of points of P in J, V(J) is the content of J, and the supremum is taken over the class of all discrete subintervals of the unit cube of the form(3)with .

Noise sphere

A mapping of random number triples to points in spherical coordinates according to(1)(2)(3)in order to detect unexpected structure indicating correlations between triples. When such structure is present (note that this does not include the expected bunching of points along the -axis according to the factor in the spherical volume element), numbers may not be truly random.

Star discrepancy

Given a point set P of N points in the s-dimensional unit cube, the star discrepancy is defined as(1) D*_N = sup_(J in J*) |D(J)|, where the local discrepancy is defined as(2) D(J) = |A(J)/N - V(J)|, A(J) is the number of points of P in J, V(J) is the content of J, and J* is the class of all s-dimensional subintervals J of the unit cube of the form(3) J = [0, t_1) x ... x [0, t_s), with 0 <= t_i <= 1 for i = 1, ..., s. Here, the term "star" refers to the fact that the s-dimensional subintervals have a vertex at the origin.

Cliff random number generator

A random number generator produced by iterating X_(n+1) = |100 ln X_n (mod 1)| for a seed X_0 in (0, 1). This simple generator passes the noise sphere test for randomness by showing no structure.
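A minimal sketch of the iteration X_(n+1) = |100 ln X_n (mod 1)|; in Python the % operator already returns a value in [0, 1), and the seed 0.5 is an illustrative choice.

```python
import math

# Cliff generator: iterate x -> (100 * ln x) mod 1, staying in [0, 1).
def cliff(seed, count):
    x = seed
    out = []
    for _ in range(count):
        if x == 0.0:          # guard against log(0), essentially never hit
            x = 1e-16
        x = (100.0 * math.log(x)) % 1.0
        out.append(x)
    return out

vals = cliff(0.5, 1000)
```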

Frequency polygon

A distribution of values of a discrete variate represented graphically by plotting the points (x_1, f_1), (x_2, f_2), ..., (x_n, f_n), and drawing a set of straight line segments connecting adjacent points. It is usually preferable to use a histogram for grouped distributions.

Laplace distribution

The Laplace distribution, also called the double exponential distribution, is the distribution of differences between two independent variates with identical exponential distributions (Abramowitz and Stegun 1972, p. 930). It has probability density function and cumulative distribution function given by(1) P(x) = (1/(2b)) e^(-|x - mu|/b)(2) D(x) = (1/2)[1 + sgn(x - mu)(1 - e^(-|x - mu|/b))]It is implemented in the Wolfram Language as LaplaceDistribution[mu, beta].The moments about the mean mu_n are related to the moments about 0 by(3)where is a binomial coefficient, so(4)(5)where is the floor function and is the gamma function.The moments can also be computed using the characteristic function,(6)Using the Fourier transform of the exponential function(7)gives(8)(Abramowitz and Stegun 1972, p. 930). The moments are therefore(9)The mean, variance, skewness, and kurtosis excess are(10) mu(11) sigma^2 = 2b^2(12) gamma_1 = 0(13) gamma_2 = 3..


Kurtosis is defined as a normalized form of the fourth central moment of a distribution. There are several flavors of kurtosis, the most commonly encountered variety of which is normally termed simply "the" kurtosis and is denoted beta_2 (Pearson's notation; Abramowitz and Stegun 1972, p. 928) or alpha_4 (Kenney and Keeping 1951, p. 27; Kenney and Keeping 1961, pp. 99-102). The kurtosis of a theoretical distribution is defined by(1) beta_2 = mu_4/mu_2^2, where mu_i denotes the ith central moment (and in particular, mu_2 is the variance). This form is implemented in the Wolfram Language as Kurtosis[dist].The "kurtosis excess" (Kenney and Keeping 1951, p. 27) is defined by(2) gamma_2 = beta_2 - 3(3) = mu_4/mu_2^2 - 3, and is commonly denoted gamma_2 (Abramowitz and Stegun 1972, p. 928) or b_2. Kurtosis excess is commonly used because the kurtosis excess of a normal distribution is equal to 0, while the kurtosis proper is equal to 3. Unfortunately, Abramowitz and Stegun (1972) confusingly refer to as..

Discrete uniform distribution

The discrete uniform distribution is also known as the "equally likely outcomes" distribution. Letting a set have N elements, each of them having the same probability, then(1)(2)(3)(4)so using gives(5)Restricting the set to the set of positive integers 1, 2, ..., N, the probability distribution function and cumulative distribution function for this discrete uniform distribution are therefore(6) P(n) = 1/N(7) D(n) = n/N for n = 1, ..., N.The discrete uniform distribution is implemented in the Wolfram Language as DiscreteUniformDistribution[n].Its moment-generating function is(8)(9)(10)(11)The moments about 0 are(12)so(13)(14)(15)(16)and the moments about the mean are(17)(18)(19)The mean, variance, skewness, and kurtosis excess are(20) mu = (N + 1)/2(21) sigma^2 = (N^2 - 1)/12(22) gamma_1 = 0(23) gamma_2 = -6(N^2 + 1)/(5(N^2 - 1))The mean deviation for a uniform distribution on N elements is given by(24)To do the sum, consider separately the cases of N odd and N even. For N odd,(25)(26)(27)(28)Similarly, for N even,(29)(30)(31)(32)The..
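The mean (N + 1)/2 and variance (N^2 - 1)/12 can be checked directly from the definition; exact rational arithmetic keeps the check transparent.

```python
from fractions import Fraction

# Mean and variance of the discrete uniform distribution on {1, ..., n},
# computed straight from the pmf P(k) = 1/n.
def moments(n):
    p = Fraction(1, n)
    mean = sum(k * p for k in range(1, n + 1))
    var = sum((k - mean) ** 2 * p for k in range(1, n + 1))
    return mean, var

mean, var = moments(10)   # (10+1)/2 = 11/2 and (100-1)/12 = 33/4
```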

Hypergeometric distribution

Let there be n ways for a "good" selection and m ways for a "bad" selection out of a total of n + m possibilities. Take N samples and let x_i equal 1 if the ith selection is successful and 0 if it is not. Let x be the total number of successful selections,(1) x = sum_(i=1)^N x_i. The probability of x successful selections is then(2) P(x) = C(n, x) C(m, N - x)/C(n + m, N)(3)(4)The hypergeometric distribution is implemented in the Wolfram Language as HypergeometricDistribution[N, n, m+n].The problem of finding the probability of such a picking problem is sometimes called the "urn problem," since it asks for the probability that x out of N balls drawn are "good" from an urn that contains n "good" balls and m "bad" balls. It therefore also describes the probability of obtaining exactly x correct balls in a pick-N lottery from a reservoir of n + m balls (of which n are "good" and m are "bad"). For example, for and , the probabilities of obtaining correct balls..
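The pmf P(x) = C(n, x) C(m, N - x)/C(n + m, N) can be sketched with binomial coefficients; the urn parameters below (5 good, 10 bad, 4 draws) are illustrative.

```python
from math import comb

# Hypergeometric probability of x "good" selections in N draws from an urn
# with n good and m bad balls.
def hypergeom_pmf(x, n, m, N):
    return comb(n, x) * comb(m, N - x) / comb(n + m, N)

n, m, N = 5, 10, 4
total = sum(hypergeom_pmf(x, n, m, N) for x in range(N + 1))  # normalizes to 1
p2 = hypergeom_pmf(2, n, m, N)   # C(5,2)*C(10,2)/C(15,4) = 450/1365
```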


The grouping of data into bins (spaced apart by the so-called class interval), plotting the number of members in each bin versus the bin number. The above histogram shows the number of variates in bins with class interval 1 for a sample of 100 real variates with a uniform distribution from 0 to 10. Therefore, bin 1 gives the number of variates in the range 0-1, bin 2 gives the number of variates in the range 1-2, etc. Histograms are implemented in the Wolfram Language as Histogram[data].

Binomial distribution

The binomial distribution gives the discrete probability distribution P_p(n|N) of obtaining exactly n successes out of N Bernoulli trials (where the result of each Bernoulli trial is true with probability p and false with probability q = 1 - p). The binomial distribution is therefore given by(1) P_p(n|N) = C(N, n) p^n q^(N - n)(2) = N!/(n!(N - n)!) p^n (1 - p)^(N - n), where C(N, n) is a binomial coefficient. The above plot shows the distribution of n successes out of N trials with p = q = 1/2.The binomial distribution is implemented in the Wolfram Language as BinomialDistribution[n, p].The probability of obtaining more successes than the n observed in a binomial distribution is(3)where(4)B(a, b) is the beta function, and I_x(a, b) is the incomplete beta function.The characteristic function for the binomial distribution is(5)(Papoulis 1984, p. 154). The moment-generating function for the distribution is(6)(7)(8)(9)(10)(11)The mean is(12)(13)(14) mu = N p. The moments about 0 are(15)(16)(17)(18)so the moments about the mean are(19)(20)(21)The skewness..
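The pmf and the mean Np can be checked directly from the definition; the parameter values below are illustrative.

```python
from math import comb

# Binomial pmf P(X = k) = C(n, k) p^k (1 - p)^(n - k).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))     # 1.0 (normalized)
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))  # n*p = 6.0
```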

Poisson distribution

Given a Poisson process, the probability of obtaining exactly k successes in N trials is given by the limit of a binomial distribution(1)Viewing the distribution as a function of the expected number of successes(2) nu = N p instead of the sample size N for fixed p, equation (2) then becomes(3)Letting the sample size N become large, the distribution then approaches(4)(5)(6)(7)(8) P_nu(k) = nu^k e^(-nu)/k!, which is known as the Poisson distribution (Papoulis 1984, pp. 101 and 554; Pfeiffer and Schum 1973, p. 200). Note that the sample size N has completely dropped out of the probability function, which has the same functional form for all values of nu.The Poisson distribution is implemented in the Wolfram Language as PoissonDistribution[mu].As expected, the Poisson distribution is normalized so that the sum of probabilities equals 1, since(9)The ratio of probabilities is given by(10)The Poisson distribution reaches a maximum when(11)where is the Euler-Mascheroni..

Beta distribution

A general type of statistical distribution which is related to the gamma distribution. Beta distributions have two free parameters, which are labeled according to one of two notational conventions. The usual definition calls these and , and the other uses and (Beyer 1987, p. 534). The beta distribution is used as a prior distribution for binomial proportions in Bayesian analysis (Evans et al. 2000, p. 34). The above plots are for various values of with and ranging from 0.25 to 3.00.The domain is , and the probability function and distribution function are given by(1)(2)(3)where is the beta function, is the regularized beta function, and . The beta distribution is implemented in the Wolfram Language as BetaDistribution[alpha, beta].The distribution is normalized since(4)The characteristic function is(5)(6)where is a confluent hypergeometric function of the first kind.The raw moments are given by(7)(8)(Papoulis 1984,..

Beta binomial distribution

A variable with a beta binomial distribution is distributed as a binomial distribution with parameter p, where p is itself distributed as a beta distribution with parameters alpha and beta. For n trials, it has probability density function(1)where is a beta function and is a binomial coefficient, and distribution function(2)where is a gamma function and(3)is a generalized hypergeometric function.It is implemented as BetaBinomialDistribution[alpha, beta, n].The first few raw moments are(4)(5)(6)giving the mean and variance as(7) mu = n alpha/(alpha + beta)(8) sigma^2 = n alpha beta (alpha + beta + n)/((alpha + beta)^2 (alpha + beta + 1))

Probability axioms

Given an event E in a sample space S which is either finite with N elements or countably infinite with N = infinity elements, then we can write S = union_(i=1)^N E_i, and a quantity P(E_i), called the probability of event E_i, is defined such that 1. 0 <= P(E_i) <= 1. 2. P(S) = 1. 3. Additivity: P(E_1 ∪ E_2) = P(E_1) + P(E_2), where E_1 and E_2 are mutually exclusive. 4. Countable additivity: P(∪_i E_i) = sum_i P(E_i) for i = 1, 2, ..., where E_1, E_2, ... are mutually exclusive (i.e., E_1 ∩ E_2 = ∅).

Hawkes process

There are a number of point processes which are called Hawkes processes and while many of these notions are similar, some are rather different. There are also different formulations for univariate and multivariate point processes.In some literature, a univariate Hawkes process is defined to be a self-exciting temporal point process whose conditional intensity function is defined to be(1)where is the background rate of the process , where are the points in time occurring prior to time , and where is a function which governs the clustering density of . The function is sometimes called the exciting function or the excitation function of . Similarly, some authors (Merhdad and Zhu 2014) denote the conditional intensity function by and rewrite the summand in () as(2)The processes upon which Hawkes himself made the most progress were univariate self-exciting temporal point processes whose conditional intensity function is linear (Hawkes 1971)...

Le cam's inequality

Let S_n be the sum of n random variates X_i with a Bernoulli distribution with P(X_i = 1) = p_i. Then sum_(k=0)^infinity |P(S_n = k) - lambda^k e^(-lambda)/k!| < 2 sum_(i=1)^n p_i^2, where lambda = sum_(i=1)^n p_i.

Fisher's theorem

Let be a sum of squares of independent normal standardized variates , and suppose where is a quadratic form in the , distributed as chi-squared with degrees of freedom. Then is distributed as with degrees of freedom and is independent of . The converse of this theorem is known as Cochran's theorem.

Cramér's theorem

If and are independent variates and is a normal distribution, then both and must have normal distributions. This was proved by Cramér in 1936.

Central limit theorem

Let be a set of independent random variates and each have an arbitrary probability distribution with mean and a finite variance . Then the normal form variate(1)has a limiting cumulative distribution function which approaches a normaldistribution.Under additional conditions on the distribution of the addend, the probability density itself is also normal (Feller 1971) with mean and variance . If conversion to normal form is not performed, then the variate(2)is normally distributed with and .Kallenberg (1997) gives a six-line proof of the central limit theorem. For an elementary, but slightly more cumbersome proof of the central limit theorem, consider the inverse Fourier transform of .(3)(4)(5)(6)Now write(7)so we have(8)(9)(10)(11)(12)(13)(14)(15)(16)Now expand(17)so(18)(19)(20)since(21)(22)Taking the Fourier transform,(23)(24)This is of the form(25)where and . But this is a Fourier transform of a Gaussian function,..
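The theorem can be illustrated empirically: means of n uniform(0, 1) variates (mu = 1/2, sigma^2 = 1/12) should have mean mu and variance sigma^2/n. The values of n, the trial count, and the seed below are illustrative.

```python
import random
import statistics

# Means of n uniform(0,1) variates: the CLT predicts mean 1/2 and
# variance (1/12)/n for the sampling distribution of the mean.
random.seed(1)
n, trials = 48, 20000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

m = statistics.fmean(means)      # near 0.5
v = statistics.pvariance(means)  # near 1/(12*48) = 1/576
```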

Survival function

The survival function S(x) describes the probability that a variate X takes on a value greater than a number x (Evans et al. 2000, p. 6). The survival function is therefore related to a continuous probability density function P(x) by(1) S(x) = P(X > x) = int_x^infinity P(x') dx', so S(x) = 1 - D(x), where D(x) is the distribution function. Similarly, the survival function is related to a discrete probability P(x) by(2) S(x) = P(X > x) = sum_(X > x) P(x). The survival function and distribution function are related by(3) S(x) + D(x) = 1, since probability functions are normalized.

Statistical distribution

The distribution of a variable is a description of the relative numbers of times each possible outcome will occur in a number of trials. The function describing the probability that a given value will occur is called the probability density function (abbreviated PDF), and the function describing the cumulative probability that a given value or any value smaller than it will occur is called the distribution function (or cumulative distribution function, abbreviated CDF).Formally, a distribution can be defined as a normalized measure, and the distribution of a random variable X is the measure P_X on the state space defined by setting P_X(A) = P{s : X(s) in A}, where (S, P) is a probability space, the state space is a measurable space, and P_X a measure on it with P_X of the whole space equal to 1. If the measure is a Radon measure (which is usually the case), then the statistical distribution is a distribution in the sense of a generalized function..

Stable distribution

Stable distributions are a class of probability distributions allowing skewness and heavy tails (Rimmer and Nolan 2005). They are described by an index of stability (also known as a characteristic exponent) alpha, a skewness parameter beta, a scale parameter gamma, and a location parameter delta. Two possible parametrizations include(1)(2)(Rimmer and Nolan 2005). One is most convenient for numerical computations, whereas the other is commonly used in economics.

Sklar's theorem

Let be a two-dimensional distribution function with marginal distribution functions and . Then there exists a copula such thatConversely, for any univariate distribution functions and and any copula , the function is a two-dimensional distribution function with marginals and . Furthermore, if and are continuous, then is unique.

Rényi's parking constants

Given the closed interval with , let one-dimensional "cars" of unit length be parked randomly on the interval. The mean number of cars which can fit (without overlapping!) satisfies(1)The mean density of the cars for large is(2)(3)(4)(OEIS A050996). While the inner integral canbe done analytically,(5)(6)where is the Euler-Mascheroni constant and is the incomplete gamma function, it is not known how to do the outer one(7)(8)(9)where is the exponential integral. The slowly converging series expansion for the integrand is given by(10)(OEIS A050994 and A050995).In addition,(11)for all (Rényi 1958), which was strengthened by Dvoretzky and Robbins (1964) to(12)Dvoretzky and Robbins (1964) also proved that(13)Let be the variance of the number of cars, then Dvoretzky and Robbins (1964) and Mannion (1964) showed that(14)(15)(16)(OEIS A086245), where(17)(18)and the numerical value is due to Blaisdell and Solomon..

Ratio distribution

Given two distributions and with joint probability density function , let be the ratio distribution. Then the distribution function of is(1)(2)(3)The probability function is then(4)(5)(6)For variates with standard normal distributions,the ratio distribution is a Cauchy distribution.For a uniform ratio distribution(7)(8)


A variable is memoryless with respect to if, for all with ,(1)Equivalently,(2)(3)The exponential distribution satisfies(4)(5)and therefore(6)(7)(8)is the only memoryless random distribution.If and are integers, then the geometric distribution is memoryless. However, since there are two types of geometric distribution (one starting at 0 and the other at 1), two types of definition for memoryless are needed in the integer case. If the definition is as above,(9)then the geometric distribution that startsat 1 is memoryless. If the definition becomes(10)then the geometric distribution that startsat 0 is memoryless. Note that these two cases are equivalent in the continuous case.A useful consequence of the memoryless property is(11)where indicates an expectation value.

Weak law of large numbers

The weak law of large numbers (cf. the strong law of large numbers) is a result in probability theory also known as Bernoulli's theorem. Let X_1, ..., X_n be a sequence of independent and identically distributed random variables, each having a mean mu and standard deviation sigma. Define a new variable(1) X = (X_1 + ... + X_n)/n. Then, as n -> infinity, the sample mean <X> approaches the population mean mu of each variable.(2)(3)(4)(5)In addition,(6)(7)(8)(9)Therefore, by the Chebyshev inequality, for all epsilon > 0,(10) P(|X - mu| >= epsilon) <= sigma^2/(n epsilon^2). As n -> infinity, it then follows that(11) lim P(|X - mu| < epsilon) = 1 (Khinchin 1929). Stated another way, the probability that the average satisfies |X - mu| < epsilon for an arbitrary positive quantity epsilon approaches 1 as n -> infinity (Feller 1968, pp. 228-229).

Planck's radiation function

Planck's radiation function is the function(1)which is normalized so that(2)However, the function is sometimes also defined without the numerical normalization factor of (e.g., Abramowitz and Stegun 1972, p. 999).The first and second raw moments are(3)(4)where is Apéry's constant, but higher order raw moments do not exist since the corresponding integrals do not converge.It has a maximum at (OEIS A133838), where(5)and inflection points at (OEIS A133839) and (OEIS A133840), where(6)

Rice distribution

The Rice distribution has probability density function P(x) = (x/sigma^2) exp(-(x^2 + v^2)/(2 sigma^2)) I_0(x v/sigma^2) for x >= 0, where I_0(z) is a modified Bessel function of the first kind and v >= 0. For a derivation, see Papoulis (1962). For v = 0, this reduces to the Rayleigh distribution.

Zipf distribution

The Zipf distribution, sometimes referred to as the zeta distribution, is a discrete distribution commonly used in linguistics, insurance, and the modelling of rare events. It has probability density function(1) P(n) = n^(-(rho + 1))/zeta(rho + 1), where rho is a positive parameter and zeta(z) is the Riemann zeta function, and distribution function(2)where is a generalized harmonic number.The Zipf distribution is implemented in the Wolfram Language as ZipfDistribution[rho].The nth raw moment is(3)giving the mean and variance as(4)(5)The distribution has mean deviation(6)where is a Hurwitz zeta function and mu is the mean as given above in equation (4).
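The pmf n^(-(rho+1))/zeta(rho+1) can be sketched with only the standard library by approximating the zeta value with a truncated sum plus an integral tail estimate; the truncation point is an illustrative choice.

```python
# Truncated-sum approximation of the Riemann zeta function for s > 1.
def zeta(s, terms=10**6):
    partial = sum(n ** -s for n in range(1, terms + 1))
    tail = terms ** (1 - s) / (s - 1)  # integral estimate of the remaining tail
    return partial + tail

# Zipf (zeta distribution) pmf P(n) = n^{-(rho+1)} / zeta(rho+1).
def zipf_pmf(n, rho):
    return n ** -(rho + 1) / zeta(rho + 1)

p1 = zipf_pmf(1, 2)   # 1/zeta(3) ≈ 0.8319
```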

Negative binomial distribution

The negative binomial distribution, also known as the Pascal distribution or Pólya distribution, gives the probability of successes and failures in trials, and success on the th trial. The probability density function is therefore given by(1)(2)(3)where is a binomial coefficient. The distribution function is then given by(4)(5)(6)where is the gamma function, is a regularized hypergeometric function, and is a regularized beta function.The negative binomial distribution is implemented in the Wolfram Language as NegativeBinomialDistribution[r, p].Defining(7)(8)the characteristic function is given by(9)and the moment-generating functionby(10)Since ,(11)(12)(13)(14)The raw moments are therefore(15)(16)(17)(18)where(19)and is the Pochhammer symbol. (Note that Beyer 1987, p. 487, apparently gives the mean incorrectly.)This gives the central moments as(20)(21)(22)The mean, variance, skewnessand..

Wishart distribution

If for , ..., has a multivariate normal distribution with mean vector and covariance matrix , and denotes the matrix composed of the row vectors , then the matrix has a Wishart distribution with scale matrix and degrees of freedom parameter . The Wishart distribution is most typically used when describing the covariance matrix of multinormal samples. The Wishart distribution is implemented as WishartDistribution[sigma, m] in the Wolfram Language package MultivariateStatistics` .

Weibull distribution

The Weibull distribution is given by(1)(2)for , and is implemented in the Wolfram Language as WeibullDistribution[alpha, beta]. The raw moments of the distribution are(3)(4)(5)(6)and the mean, variance, skewness, and kurtosis excess of are(7)(8)(9)(10)where is the gamma function and(11)A slightly different form of the distribution is defined by(12)(13)(Mendenhall and Sincich 1995). This has raw moments(14)(15)(16)(17)so the mean and variance forthis form are(18)(19)The Weibull distribution gives the distribution of lifetimes of objects. It was originally proposed to quantify fatigue data, but it is also used in analysis of systems involving a "weakest link."


Variables x and y are said to be uncorrelated if their covariance is zero: cov(x, y) = 0. Independent statistics are always uncorrelated, but the converse is not necessarily true.

Gamma statistic

The gamma statistics are defined by gamma_r = kappa_(r+2)/sigma^(r+2), where kappa_i are cumulants and sigma is the standard deviation.

Robbin's inequality

If the fourth moment , thenwhere is the variance.

Relative deviation

Let denote the mean of a set of quantities , then the relative deviation is defined by

Absolute deviation

Let ū denote the mean of a set of quantities u_i, then the absolute deviation is defined by Delta u_i = |u_i - ū|.

Fisher's exact test

Fisher's exact test is a statistical test used to determine if there are nonrandom associations between two categorical variables.Let there exist two such variables and , with and observed states, respectively. Now form an matrix in which the entries represent the number of observations in which and . Calculate the row and column sums and , respectively, and the total sum(1)of the matrix. Then calculate the conditional probability of getting the actual matrix given the particular row and column sums, given by(2)which is a multivariate generalization of the hypergeometric probability function. Now find all possible matrices of nonnegative integers consistent with the row and column sums and . For each one, calculate the associated conditional probability using (2), where the sum of these probabilities must be 1.To compute the P-value of the test, the tables must then be ordered by some criterion that measures dependence, and those tables..

Statistical test

A test used to determine the statistical significance of an observation. Two main types of error can occur: 1. A type I error occurs when the null hypothesis is wrongly rejected, i.e., a false positive result is obtained. 2. A type II error occurs when the null hypothesis is wrongly accepted, i.e., a false negative result is obtained. The probability that a statistical test will be positive for a true statistic is sometimes called the test's sensitivity, and the probability that a test will be negative for a negative statistic is sometimes called the specificity. The following table summarizes the names given to the various combinations of the actual state of affairs and observed test results.

result | name
true positive result | sensitivity
false negative result | 1 - sensitivity
true negative result | specificity
false positive result | 1 - specificity

Multiple-comparison corrections to statistical..


An estimate is an educated guess for an unknown quantity or outcome based on known information. The making of estimates is an important part of statistics, since care is needed to provide as accurate an estimate as possible using as little input data as possible. Often, an estimate for the uncertainty of an estimate can also be determined statistically. A rule that tells how to calculate an estimate based on the measurements contained in a sample is called an estimator.

Total probability theorem

Given n mutually exclusive events A_1, ..., A_n whose probabilities sum to unity, then P(B) = sum_(i=1)^n P(B | A_i) P(A_i), where B is an arbitrary event, and P(B | A_i) is the conditional probability of B assuming A_i.

Temporal point process

A temporal point process is a random process whose realizations consist of the times of isolated events.Note that in some literature, the values are assumed to be arbitrary real numbers while the index set is assumed to be the set of integers (Schoenberg 2002); on the other hand, some authors view temporal point processes as binary events so that takes values in a two-element set for each , and further assume that the index set is some finite set of points (Liam 2013). The prior perspective corresponds to viewing temporal point processes as how long events occur where the events themselves are spaced according to a discrete set of time parameters; the latter view corresponds to viewing temporal point processes as indications of whether or not a finite number of events has occurred.The behavior of a simple temporal point process is typically modeled by specifying its conditional intensity . Indeed, a number of specific examples of temporal point..

Tail probability

Define as the set of all points with probabilities such that or , where is a point probability (often, the likelihood of an observed event). Then the associated tail probability is given by .

Point process

A point process is a probabilistic model for random scatterings of points on some space, often assumed to be a subset of R^d for some d. Oftentimes, point processes describe the occurrence over time of random events in which the occurrences are revealed one-by-one as time evolves; in this case, any collection of occurrences is said to be a realization of the point process.Poisson processes are regarded as archetypal examples of point processes (Daley and Vere-Jones 2002).Point processes are sometimes known as counting processes or random scatters.


The mathematical study of the likelihood and probability of events occurring based on known information and inferred by taking a limited number of samples. Statistics plays an extremely important role in many aspects of economics and science, allowing educated guesses to be made with a minimum of expensive or difficult-to-obtain data.A joke told about statistics (or, more precisely, about statisticians), runs as follows. Two statisticians are out hunting when one of them sees a duck. The first takes aim and shoots, but the bullet goes sailing past six inches too high. The second statistician also takes aim and shoots, but this time the bullet goes sailing past six inches too low. The two statisticians then give one another high fives and exclaim, "Got him!" (This joke plays on the fact that the mean of -6 and 6 is 0, so "on average," the two shots hit the duck.)Approximately 73.8474% of extant statistical jokes are maintained..

Stationary point process

There are at least two distinct notions of when a pointprocess is stationary.The most commonly utilized terminology is as follows: Intuitively, a point process defined on a subset of is said to be stationary if the number of points lying in depends on the size of but not its location. On the real line, this is expressed in terms of intervals: A point process on is stationary if for all and for ,depends on the length of but not on the location .Stationary point processes of this kind were originally called simple stationary, though several authors call it crudely stationary instead. In light of the notion of crude stationarity, a different definition of stationary may be stated in which a point process is stationary whenever for every and for all bounded Borel subsets of , the joint distribution of does not depend on . This distinction also gives rise to a related notion known as interval stationarity.Some authors use the alternative definition of an intensity..

Mutually exclusive events

n events are said to be mutually exclusive if the occurrence of any one of them precludes the occurrence of any of the others. Therefore, for events A_1, ..., A_n, the conditional probability satisfies P(A_i | A_j) = 0 for all i != j.

Multidimensional point process

A multidimensional point process is a measurable function from a probability space into where is the set of all finite or countable subsets of not containing an accumulation point and where is the sigma-algebra generated over by the setsfor all bounded Borel subsets . Here, denotes the cardinality or order of the set .A multidimensional point process is sometimes abbreviated MPP, though care should be exhibited not to confuse the notion with that of a marked point process.Despite a number of apparent differences, one can show that multidimensional point processes are a special case of a random closed set on (Baudin 1984).

De méré's problem

The probability of getting at least one "6" in four rolls of a single 6-sided die is(1) 1 - (5/6)^4 = 671/1296 ≈ 0.518, which is slightly higher than the probability of at least one double-six in 24 throws of two dice,(2) 1 - (35/36)^24 ≈ 0.491. The French nobleman and gambler Chevalier de Méré suspected that (1) was higher than (2), but his mathematical skills were not great enough to demonstrate why this should be so. He posed the question to Pascal, who solved the problem and proved de Méré correct. In fact, de Méré's observation remains true even if two dice are thrown 25 times, since the probability of throwing at least one double-six is then(3) 1 - (35/36)^25 ≈ 0.506.
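The three probabilities above follow from the complement rule ("at least one" = 1 - "none") and can be computed in a few lines:

```python
# De Mere's problem: at least one "6" in four rolls of one die versus at
# least one double-six in 24 (or 25) throws of two dice.
p_single = 1 - (5/6) ** 4        # ≈ 0.5177
p_double_24 = 1 - (35/36) ** 24  # ≈ 0.4914
p_double_25 = 1 - (35/36) ** 25  # ≈ 0.5055
```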

Mills ratio

The Mills ratio is defined as(1)(2)(3)where is the hazard function, is the survival function, is the probability density function, and is the distribution function. For example, for the normal distribution,(4)which simplifies to(5)for the standard normal distribution. The latter function has the particularly simple continued fraction representation(6)(Cuyt et al. 2010, p. 376).
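As an illustrative sketch (not from the original entry), the standard normal Mills ratio can be evaluated using the complementary error function; at x = 0 it equals sqrt(pi/2), and for large x it behaves like 1/x:

```python
import math

def mills_ratio(x):
    """Mills ratio of the standard normal: (1 - Phi(x)) / phi(x)."""
    density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # phi(x)
    survival = 0.5 * math.erfc(x / math.sqrt(2))              # 1 - Phi(x)
    return survival / density

# mills_ratio(0) = sqrt(pi/2) ≈ 1.2533; mills_ratio(x) ≈ 1/x for large x
```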

Simple point process

A simple point process (or SPP) is an almost surely increasing sequence of strictly positive, possibly infinite random variables which are strictly increasing as long as they are finite and whose almost sure limit is . Symbolically, then, an SPP is a sequence of -valued random variables defined on a probability space such that 1. , 2. , 3. . Here, and for each , can be interpreted as either the time point at which the th recording of an event takes place or as an indication that fewer than events occurred altogether if or if , respectively (Jacobsen 2006).

Marked point process

A marked point process with mark space is a double sequenceof -valued random variables and -valued random variables defined on a probability space such that is a simple point process (SPP) and: 1. for ; 2. for . Here, denotes probability, denotes the so-called irrelevant mark which is used to describe the mark of an event that never occurs, and .This definition is similar to the definition of an SPP in that it describes a sequence of time points marking the occurrence of events. The difference is that these events may be of different types where the type (i.e., the mark) of the th event is denoted by . Note that, because of the inclusion of the irrelevant mark , marking will assign values for all --even when , i.e., when the th event never occurs (Jacobsen 2006).

Mark space

Given a marked point process of the form , the space is said to be the mark space of .

Conditional probability

The conditional probability of an event assuming that has occurred, denoted , equals(1)which can be proven directly using a Venn diagram. Multiplying through, this becomes(2)which can be generalized to(3)Rearranging (1) gives(4)Solving (4) for and plugging in to (1) gives(5)
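Not part of the original entry: a small Python enumeration, with hypothetical events chosen for illustration, showing P(A|B) = P(A and B)/P(B) for two fair dice, where B = "first die shows 3" and A = "the sum is 8".

```python
from itertools import product

# Enumerate all 36 outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
B = [o for o in outcomes if o[0] == 3]          # first die shows 3
A_and_B = [o for o in B if sum(o) == 8]         # ... and the sum is 8

p_B = len(B) / len(outcomes)              # P(B) = 6/36
p_A_and_B = len(A_and_B) / len(outcomes)  # P(A and B) = 1/36
p_given = p_A_and_B / p_B                 # P(A|B) = 1/6
```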

Saint Petersburg paradox

Consider a game, first proposed by Nicolaus Bernoulli, in which a player bets on how many tosses of a coin will be needed before it first turns up heads. The player pays a fixed amount initially, and then receives dollars if the coin comes up heads on the th toss. The expectation value of the gain is then(1)dollars, so any finite amount of money can be wagered and the player will still come out ahead on average. Feller (1968) discusses a modified version of the game in which the player receives nothing if a trial takes more than a fixed number of tosses. The classical theory of this modified game concluded that is a fair entrance fee, but Feller notes that "the modern student will hardly understand the mysterious discussions of this 'paradox.'" In another modified version of the game, the player bets $2 that heads will turn up on the first throw, $4 that heads will turn up on the second throw (if it did not turn up on the first), $8 that heads will turn..
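A minimal simulation sketch of the truncated game (not from the entry; the function name and cutoff are illustrative): if payoffs are cut off after max_tosses tosses, the expected gain is the sum over n of 2^n * 2^(-n) = max_tosses dollars, which a sample mean approaches.

```python
import random

def petersburg_payoff(rng, max_tosses):
    """One play: toss a fair coin until heads; payoff is 2**n dollars if heads
    first appears on toss n, and nothing if more than max_tosses tosses occur."""
    for n in range(1, max_tosses + 1):
        if rng.random() < 0.5:   # heads
            return 2 ** n
    return 0

rng = random.Random(1)
plays = 100_000
mean_gain = sum(petersburg_payoff(rng, 20) for _ in range(plays)) / plays
# theory: the expected gain of the game truncated at 20 tosses is exactly 20
```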

Coin tossing

An idealized coin consists of a circular disk of zero thickness which, when thrown in the air and allowed to fall, will rest with either side face up ("heads" H or "tails" T) with equal probability. A coin is therefore a two-sided die. Despite slight differences between the sides and nonzero thickness of actual coins, the distribution of their tosses makes a good approximation to a Bernoulli distribution.There are, however, some rather counterintuitive properties of coin tossing. For example, it is twice as likely that the triple TTH will be encountered before THT than after it, and three times as likely that THH will precede HHT. Furthermore, it is six times as likely that HTT will be the first of HTT, TTH, and TTT to occur than either of the others (Honsberger 1979). There are also strings of Hs and Ts that have the property that the expected wait to see string is less than the expected wait to see , but the probability of seeing before..
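The first of these claims can be checked with a short simulation (a sketch, not part of the original entry): TTH should precede THT with probability about 2/3.

```python
import random

def winner(rng, a, b):
    """Toss a fair coin until the last three tosses match pattern a or b."""
    last = ""
    while True:
        last = (last + rng.choice("HT"))[-3:]   # keep only the last 3 tosses
        if last == a:
            return a
        if last == b:
            return b

rng = random.Random(0)
trials = 20_000
tth_first = sum(winner(rng, "TTH", "THT") == "TTH" for _ in range(trials))
frac = tth_first / trials   # theory: 2/3
```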

Russian roulette

Russian roulette is a game of chance in which one or more of the six chambers of a revolver are filled with cartridges, the chamber is rotated at random, and the gun is fired. The shooter bets on whether the chamber which rotates into place will be loaded. If it is, he loses not only his bet but his life. In the case of a revolver with six chambers (revolvers with 5, 7, or 8 chambers are also common), the shooter has a 1/6 chance of dying (ignoring the fact that the probability of firing the round is always somewhat less than for a -shot revolver because the mass of the round in the cylinder causes an imbalance, and the cylinder will tend to stop rotating with its heavy side at or close to the bottom, while the firing pin is opposite the top chamber).A modified version is considered by Blom et al. (1996) and Blom (1989). In this variant, the revolver is loaded with a single cartridge, and two duelists alternately spin the chamber and fire at themselves until one is killed...

Random closed set

A random closed set (RACS) in is a measurable function from a probability space into where is the collection of all closed subsets of and where denotes the sigma-algebra generated over by the sets for all compact subsets . Originally, RACS were defined not on but in the more general setting of locally compact and separable (LCS) topological spaces (Baudin 1984) which may or may not be T2. In this case, the above definition is modified so that is defined to be the collection of closed subsets of some ambient LCS space (Molchanov 2005). Despite a number of apparent differences, one can show that multidimensional point processes are a special case of RACS when talking about (Baudin 1984).

Quantile function

Given a random variable with continuous and strictly monotonic probability density function , a quantile function assigns to each probability attained by the value for which . Symbolically, . Defining quantile functions for discrete rather than continuous distributions requires a bit more work since the discrete nature of such a distribution means that there may be gaps between values in the domain of the distribution function and/or "plateaus" in its range. Therefore, one often defines the associated quantile function to be where denotes the range of .
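As a concrete sketch (the example distribution is chosen here, not taken from the entry): for the exponential distribution with F(x) = 1 - exp(-lam*x), inverting F gives the quantile function Q(p) = -ln(1-p)/lam.

```python
import math

def exp_quantile(p, lam=1.0):
    """Quantile (inverse CDF) of the exponential distribution
    F(x) = 1 - exp(-lam*x), valid for 0 <= p < 1."""
    return -math.log1p(-p) / lam

def exp_cdf(x, lam=1.0):
    """Distribution function F(x) = 1 - exp(-lam*x)."""
    return -math.expm1(-lam * x)

# F(Q(p)) = p by construction; the median is Q(1/2) = ln(2)/lam.
```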

Proofreading mistakes

If proofreader finds mistakes and proofreader finds mistakes, of which were also found by , how many mistakes were missed by both and ? Assume there are a total of mistakes, so proofreader finds a fraction of all mistakes, and also a fraction of the mistakes found by . Assuming these fractions are the same, then solving for gives . The number of mistakes missed by both is therefore approximately
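A sketch of the calculation (the symbols a, b, c are assumptions, since the entry's symbols were lost: the first proofreader finds a mistakes, the second finds b, and c are found by both). The estimated total is a*b/c, of which a + b - c were found by at least one proofreader, so the number missed by both is approximately (a - c)*(b - c)/c.

```python
def missed_by_both(a, b, c):
    """Estimated mistakes missed by both proofreaders: the estimated total
    a*b/c minus the a + b - c found by at least one, i.e. (a-c)*(b-c)/c."""
    return (a - c) * (b - c) / c

# hypothetical counts: one finds 30, the other finds 24, 20 found by both
estimate = missed_by_both(30, 24, 20)   # (10 * 4) / 20 = 2.0
```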

Interval stationary point process

A point process on is said to be interval stationary if for every and for all integers , the joint distribution of does not depend on , . Here, is an interval for all . As pointed out in a variety of literature (e.g., Daley and Vere-Jones 2002, pp. 45-46), the notion of an interval stationary point process is intimately connected to (though fundamentally different from) the idea of a stationary point process in the Borel set sense of the term. Worth noting, too, is the difference between interval stationarity and other notions such as simple/crude stationarity. Though it has been done, it is more difficult to extend the notion of interval stationarity to ; doing so requires a significant amount of additional machinery and reflects, overall, the significantly increased structural complexity of higher-dimensional Euclidean spaces (Daley and Vere-Jones 2007)...

Probability space

A triple on the domain , where is a measurable space, are the measurable subsets of , and is a measure on with .

Intensity measure

The intensity measure of a point process relative to a Borel set is defined to be the expected number of points of falling in . Symbolically, where denotes the expected value. The notion of an intensity measure is intimately connected to one oft-discussed notion of intensity function (Pawlas 2008).

Probability measure

Consider a probability space specified by the triple , where is a measurable space, with the domain and its measurable subsets, and is a measure on with . Then the measure is said to be a probability measure. Equivalently, is said to be normalized.

Intensity function

There are at least two distinct notions of an intensity function related to the theory of point processes. In some literature, the intensity of a point process is defined to be the quantity(1)provided it exists. Here, denotes probability. In particular, it makes sense to talk about point processes having infinite intensity, though when finite, allows to be rewritten so that(2)as where here, denotes little-O notation (Daley and Vere-Jones 2007). Other authors define the function to be an intensity function of a point process provided that is a density of the intensity measure associated to relative to Lebesgue measure, i.e., if for all Borel sets in ,(3)where denotes Lebesgue measure (Pawlas 2008).

Independent statistics

Two variates and are statistically independent iff the conditional probability of given satisfies(1)in which case the probability of and is just(2)If events , , ..., are independent, then(3)Statistically independent variables are always uncorrelated, but the converse is not necessarily true.

Bonferroni inequalities

Let be the probability that is true, and be the probability that at least one of , , ..., is true. Then "the" Bonferroni inequality, also known as Boole's inequality, states that where denotes the union. If and are disjoint sets for all and , then the inequality becomes an equality. A beautiful theorem that expresses the exact relationship between the probability of unions and probabilities of individual events is known as the inclusion-exclusion principle. A slightly wider class of inequalities are also known as "Bonferroni inequalities."

Probability domain

Evans et al. (2000, p. 6) use the unfortunate term "probability domain" to refer to the range of the distribution function of a probability density function. For a continuous distribution, the probability domain is simply the interval [0,1], whereas for a discrete distribution, it is a subset of that interval.

Probability density function

The probability density function (PDF) of a continuous distribution is defined as the derivative of the (cumulative) distribution function ,(1)(2)(3)so(4)(5)A probability function satisfies(6)and is constrained by the normalization condition,(7)(8)Special cases are(9)(10)(11)(12)(13)To find the probability function in a set of transformed variables, find the Jacobian. For example, If , then(14)so(15)Similarly, if and , then(16)Given probability functions , , ..., , the sum distribution has probability function(17)where is a delta function. Similarly, the probability function for the distribution of is given by(18)The difference distribution has probability function(19)and the ratio distribution has probability function(20)Given the moments of a distribution (, , and the gamma statistics ), the asymptotic probability function is given by(21)where(22)is the normal distribution, and(23)for (with cumulants and..

Bayes' theorem

Let and be sets. Conditional probability requires that(1)where denotes intersection ("and"), and also that(2)Therefore,(3)Now, let(4)so is an event in and for , then(5)(6)But this can be written(7)so(8)(Papoulis 1984, pp. 38-39).
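A numerical sketch of the theorem for a two-event partition {A, not A} (the scenario and numbers below are hypothetical, not from the entry):

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]."""
    numerator = p_b_given_a * prior
    evidence = numerator + p_b_given_not_a * (1 - prior)
    return numerator / evidence

# hypothetical: 1% prior, 95% true-positive rate, 5% false-positive rate
posterior = bayes_posterior(0.01, 0.95, 0.05)   # ≈ 0.161
```

Even with a sensitive test, the low prior keeps the posterior well below certainty, which is the standard illustration of the theorem's leverage.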

Cauchy distribution

The Cauchy distribution, also called the Lorentzian distribution or Lorentz distribution, is a continuous distribution describing resonance behavior. It also describes the distribution of horizontal distances at which a line segment tilted at a random angle cuts the x-axis. Let represent the angle that a line, with fixed point of rotation, makes with the vertical axis, as shown above. Then(1)(2)(3)(4)so the distribution of angle is given by(5)This is normalized over all angles, since(6)and(7)(8)(9)The general Cauchy distribution and its cumulative distribution can be written as(10)(11)where is the half width at half maximum and is the statistical median. In the illustration above, . The Cauchy distribution is implemented in the Wolfram Language as CauchyDistribution[m, Gamma/2]. The characteristic function is(12)(13)The moments of the distribution are undefined since the integrals(14)diverge for . If and are variates with..

Class

The word "class" has many specialized meanings in mathematics in which it refers to a group of objects with some common property (e.g., characteristic class or conjugacy class.)In statistics, a class is a grouping of values by which data is binned for computation of a frequency distribution (Kenney and Keeping 1962, p. 14). The range of values of a given class is called a class interval, the boundaries of an interval are called class limits, and the middle of a class interval is called the class mark.The following table summarizes the classes illustrated in the histogramabove for an example data set.class intervalclass markabsolute frequencyrelative frequencycumulative absolute frequencyrelative cumulative frequency0.00- 9.99510.0110.0110.00-19.991530.0340.0420.00-29.992580.08120.1230.00-39.9935180.18300.3040.00-49.9945240.24540.5450.00-59.9955220.22760.7660.00-69.9965150.15910.9170.00-79.997580.08990.9980.00-89.998500.00990.9990.00-99.999510.011001.00..

Sample

A sample is a subset of a population that is obtained through some process, possibly random selection or selection based on a certain set of criteria, for the purposes of investigating the properties of the underlying parent population. In particular, statistical quantities determined directly from the sample (such as sample central moments, sample raw moments, sample mean, sample variance, etc.) can be used as estimators for the corresponding properties of the underlying distribution.The process of obtaining a sample is known as sampling, and the number of members in a sample is called the sample size.

Lexis trials

Sets of trials each, with the probability of success constant in each set, where is the variance of .

Lexis ratio

where is the variance in a set of Lexis trials and is the variance assuming Bernoulli trials. If , the trials are said to be subnormal, and if , the trials are said to be supernormal.


Trials for which the Lexis ratiosatisfies , where is the variance in a set of Lexis trials and is the variance assuming Bernoulli trials.

Poisson trials

A number of trials in which the probability of success varies from trial to trial. Let be the number of successes, then(1)where is the variance of and . Uspensky has shown that(2)where(3)(4)(5)(6)and . The probability that the number of successes is at least is given by(7)Uspensky gives the true probability that there are at least successes in trials as(8)where(9)(10)

Experiment

An experiment is defined (Papoulis 1984, p. 30) as a mathematical object consisting of the following elements. 1. A set (the probability space) of elements. 2. A Borel field consisting of certain subsets of called events. 3. A number satisfying the probability axioms, called the probability, that is assigned to every event .

Sample proportion

Let there be successes out of Bernoulli trials. The sample proportion is the fraction of samples which were successes, so(1)For large , has an approximately normal distribution. Let RE be the relative error and SE the standard error, then(2)(3)(4)where CI is the confidence interval and is the erf function. The number of tries needed to determine with relative error RE and confidence interval CI is(5)
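A sketch of the normal-approximation interval (the counts below are illustrative): with p_hat = successes/n, the standard error is sqrt(p_hat*(1 - p_hat)/n), and a 95% confidence interval is p_hat ± 1.96 * SE.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Sample proportion with its normal-approximation standard error and a
    z-level confidence interval (z = 1.96 for roughly 95% coverage)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lo, hi) = proportion_interval(550, 1000)
```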

Run

A run is a sequence of more than one consecutive identical outcomes, also known as a clump. Let be the probability that a run of or more consecutive heads appears in independent tosses of a coin (i.e., Bernoulli trials). This is equivalent to repeated picking from an urn containing two distinguishable objects with replacement after each pick. Let the probability of obtaining a head be . Then there is a beautiful formula for given in terms of the coefficients of the generating function(1)(Feller 1968, p. 300). Then(2)The following table gives the triangle of numbers for , 2, ... and , 2, ..., (OEIS A050227).

Sloane  A000225  A008466  A050231  A050233
n\k       1        2        3        4      5    6    7    8
1         1        0        0        0      0    0    0    0
2         3        1        0        0      0    0    0    0
3         7        3        1        0      0    0    0    0
4        15        8        3        1      0    0    0    0
5        31       19        8        3      1    0    0    0
6        63       43       20        8      3    1    0    0
7       127       94       47       20      8    3    1    0
8       255      201      107       48     20    8    3    1

The special case gives the sequence(3)where is a Fibonacci number. Similarly, the probability that no consecutive tails will occur in tosses is given by , where is a Fibonacci k-step..
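The small entries of the triangle can be reproduced by brute force (a sketch, assuming a fair coin, counting the numerators over 2^n): for example, 8 of the 16 length-4 sequences contain a run of at least two consecutive heads.

```python
from itertools import product

def count_with_run(n, k):
    """Number of H/T sequences of length n containing a run of at least k
    consecutive heads (brute force over all 2**n sequences)."""
    target = "H" * k
    return sum(target in "".join(seq) for seq in product("HT", repeat=n))
```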

Trivariate normal distribution

A multivariate normal distribution in three variables. It has probability density function(1)where(2)The standardized trivariate normal distribution takes unit variances and . The quadrant probability in this special case is then given analytically by(3)(Rose and Smith 1996; Stuart and Ord 1998; Rose and Smith 2002, p. 231).

Multivariate normal distribution

A -variate multivariate normal distribution (also called a multinormal distribution) is a generalization of the bivariate normal distribution. The -multivariate distribution with mean vector and covariance matrix is denoted . The multivariate normal distribution is implemented as MultinormalDistribution[mu1, mu2, ..., sigma11, sigma12, ..., sigma12, sigma22, ..., ..., x1, x2, ...] in the Wolfram Language package MultivariateStatistics` (where the matrix must be symmetric since ).In the case of nonzero correlations, there is in general no closed-form solution for the distribution function of a multivariate normal distribution. As a result, such computations must be done numerically.

Cluster analysis

Cluster analysis is a technique used for classification of data in which data elements are partitioned into groups called clusters that represent collections of data elements that are proximate based on a distance or dissimilarity function. Cluster analysis is implemented as FindClusters[data] or FindClusters[data, n]. The Season 1 pilot (2005) and Season 2 episode "Dark Matter" of the television crime drama NUMB3RS feature clusters and cluster analysis. In "Dark Matter," math genius Charlie Eppes runs a cluster analysis to find connections between the students that seemed to be systematically singled out by the anomalous third shooter. In Season 4 episode "Black Swan," characters Charles Eppes and Amita Ramanujan adjust cluster radii in their attempt to do a time series analysis of overlapping Voronoi regions to track the movements of a suspect...

Bivariate normal distribution

The bivariate normal distribution is the statistical distribution with probability density function(1)where(2)and(3)is the correlation of and (Kenney and Keeping 1951, pp. 92 and 202-205; Whittaker and Robinson 1967, p. 329) and is the covariance. The probability density function of the bivariate normal distribution is implemented as MultinormalDistribution[mu1, mu2, sigma11, sigma12, sigma12, sigma22] in the Wolfram Language package MultivariateStatistics`. The marginal probabilities are then(4)(5)and(6)(7)(Kenney and Keeping 1951, p. 202). Let and be two independent normal variates with means and for , 2. Then the variables and defined below are normal bivariates with unit variance and correlation coefficient :(8)(9)To derive the bivariate normal probability function, let and be normally and independently distributed variates with mean 0 and variance 1, then define(10)(11)(Kenney and Keeping..

Wiener numbers

A sequence of uncorrelated numbers developed by Wiener (1926-1927). The numbers are constructed by beginning with(1)then forming the outer product with to obtain(2)This row is repeated twice, and its outer product is then taken to give(3)This is then repeated four times. The procedure is repeated, and the result repeated eight times, and so on. The sequences from each stage are then concatenated to form the sequence 1, , 1, 1, 1, , , 1, , , 1, 1, 1, , , 1, , , ....

Redundancy

where is the entropy and is the joint entropy. Linear redundancy is defined aswhere are eigenvalues of the correlation matrix.

Predictability

Predictability at a time in the future is defined byand linear predictability bywhere and are the redundancy and linear redundancy, and is the entropy.

Nonstationary time series

A time series , , ... is nonstationary if, for some , the joint probability distribution of , , ..., is dependent on the time index .

Statistical correlation

For two random variates and , the correlation is defined by(1)where denotes standard deviation and is the covariance of these two variables. For the general case of variables and , where , 2, ..., ,(2)where are elements of the covariance matrix. In general, a correlation gives the strength of the relationship between variables. For ,(3)The variance of any quantity is always nonnegative by definition, so(4)From a property of variances, the sum can be expanded(5)(6)(7)Therefore,(8)Similarly,(9)(10)(11)(12)Therefore,(13)so . For a linear combination of two variables,(14)(15)(16)(17)Examine the cases where ,(18)(19)The variance will be zero if , which requires that the argument of the variance is a constant. Therefore, , so . If , is either perfectly correlated () or perfectly anticorrelated () with ..
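A minimal sketch (not from the entry) computing the sample correlation directly from its definition as covariance over the product of standard deviations; perfectly correlated data give +1 and perfectly anticorrelated data give -1.

```python
import math

def correlation(xs, ys):
    """Sample correlation r = cov(x, y) / (sd(x) * sd(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)
```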

Least squares fitting--exponential

To fit a functional form(1)take the logarithm of both sides(2)The best-fit values are then(3)(4)where and .This fit gives greater weights to small values so, in order to weight the points equally, it is often better to minimize the function(5)Applying least squares fitting gives(6)(7)(8)Solving for and ,(9)(10)In the plot above, the short-dashed curve is the fit computed from (◇) and (◇) and the long-dashed curve is the fit computed from (9) and (10).

Least squares fitting

A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a fitting function for the independent variable that estimates for a given (most often what an experimenter wants), allows uncertainties of the data points along the - and -axes to be incorporated simply, and also provides a much..
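The vertical-offset case for a straight line y = a + b*x has a closed-form solution via the normal equations; a minimal sketch (not from the entry):

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y ~ a + b*x, minimizing the sum of squared
    vertical offsets; closed-form solution of the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

a, b = linear_fit([0, 1, 2], [1, 3, 5])   # collinear data: a = 1, b = 2
```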

Sheppard's correction

A correction which must be applied to the measured moments obtained from normally distributed data which have been binned in order to obtain correct estimators for the population moments . The corrected versions of the second, third, and fourth moments are then(1)(2)(3)where is the class interval.If is the th cumulant of an ungrouped distribution and the th cumulant of the grouped distribution with class interval , the corrected cumulants (under rather restrictive conditions) are(4)where is the th Bernoulli number, giving(5)(6)(7)(8)(9)(10)For a proof, see Kendall et al. (1998).

Cumulant

Let be the characteristic function, defined as the Fourier transform of the probability density function using Fourier transform parameters ,(1)(2)The cumulants are then defined by(3)(Abramowitz and Stegun 1972, p. 928). Taking the Maclaurin series gives(4)where are raw moments, so(5)(6)(7)(8)(9)These transformations can be given by CumulantToRaw[n] in the Mathematica application package mathStatica. In terms of the central moments ,(10)(11)(12)(13)(14)where is the mean and is the variance. These transformations can be given by CumulantToCentral[n]. Multivariate cumulants can be expressed in terms of raw moments, e.g.,(15)(16)and central moments, e.g.,(17)(18)(19)(20)(21)using CumulantToRaw[m, n, ...] and CumulantToCentral[m, n, ...], respectively. The k-statistics are unbiased estimators of the cumulants...

Sample variance distribution

Let samples be taken from a population with central moments . The sample variance is then given by(1)where is the sample mean. The expected value of for a sample size is then given by(2)Similarly, the expected variance of the sample variance is given by(3)(4)(Kenney and Keeping 1951, p. 164; Rose and Smith 2002, p. 264). The algebra of deriving equation (4) by hand is rather tedious, but can be performed as follows. Begin by noting that(5)so(6)The value of is already known from equation (◇), so it remains only to find . The algebra is simplified considerably by immediately transforming variables to and performing computations with respect to these central variables. Since the variance does not depend on the mean of the underlying distribution, the result obtained using the transformed variables will give an identical result while immediately eliminating expectation values of sums of terms containing odd powers of (which..

Moment problem

The moment problem, also called "Hausdorff's moment problem" or the "little moment problem," may be stated as follows. Given a sequence of numbers , under what conditions is it possible to determine a function of bounded variation in the interval such thatfor , 1, .... Such a sequence is called a moment sequence, and Hausdorff (1921ab) was the first to obtain necessary and sufficient conditions for a sequence to be a moment sequence.

Covariance

Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates and , each with sample size , is defined by the expectation value(1)(2)where and are the respective means, which can be written out explicitly as(3)For uncorrelated variates,(4)so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if , then tends to increase as increases, and if , then tends to decrease as increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.In the special case of ,(5)(6)so the covariance reduces to the usual variance . This motivates the use of the symbol , which then provides a consistent way of denoting the variance as , where is the standard deviation.The derived quantity(7)(8)is called statistical correlation of and .The covariance..

Sample variance computation

When computing the sample variance numerically, the mean must be computed before can be determined. This requires storing the set of sample values. However, it is possible to calculate using a recursion relationship involving only the last sample as follows. This means itself need not be precomputed, and only a running set of values need be stored at each step. In the following, use the somewhat less than optimal notation to denote calculated from the first samples (i.e., not the th moment)(1)and let denote the value for the bias-corrected sample variance calculated from the first samples. The first few values calculated for the mean are(2)(3)(4)Therefore, for , 3 it is true that(5)Therefore, by induction,(6)(7)(8)(9)By the definition of the sample variance,(10)for . Defining , can then be computed using the recurrence equation(11)(12)(13)(14)Working on the first term,(15)(16)Use (◇) to write(17)so(18)Now work on the second..
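A one-pass recursion of this kind can be sketched in Welford-style form (an illustrative implementation, not the entry's exact notation), keeping only a running mean and a running sum of squared deviations:

```python
def online_variance(samples):
    """One-pass computation of the sample mean and the bias-corrected sample
    variance without storing the samples (requires at least two samples)."""
    mean, m2, n = 0.0, 0.0, 0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n          # updated running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
    return mean, m2 / (n - 1)

mean, var = online_variance([2, 4, 4, 4, 5, 5, 7, 9])
```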

Charlier's check

A check which can be used to verify correct computations in a table of grouped classes. For example, consider the following table with specified class limits and frequencies . The class marks are then computed as well as the rescaled frequencies , which are given by(1)where the class mark is taken as 74.5 and the class interval is 10. The remaining quantities are then computed as follows.

class limits   class mark    f     u    f*u   f*u^2   f*(u+1)^2
30-39             34.5        2   -4     -8     32        18
40-49             44.5        3   -3     -9     27        12
50-59             54.5       11   -2    -22     44        11
60-69             64.5       20   -1    -20     20         0
70-79             74.5       32    0      0      0        32
80-89             84.5       25    1     25     25       100
90-99             94.5        7    2     14     28        63
total                       100         -20    176       236

In order to compute the variance, note that(2)(3)(4)so the variance of the original data is(5)Charlier's check makes use of the additional column added to the right side of the table. By noting that the identity(6)(7)connects columns five through seven, it can be checked that the computations have been done correctly. In the example above,(8)so the computations pass Charlier's check...
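The identity behind the check can be verified numerically; the frequencies f and coded deviations u below are as reconstructed from this entry's worked example (the coding u = (x - 74.5)/10 is inferred from the tabulated values):

```python
# Frequencies and coded class deviations from the worked example.
f = [2, 3, 11, 20, 32, 25, 7]
u = [-4, -3, -2, -1, 0, 1, 2]

sum_f = sum(f)                                          # 100
sum_fu = sum(fi * ui for fi, ui in zip(f, u))           # -20
sum_fu2 = sum(fi * ui ** 2 for fi, ui in zip(f, u))     # 176
check = sum(fi * (ui + 1) ** 2 for fi, ui in zip(f, u)) # 236

# Charlier's check: sum f(u+1)^2 = sum f u^2 + 2 sum f u + sum f
assert check == sum_fu2 + 2 * sum_fu + sum_f
```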

Raw moment

The th raw moment (i.e., moment about zero) of a distribution is defined by(1)where(2), the mean, is usually simply denoted . If the moment is instead taken about a point ,(3)A statistical distribution is not uniquely specified by its moments, although it is by its characteristic function. The moments are most commonly taken about the mean. These so-called central moments are denoted and are defined by(4)(5)with . The second moment about the mean is equal to the variance(6)where is called the standard deviation. The related characteristic function is defined by(7)(8)The moments may be simply computed using the moment-generating function,(9)

Sample raw moment

The th sample raw moment of a sample with sample size is defined as(1)The sample raw moments are unbiased estimators of the population rawmoments,(2)(Rose and Smith 2002, p. 253). The sample raw moment is related to power sums by(3)This relationship can be given by SampleRawToPowerSum[r] in the Mathematica application package mathStatica.

Central moment

A moment of a univariate probability density function taken about the mean ,(1)(2)where denotes the expectation value. The central moments can be expressed in terms of the raw moments (i.e., those taken about zero) using the binomial transform(3)with (Papoulis 1984, p. 146). The first few central moments expressed in terms of the raw moments are therefore(4)(5)(6)(7)(8)These transformations can be obtained using CentralToRaw[n] in the Mathematica application package mathStatica. The central moments can also be expressed in terms of the cumulants , with the first few cases given by(9)(10)(11)(12)These transformations can be obtained using CentralToCumulant[n] in the Mathematica application package mathStatica. The central moment of a multivariate probability density function can be similarly defined as(13)Therefore,(14)For example,(15)(16)Similarly, the multivariate central moments can be expressed in terms..

Bessel's correction

Bessel's correction is the factor in the relationship between the variance and the expectation values of the sample variance,(1)where(2)As noted by Kenney and Keeping (1951, p. 161), the correction factor is probably more properly attributed to Gauss, who used it in this connection as early as 1823 (Gauss 1823).For two samples,(3)(Kenney and Keeping 1951, p. 162).
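A simulation sketch (sample size and distribution chosen for illustration) showing why the factor matters: dividing the sum of squared deviations by n underestimates the variance by the factor (n-1)/n on average, while dividing by n-1 is unbiased.

```python
import random

rng = random.Random(42)
n, trials = 5, 200_000   # many small samples from a unit-variance normal

biased_avg = unbiased_avg = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased_avg += ss / n          # divides by n
    unbiased_avg += ss / (n - 1)  # Bessel-corrected
biased_avg /= trials      # tends to (n-1)/n = 0.8
unbiased_avg /= trials    # tends to the true variance 1.0
```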

Hypothesis testing

Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps. 1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation). 2. Identify a test statistic that can be used to assess the truth of the null hypothesis. 3. Compute the P-value, which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the null hypothesis were true. The smaller the -value, the stronger the evidence against the null hypothesis. 4. Compare the -value to an acceptable significance value (sometimes called an alpha value). If , the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis..

Dot plot

A dot plot, also called a dot chart, is a type of simple histogram-like chart used in statistics for relatively small data sets where values fall into a number of discrete bins. To draw a dot plot, count the number of data points falling in each bin and draw a stack of dots that number high for each bin. The illustration above shows such a plot for a random sample of 100 integers chosen between 1 and 25 inclusively. Simple code for drawing a dot plot in the Wolfram Language with some appropriate labeling of bin heights can be given as DotPlot[data_] := Module[{m = Tally[Sort[data]]}, ListPlot[Flatten[Table[{#1, n}, {n, #2}]& @@@ m, 1], Ticks -> {Automatic, Range[0, Max[m[[All, 2]]]]}]]

Arbitrary precision

In most computer programs and computing environments, the precision of any calculation (even including addition) is limited by the word size of the computer, that is, by the largest number that can be stored in one of the processor's registers. As of mid-2002, the most common processor word size is 32 bits, corresponding to the integer 2^32 - 1 = 4294967295. General integer arithmetic on a 32-bit machine therefore allows addition of two 32-bit numbers to get 33 bits (one word plus an overflow bit), multiplication of two 32-bit numbers to get 64 bits (although the most prevalent programming language, C, cannot access the higher word directly and depends on the programmer to either create a machine language function or write a much slower function in C at a final overhead of about nine multiplies more), and division of a 64-bit number by a 32-bit number creating a 32-bit quotient and a 32-bit remainder/modulus. Arbitrary-precision arithmetic consists of a set of algorithms,..
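Python's built-in integers are one widely available example of arbitrary-precision arithmetic; results are never truncated to a machine word, in contrast to the fixed-width registers described above:

```python
# Python integers have arbitrary precision: products that would overflow a
# 32- or 64-bit hardware register are computed exactly.
word_max = 2**32 - 1                 # largest unsigned 32-bit value
product = word_max * word_max        # exceeds 64 bits of dynamic range? No,
                                     # but it does exceed a 32-bit register.
print(product)                       # exact multi-word result
print(2**100)                        # 31-digit integer, still exact
```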

Quantum stochastic calculus

Let , , be one-dimensional Brownian motion. Integration with respect to was defined by Itô (1951). A basic result of the theory is that stochastic integral equations of the form(1)can be interpreted as stochastic differential equations of the form(2)where differentials are handled with the use of Itô's formula(3)(4)Hudson and Parthasarathy (1984) obtained a Fock space representation of Brownian motion and Poisson processes. The boson Fock space over is the Hilbert space completion of the linear span of the exponential vectors under the inner product(5)where and and is the complex conjugate of .The annihilation, creation and conservation operators , and respectively, are defined on the exponential vectors of as follows,(6)(7)(8)The basic quantum stochastic differentials , , and are defined as follows,(9)(10)(11)Hudson and Parthasarathy (1984) defined stochastic integration with respect to the noise differentials..

Correlation ratio

Let there be observations of the th phenomenon, where , ..., and(1)(2)(3)Then the sample correlation ratio is defined by(4)Let be the population correlation ratio. If for , then(5)where(6)(7)(8)and is the confluent hypergeometric limit function. If , then(9)(Kenney and Keeping 1951, pp. 323-324).
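One common form of the sample correlation ratio squared is the ratio of the between-group sum of squares to the total sum of squares. A small Python sketch under that assumption (the grouped data are illustrative):

```python
# Sample correlation ratio (squared): eta^2 = SS_between / SS_total for
# grouped observations, using the standard sum-of-squares decomposition.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # illustrative groups

all_vals = [y for g in groups for y in g]
grand_mean = sum(all_vals) / len(all_vals)

ss_total = sum((y - grand_mean) ** 2 for y in all_vals)
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                 for g in groups)

eta_sq = ss_between / ss_total                     # 0 <= eta_sq <= 1
print(eta_sq)
```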

Normal equation

Given a matrix equation A x = b, the normal equation is that which minimizes the sum of the square differences between the left and right sides: A^T A x = A^T b. It is called a normal equation because b - A x is normal to the range of A. Here, A^T A is a normal matrix.
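Forming and solving A^T A x = A^T b can be sketched by hand for a two-parameter straight-line fit y = c0 + c1 t (data and names are illustrative, and the 2x2 system is solved directly by Cramer's rule):

```python
# Least squares line fit via the normal equations A^T A x = A^T b,
# written out explicitly for the 2-parameter case.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                 # exactly y = 1 + 2 t

A = [[1.0, t] for t in ts]                # design matrix

# Entries of A^T A and A^T b.
s00 = sum(row[0] * row[0] for row in A)
s01 = sum(row[0] * row[1] for row in A)
s11 = sum(row[1] * row[1] for row in A)
b0 = sum(row[0] * y for row, y in zip(A, ys))
b1 = sum(row[1] * y for row, y in zip(A, ys))

# Solve the 2x2 normal equations by Cramer's rule.
det = s00 * s11 - s01 * s01
c0 = (s11 * b0 - s01 * b1) / det
c1 = (s00 * b1 - s01 * b0) / det
print(c0, c1)
```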

Least squares fitting--power law

Given a function of the form (1) y = A x^B, least squares fitting gives the coefficients as (2) b = [n Σ(ln x ln y) - (Σ ln x)(Σ ln y)] / [n Σ(ln x)² - (Σ ln x)²] and (3) a = [Σ ln y - b Σ ln x] / n, where B = b and A = e^a.
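A power-law fit y = A x^B reduces to linear least squares on (ln x, ln y); a minimal sketch assuming positive data (the sample values are illustrative):

```python
# Power-law fit y = A * x**B by linear least squares on the logarithms.
from math import log, exp

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]         # exact power law: A = 3, B = 1.5

lx = [log(x) for x in xs]
ly = [log(y) for y in ys]
n = len(xs)

# Slope and intercept of the straight-line fit ln y = a + b ln x.
b = (n * sum(u * v for u, v in zip(lx, ly)) - sum(lx) * sum(ly)) / \
    (n * sum(u * u for u in lx) - sum(lx) ** 2)
a = (sum(ly) - b * sum(lx)) / n

A, B = exp(a), b                          # back-transform the intercept
print(A, B)
```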

Correlation coefficient--bivariate normal distribution

For a bivariate normal distribution, the distribution of correlation coefficients is given by(1)(2)(3)where is the population correlation coefficient, is a hypergeometric function, and is the gamma function (Kenney and Keeping 1951, pp. 217-221). The moments are(4)(5)(6)(7)where . If the variates are uncorrelated, then and(8)(9)so(10)(11)But from the Legendre duplication formula,(12)so(13)(14)(15)(16)The uncorrelated case can be derived more simply by letting be the true slope, so that . Then(17)is distributed as Student's t with degrees of freedom. Let the population regression coefficient be 0, then , so(18)and the distribution is(19)Plugging in for and using(20)(21)(22)gives(23)(24)(25)(26)so(27)as before. See Bevington (1969, pp. 122-123) or Pugh and Winslow (1966, §12-8). If we are interested instead in the probability that a correlation coefficient would be obtained , where is the observed..

Nonlinear least squares fitting

Given a function of a variable tabulated at values , ..., , assume the function is of known analytic form depending on parameters , and consider the overdetermined set of equations(1)(2)We desire to solve these equations to obtain the values , ..., which best satisfy this system of equations. Pick an initial guess for the and then define(3)Now obtain a linearized estimate for the changes needed to reduce to 0,(4)for , ..., , where . This can be written in component form as(5)where is the matrix(6)In more concise matrix form,(7)where is an -vector and is an -vector.Applying the transpose of to both sides gives(8)Defining(9)(10)in terms of the known quantities and then gives the matrix equation(11)which can be solved for using standard matrix techniques such as Gaussian elimination. This offset is then applied to and a new is calculated. By iteratively applying this procedure until the elements of become smaller than some prescribed limit, a solution..
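The iteration described above (linearize, solve the normal equations for the offsets, update, repeat until the offsets are negligible) can be sketched for a two-parameter exponential model; the model, data, and starting guess are all illustrative assumptions:

```python
# Gauss-Newton sketch for nonlinear least squares: fit y = a * exp(b * x).
from math import exp

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * exp(0.5 * x) for x in xs]     # exact data: a = 2, b = 0.5

a, b = 1.5, 0.4                           # initial guess for the parameters
for _ in range(50):
    # residuals beta_i = y_i - f(x_i; a, b) and Jacobian A_ij = df/dparam_j
    r = [y - a * exp(b * x) for x, y in zip(xs, ys)]
    J = [[exp(b * x), a * x * exp(b * x)] for x in xs]

    # normal equations (J^T J) dp = J^T r, solved for the 2x2 case
    s00 = sum(j[0] * j[0] for j in J)
    s01 = sum(j[0] * j[1] for j in J)
    s11 = sum(j[1] * j[1] for j in J)
    g0 = sum(j[0] * ri for j, ri in zip(J, r))
    g1 = sum(j[1] * ri for j, ri in zip(J, r))
    det = s00 * s11 - s01 * s01
    da = (s11 * g0 - s01 * g1) / det
    db = (s00 * g1 - s01 * g0) / det

    a, b = a + da, b + db
    if abs(da) + abs(db) < 1e-12:         # offsets below the prescribed limit
        break

print(a, b)
```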

Least squares fitting--polynomial

Generalizing from a straight line (i.e., first degree polynomial) to a kth degree polynomial (1) the residual is given by (2) The partial derivatives (again dropping superscripts) are (3) (4) (5) These lead to the equations (6) (7) (8) or, in matrix form (9) This is a Vandermonde matrix. We can also obtain the matrix for a least squares fit by writing (10) Premultiplying both sides by the transpose of the first matrix then gives (11) so (12) As before, given points and fitting with polynomial coefficients , ..., gives (13) In matrix notation, the equation for a polynomial fit is given by (14) This can be solved by premultiplying by the matrix transpose, (15) This matrix equation can be solved numerically, or can be inverted directly if it is well formed, to yield the solution vector (16) Setting k = 1 in the above equations reproduces the linear solution.
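Premultiplying the Vandermonde system by its transpose and solving can be sketched for a quadratic fit; the hand-rolled Gaussian-elimination solver and the data are illustrative, not a robust implementation:

```python
# Quadratic least squares fit via the normal equations (A^T A) c = A^T y,
# where A is the Vandermonde design matrix.
def solve(M, v):
    # naive Gaussian elimination with partial pivoting on the augmented matrix
    n = len(M)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]   # exact quadratic

A = [[x ** j for j in range(3)] for x in xs]      # Vandermonde design matrix
AtA = [[sum(A[i][r] * A[i][c] for i in range(len(xs))) for c in range(3)]
       for r in range(3)]
Aty = [sum(A[i][r] * ys[i] for i in range(len(xs))) for r in range(3)]

coeffs = solve(AtA, Aty)
print(coeffs)
```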

Correlation coefficient

The correlation coefficient, sometimes also called the cross-correlation coefficient, Pearson correlation coefficient (PCC), Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a quantity that gives the quality of a least squares fitting to the original data. To define the correlation coefficient, first consider the sum of squared values ss_xx, ss_xy, and ss_yy of a set of data points about their respective means, (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) These quantities are simply unnormalized forms of the variances and covariance of x and y given by (13) (14) (15) For linear least squares fitting, the coefficient in (16) is given by (17) (18) and the coefficient in (19) is given by (20) The correlation coefficient (sometimes also denoted R) is then defined by (21) (22) The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. The correlation..
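The standard formula r = ss_xy / sqrt(ss_xx ss_yy) in terms of the unnormalized sums of squares can be sketched directly (the data points are illustrative):

```python
# Pearson's correlation coefficient from the unnormalized sums of squares
# about the means: r = ss_xy / sqrt(ss_xx * ss_yy).
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

ss_xx = sum((x - mx) ** 2 for x in xs)
ss_yy = sum((y - my) ** 2 for y in ys)
ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

r = ss_xy / sqrt(ss_xx * ss_yy)           # lies in [-1, 1]
print(r)
```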

Least squares fitting--perpendicular offsets

In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a fitting function for the independent variable that estimates for a given (most often what an experimenter wants), allows uncertainties of the data points along the - and -axes to be incorporated simply, and also provides a much simpler analytic form for the fitting parameters than would be obtained using a fit based on perpendicular offsets.The residuals of the best-fit line for a set of points using unsquared perpendicular distances of points are given by(1)Since the perpendicular distance from a line to point is given by(2)the function to be minimized is(3)Unfortunately, because the absolute value function does not have continuous derivatives, minimizing is not amenable to analytic solution. However, if the square of the perpendicular distances(4)is minimized instead,..

Least squares fitting--logarithmic

Given a function of the form (1) y = a + b ln x, the coefficients can be found from least squares fitting as (2) b = [n Σ(y ln x) - (Σ y)(Σ ln x)] / [n Σ(ln x)² - (Σ ln x)²] and (3) a = [Σ y - b Σ ln x] / n.
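A logarithmic fit y = a + b ln x is linear least squares in the variable ln x; a minimal sketch assuming x > 0 (the data are illustrative):

```python
# Logarithmic fit y = a + b * ln(x) by ordinary linear least squares in ln x.
from math import log

xs = [1.0, 2.0, 4.0, 8.0]
ys = [5.0 + 3.0 * log(x) for x in xs]     # exact: a = 5, b = 3

lx = [log(x) for x in xs]
n = len(xs)

b = (n * sum(u * y for u, y in zip(lx, ys)) - sum(lx) * sum(ys)) / \
    (n * sum(u * u for u in lx) - sum(lx) ** 2)
a = (sum(ys) - b * sum(lx)) / n
print(a, b)
```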

Raw moment

A moment of a probability function taken about 0, (1) (2) The raw moments (sometimes also called "crude moments") can be expressed in terms of the central moments (i.e., those taken about the mean μ) using the inverse binomial transform (3) with μ_0 = 1 and μ_1 = 0 (Papoulis 1984, p. 146). The first few values are therefore (4) (5) (6) (7) The raw moments can also be expressed in terms of the cumulants by exponentiating both sides of the series (8) where φ(t) denotes the characteristic function, to obtain (9) The first few terms are then given by (10) (11) (12) (13) (14) These transformations can be obtained using RawToCumulant[n] in the Mathematica application package mathStatica. The raw moment of a multivariate probability function can be similarly defined as (15) Therefore, (16) The multivariate raw moments can be expressed in terms of the multivariate cumulants. For example, (17) (18) These transformations can be obtained using RawToCumulant[m,..
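The lowest-order relation between raw and central moments, μ'_2 = μ_2 + μ², can be verified numerically for an empirical distribution (the data are illustrative):

```python
# Numerical check of the raw/central moment relation mu'_2 = mu_2 + mu^2
# for a small empirical distribution (each point weighted 1/n).
data = [1.0, 2.0, 2.0, 3.0, 5.0]
n = len(data)

mu = sum(data) / n                               # first raw moment (the mean)
raw2 = sum(x ** 2 for x in data) / n             # second raw moment, about 0
central2 = sum((x - mu) ** 2 for x in data) / n  # second central moment

assert abs(raw2 - (central2 + mu ** 2)) < 1e-12
print(mu, raw2, central2)
```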

Kendall operator

The operator that can be used to derive multivariate formulas for moments and cumulants from corresponding univariate formulas.For example, to derive the expression for the multivariate central moments in terms of multivariate cumulants, begin with(1)Now rewrite each variable as to obtain(2)Now differentiate each side with respect to , where(3)and wherever there is a term with a derivative , remove the derivative and replace the argument with times itself, so(4)Now set any s appearing as coefficients to 1, so(5)Dividing through by 4 gives(6)Finally, set any coefficients powers of appearing as term coefficients to 1 and interpret the resulting terms as , so that the above gives(7)This procedure can be repeated up to times, where is the subscript of the univariate case.Iterating the above procedure gives(8)(9)(10)(11)(12)giving the identities(13)(14)(15)(16)(17)..

Variation coefficient

If σ is the standard deviation of a set of samples and x̄ their mean, then the variation coefficient is defined as CV = σ / x̄.
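A one-line computation of the ratio of standard deviation to mean, here using the population standard deviation (the data are illustrative):

```python
# Variation coefficient (coefficient of variation): standard deviation
# divided by the mean of the sample.
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = statistics.pstdev(data) / statistics.mean(data)
print(cv)
```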
