# Probability and statistics

## Probability and statistics Topics

### Lyapunov condition

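The Lyapunov condition, sometimes known as Lyapunov's central limit theorem, states that if the $(2+\epsilon)$th moment (with $\epsilon>0$) exists for a statistical distribution of independent random variates $x_i$ (which need not necessarily be from the same distribution), the means $\mu_i$ and variances $\sigma_i^2$ are finite, and

$$r_n^{2+\epsilon}=\sum_{i=1}^n\left\langle|x_i-\mu_i|^{2+\epsilon}\right\rangle,$$

then if

$$\lim_{n\to\infty}\frac{r_n}{s_n}=0,$$

where

$$s_n^2=\sum_{i=1}^n\sigma_i^2,$$

the central limit theorem holds.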

### Lindeberg condition

A sufficient condition on the Lindeberg-Feller central limit theorem. Given random variates , , ..., let , the variance of be finite, and variance of the distribution consisting of a sum of s(1)be(2)In the terminology of Zabell (1995), let(3)where denotes the expectation value of restricted to outcomes , then the Lindeberg condition is(4)for all (Zabell 1995).In the terminology of Feller (1971), the Lindeberg condition assumed that for each ,(5)or equivalently(6)Then the distribution(7)tends to the normal distribution with zero expectation and unit variance (Feller 1971, p. 256). The Lindeberg condition (5) guarantees that the individual variances are small compared to their sum in the sense that for given for all sufficiently large , for , ..., (Feller 1971, p. 256).

### Maximum likelihood

Maximum likelihood, also called the maximum likelihood method, is the procedure of finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum. The maximum likelihood estimate for a parameter is denoted .For a Bernoulli distribution,(1)so maximum likelihood occurs for . If is not known ahead of time, the likelihood function is(2)(3)(4)where or 1, and , ..., .(5)(6)Rearranging gives(7)so(8)For a normal distribution,(9)(10)so(11)and(12)giving(13)Similarly,(14)gives(15)Note that in this case, the maximum likelihood standard deviation is the sample standard deviation, which is a biased estimator for the population standard deviation.For a weighted normal distribution,(16)(17)(18)gives(19)The variance of the mean isthen(20)But(21)so(22)(23)(24)For a Poisson distribution,(25)(26)(27)(28)..
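The normal-distribution case can be sketched numerically. The snippet below is a minimal illustration, not a general-purpose fitter: the function name `normal_mle` and the sample values are made up for the example. It computes the maximum likelihood mean and standard deviation, where the latter divides by $n$ rather than $n-1$.

```python
import math
import statistics

def normal_mle(samples):
    """Return the maximum likelihood estimates (mu_hat, sigma_hat) for a normal model."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The maximum likelihood variance divides by n, not n - 1, which makes it biased.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, math.sqrt(var_hat)

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # illustrative values only
mu_hat, sigma_hat = normal_mle(data)

# statistics.pstdev divides by n, statistics.stdev divides by n - 1.
print(mu_hat, sigma_hat, statistics.pstdev(data), statistics.stdev(data))
```

The maximum likelihood standard deviation agrees with `statistics.pstdev` and is always smaller than the $n-1$ version returned by `statistics.stdev`.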

### Estimator bias

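The bias of an estimator $\hat\theta$ is defined as

$$B(\hat\theta)\equiv\langle\hat\theta\rangle-\theta.$$

It is therefore true that

$$\hat\theta-\theta=(\hat\theta-\langle\hat\theta\rangle)+(\langle\hat\theta\rangle-\theta)=(\hat\theta-\langle\hat\theta\rangle)+B(\hat\theta).$$

An estimator for which $B=0$ is said to be an unbiased estimator.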

### Estimator

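An estimator is a rule that tells how to calculate an estimate based on the measurements contained in a sample. For example, the sample mean $\bar x$ is an estimator for the population mean $\mu$.

The mean square error of an estimator $\hat\theta$ is defined by

$$\operatorname{MSE}(\hat\theta)\equiv\left\langle(\hat\theta-\theta)^2\right\rangle.$$

Let $B$ be the estimator bias, then

$$\operatorname{MSE}(\hat\theta)=\left\langle\left[(\hat\theta-\langle\hat\theta\rangle)+B(\hat\theta)\right]^2\right\rangle=\left\langle(\hat\theta-\langle\hat\theta\rangle)^2\right\rangle+B^2(\hat\theta)=V(\hat\theta)+B^2(\hat\theta),$$

where $V$ is the estimator variance.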
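The decomposition of the mean square error into variance plus squared bias can be checked exactly on a toy case. In this sketch the helper `mse_decomposition` and the three equally likely estimator values are assumptions made purely for illustration.

```python
from fractions import Fraction

def mse_decomposition(estimates, theta):
    """Treat `estimates` as equally likely values of an estimator of `theta`.

    Returns (mean square error, bias, variance) as exact fractions.
    """
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((e - theta) ** 2 for e in estimates) / n
    bias = mean - theta
    variance = sum((e - mean) ** 2 for e in estimates) / n
    return mse, bias, variance

# Hypothetical estimator values around a true parameter of 2.
values = [Fraction(1), Fraction(2), Fraction(4)]
mse, bias, variance = mse_decomposition(values, Fraction(2))
print(mse, bias, variance)
```

Using `Fraction` avoids rounding, so the identity holds exactly rather than only approximately.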

### Walsh index

The statistical index

$$P_W\equiv\frac{\sum\sqrt{q_0q_n}\,p_n}{\sum\sqrt{q_0q_n}\,p_0},$$

where $p_n$ is the price per unit in period $n$ and $q_n$ is the quantity produced in period $n$.

### Paasche's index

The statistical index

$$P_P\equiv\frac{\sum p_nq_n}{\sum p_0q_n},$$

where $p_n$ is the price per unit in period $n$ and $q_n$ is the quantity produced in period $n$.

### Mitchell index

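The statistical index

$$P_M\equiv\frac{\sum p_nq_a}{\sum p_0q_a},$$

where $p_n$ is the price per unit in period $n$, $q_n$ is the quantity produced in period $n$, and $q_a$ is the estimated relative importance of a product.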

### Laspeyres' index

The statistical index

$$P_L\equiv\frac{\sum p_nq_0}{\sum p_0q_0},$$

where $p_n$ is the price per unit in period $n$ and $q_0$ is the quantity produced in the initial period.

### Index number

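A statistic which assigns a single number to several individual statistics in order to quantify trends. The best-known index in the United States is the consumer price index, which gives a sort of "average" value for inflation based on price changes for a group of selected products. The Dow Jones and NASDAQ indexes for the New York and American Stock Exchanges, respectively, are also index numbers.

Let $p_n$ be the price per unit in period $n$, $q_n$ be the quantity produced in period $n$, and $v_n\equiv p_nq_n$ be the value of the units. Let $q_a$ be the estimated relative importance of a product. There are several types of indices defined, among them those listed in the following table.

| index | abbr. | formula |
|---|---|---|
| Bowley index | $P_B$ | $\frac{1}{2}(P_L+P_P)$ |
| Fisher index | $P_F$ | $\sqrt{P_LP_P}$ |
| geometric mean index | $P_G$ | $\left[\prod\left(\frac{p_n}{p_0}\right)^{v_0}\right]^{1/\sum v_0}$ |
| harmonic mean index | $P_H$ | $\frac{\sum p_0q_0}{\sum p_0^2q_0/p_n}$ |
| Laspeyres' index | $P_L$ | $\frac{\sum p_nq_0}{\sum p_0q_0}$ |
| Marshall-Edgeworth index | $P_{ME}$ | $\frac{\sum p_n(q_0+q_n)}{\sum p_0(q_0+q_n)}$ |
| Mitchell index | $P_M$ | $\frac{\sum p_nq_a}{\sum p_0q_a}$ |
| Paasche's index | $P_P$ | $\frac{\sum p_nq_n}{\sum p_0q_n}$ |
| Walsh index | $P_W$ | $\frac{\sum\sqrt{q_0q_n}\,p_n}{\sum\sqrt{q_0q_n}\,p_0}$ |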
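As a rough illustration of how such indices are evaluated, the sketch below computes the Laspeyres, Paasche, and Fisher indices for a two-product basket, using the standard definitions (base-period quantities, current-period quantities, and the geometric mean of the two, respectively). The function names and the price and quantity figures are invented for the example.

```python
import math

def laspeyres(p0, pn, q0):
    """Price index weighted by base-period quantities q0."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, pn, qn):
    """Price index weighted by current-period quantities qn."""
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn))

def fisher(p0, pn, q0, qn):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, pn, q0) * paasche(p0, pn, qn))

# Hypothetical basket of two products.
p0, pn = [1.0, 2.0], [2.0, 3.0]   # prices in the base and current periods
q0, qn = [10.0, 5.0], [8.0, 6.0]  # quantities in the base and current periods

print(laspeyres(p0, pn, q0), paasche(p0, pn, qn), fisher(p0, pn, q0, qn))
```

Because the Fisher index is a geometric mean, it always lies between the other two values.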

### Harmonic mean index

The statistical index

$$P_H\equiv\frac{\sum p_0q_0}{\sum p_0^2q_0/p_n},$$

where $p_n$ is the price per unit in period $n$, $q_n$ is the quantity produced in period $n$, and $v_n\equiv p_nq_n$ the value of the units, and subscripts 0 indicate the reference year.

### Geometric mean index

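The statistical index

$$P_G\equiv\left[\prod\left(\frac{p_n}{p_0}\right)^{v_0}\right]^{1/\sum v_0},$$

where $p_n$ is the price per unit in period $n$, $q_n$ is the quantity produced in period $n$, and $v_n\equiv p_nq_n$ the value of the units.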

### Joint distribution function

A joint distribution function is a distribution function in two variables defined by(1)(2)(3)so that the joint probability function satisfies(4)(5)(6)(7)(8)Two random variables and are independent iff(9)for all and and(10)A multiple distribution function is of the form(11)

### Distribution function

The distribution function $D(x)$, also called the cumulative distribution function (CDF) or cumulative frequency function, describes the probability that a variate $X$ takes on a value less than or equal to a number $x$. The distribution function is sometimes also denoted $F(x)$ (Evans et al. 2000, p. 6).

The distribution function is therefore related to a continuous probability density function $P(x)$ by

$$D(x)=P(X\le x)=\int_{-\infty}^xP(\xi)\,d\xi,$$

so $P(x)$ (when it exists) is simply the derivative of the distribution function

$$P(x)=D'(x).$$

Similarly, the distribution function is related to a discrete probability $P(x)$ by

$$D(x)=P(X\le x)=\sum_{X\le x}P(x).$$

There exist distributions that are neither continuous nor discrete.

A joint distribution function can be defined if outcomes are dependent on two parameters:

$$D(x,y)\equiv P(X\le x,Y\le y),\qquad D_x(x)\equiv D(x,\infty),\qquad D_y(y)\equiv D(\infty,y).$$

Similarly, a multivariate distribution function can be defined if outcomes depend on $n$ parameters:

$$D(a_1,\ldots,a_n)\equiv P(x_1\le a_1,\ldots,x_n\le a_n).$$

The probability content of a closed region can be found much more efficiently than by direct integration of the probability..

### Mean distribution

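For an infinite population with mean $\mu$, variance $\sigma^2$, skewness $\gamma_1$, and kurtosis excess $\gamma_2$, the corresponding quantities for the distribution of means of samples of size $N$ are

$$\mu_{\bar x}=\mu,\qquad\sigma_{\bar x}^2=\frac{\sigma^2}{N},\qquad\gamma_{1,\bar x}=\frac{\gamma_1}{\sqrt N},\qquad\gamma_{2,\bar x}=\frac{\gamma_2}{N}.$$

For a population of $M$ (Kenney and Keeping 1962, p. 181),

$$\mu_{\bar x}^{(M)}=\mu,\qquad\sigma_{\bar x}^{2\,(M)}=\frac{\sigma^2}{N}\,\frac{M-N}{M-1}.$$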

### Discrete distribution

A statistical distribution whose variables can take on only discrete values. Abramowitz and Stegun (1972, p. 929) give a table of the parameters of most common discrete distributions.A discrete distribution with probability function defined over , 2, ..., has distribution functionand population mean

### Multinomial distribution

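Let a set of random variates $X_1$, $X_2$, ..., $X_n$ have a probability function

$$P(X_1=x_1,\ldots,X_n=x_n)=\frac{N!}{\prod_{i=1}^nx_i!}\prod_{i=1}^n\theta_i^{x_i},$$

where $x_i$ are nonnegative integers such that

$$\sum_{i=1}^nx_i=N,$$

and $\theta_i$ are constants with $\theta_i>0$ and

$$\sum_{i=1}^n\theta_i=1.$$

Then the joint distribution of $X_1$, ..., $X_n$ is a multinomial distribution and $P(X_1=x_1,\ldots,X_n=x_n)$ is given by the corresponding coefficient of the multinomial series

$$(\theta_1+\theta_2+\cdots+\theta_n)^N.$$

In other words, if $X_1$, $X_2$, ..., $X_n$ are mutually exclusive events with $P(X_1)=\theta_1$, ..., $P(X_n)=\theta_n$, then the probability that $X_1$ occurs $x_1$ times, ..., $X_n$ occurs $x_n$ times is given by

$$P_N(x_1,x_2,\ldots,x_n)=\frac{N!}{x_1!\cdots x_n!}\,\theta_1^{x_1}\cdots\theta_n^{x_n}$$

(Papoulis 1984, p. 75).

The mean and variance of $X_i$ are

$$\mu_i=N\theta_i,\qquad\sigma_i^2=N\theta_i(1-\theta_i).$$

The covariance of $X_i$ and $X_j$ for $i\ne j$ is

$$\operatorname{cov}(X_i,X_j)=-N\theta_i\theta_j.$$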

### Normal ratio distribution

The ratio of independent normally distributed variates with zero mean is distributed with a Cauchy distribution. This can be seen as follows. Let and both have mean 0 and standard deviations of and , respectively, then the joint probability density function is the bivariate normal distribution with ,(1)From ratio distribution, the distribution of is(2)(3)(4)But(5)so(6)(7)(8)which is a Cauchy distribution.A more direct derivative proceeds from integration of(9)(10)where is a delta function.

### Normal product distribution

The distribution of a product of two normally distributed variates and with zero means and variances and is given by(1)(2)where is a delta function and is a modified Bessel function of the second kind. This distribution is plotted above in red.The analogous expression for a product of three normal variates can be given in termsof Meijer G-functions as(3)plotted above in blue.

### Normal distribution function

A normalized form of the cumulative normal distribution function giving the probability that a variate assumes a value in the range ,(1)It is related to the probability integral(2)by(3)Let so . Then(4)Here, erf is a function sometimes called the error function. The probability that a normal variate assumes a value in the range is therefore given by(5)Neither nor erf can be expressed in terms of finite additions, subtractions, multiplications, and root extractions, and so must be either computed numerically or otherwise approximated.Note that a function different from is sometimes defined as "the" normal distribution function(6)(7)(8)(9)(Feller 1968; Beyer 1987, p. 551), although this function is less widely encountered than the usual . The notation is due to Feller (1971).The value of for which falls within the interval with a given probability is a related quantity called the confidence interval.For small values..

### Von Mises distribution

A continuous distribution defined on the range $x\in[0,2\pi)$ with probability density function

$$P(x)=\frac{e^{b\cos(x-a)}}{2\pi I_0(b)},$$

where $I_0(x)$ is a modified Bessel function of the first kind of order 0, and distribution function

$$D(x)=\frac{1}{2\pi I_0(b)}\int_0^xe^{b\cos(t-a)}\,dt,$$

which cannot be done in closed form. Here, $a$ is the mean direction and $b>0$ is a concentration parameter. The von Mises distribution is the circular analog of the normal distribution on a line.

The mean is

$$\mu=a$$

and the circular variance is

$$\sigma^2=1-\frac{I_1(b)}{I_0(b)}.$$

### Normal difference distribution

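Amazingly, the distribution of a difference of two normally distributed variates $X$ and $Y$ with means and variances $(\mu_x,\sigma_x^2)$ and $(\mu_y,\sigma_y^2)$, respectively, is given by

$$P_{X-Y}(u)=\int_{-\infty}^\infty\int_{-\infty}^\infty\frac{e^{-(x-\mu_x)^2/(2\sigma_x^2)}}{\sigma_x\sqrt{2\pi}}\,\frac{e^{-(y-\mu_y)^2/(2\sigma_y^2)}}{\sigma_y\sqrt{2\pi}}\,\delta\big((x-y)-u\big)\,dx\,dy=\frac{e^{-[u-(\mu_x-\mu_y)]^2/[2(\sigma_x^2+\sigma_y^2)]}}{\sqrt{2\pi(\sigma_x^2+\sigma_y^2)}},$$

where $\delta(x)$ is a delta function, which is another normal distribution having mean

$$\mu_{X-Y}=\mu_x-\mu_y$$

and variance

$$\sigma_{X-Y}^2=\sigma_x^2+\sigma_y^2.$$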

### Uniform sum distribution

The distribution for the sum of uniform variates on the interval can be found directly as(1)where is a delta function.A more elegant approach uses the characteristicfunction to obtain(2)where the Fourier parameters are taken as . The first few values of are then given by(3)(4)(5)(6)illustrated above.Interestingly, the expected number of picks of a number from a uniform distribution on so that the sum exceeds 1 is e (Derbyshire 2004, pp. 366-367). This can be demonstrated by noting that the probability of the sum of variates being greater than 1 while the sum of variates being less than 1 is(7)(8)(9)The values for , 2, ... are 0, 1/2, 1/3, 1/8, 1/30, 1/144, 1/840, 1/5760, 1/45360, ... (OEIS A001048). The expected number of picks needed to first exceed 1 is then simply(10)It is more complicated to compute the expected number of picks that is needed for their sum to first exceed 2. In this case,(11)(12)The first few terms are therefore 0, 0, 1/6,..
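The claim that the expected number of picks needed to exceed 1 equals $e$ can be checked with exact arithmetic. The sketch below assumes the closed form $(n-1)/n!$ for the probability that the sum first exceeds 1 on the $n$th pick, which follows from $P(\text{sum of } n \text{ variates}<1)=1/n!$ and reproduces the values 0, 1/2, 1/3, 1/8, ... quoted above. The function name is made up for the example.

```python
from fractions import Fraction
from math import e, factorial

def prob_first_exceed_one(n):
    """P(sum of n uniforms > 1 while the sum of the first n - 1 is < 1).

    Uses 1/(n-1)! - 1/n! = (n - 1)/n!.
    """
    return Fraction(n - 1, factorial(n))

# Exact probabilities for n = 1, ..., 9.
probs = [prob_first_exceed_one(n) for n in range(1, 10)]

# Expected number of picks; the tail beyond n = 29 is far below double precision.
expected_picks = float(sum(n * prob_first_exceed_one(n) for n in range(1, 30)))

print(probs)
print(expected_picks, e)
```

Each term of the expectation simplifies to $1/(n-2)!$, which is why the sum reproduces the series for $e$.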

### Uniform ratio distribution

The ratio of uniform variates and on the interval can be found directly as(1)(2)where is a delta function and is the Heaviside step function.The distribution is normalized, but its mean and moments diverge.

### Uniform product distribution

The distribution of the product of uniform variates on the interval can be found directly as(1)(2)where is a delta function. The distributions are plotted above for (red), (yellow), and so on.

### Uniform difference distribution

The difference of two uniform variates on the interval can be found as(1)(2)where is a delta function and is the Heaviside step function.

### Error function distribution

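A normal distribution with mean 0,

$$P(x)=\frac{h}{\sqrt\pi}e^{-h^2x^2}.$$

The characteristic function is

$$\phi(t)=e^{-t^2/(4h^2)}.$$

The mean, variance, skewness, and kurtosis excess are

$$\mu=0,\qquad\sigma^2=\frac{1}{2h^2},\qquad\gamma_1=0,\qquad\gamma_2=0.$$

The cumulants are

$$\kappa_1=0,\qquad\kappa_2=\frac{1}{2h^2},\qquad\kappa_n=0$$

for $n\ge3$.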

### Erlang distribution

Given a Poisson distribution with a rate of change , the distribution function giving the waiting times until the th Poisson event is(1)(2)for , where is a complete gamma function, and an incomplete gamma function. With explicitly an integer, this distribution is known as the Erlang distribution, and has probability function(3)It is closely related to the gamma distribution, which is obtained by letting (not necessarily an integer) and defining . When , it simplifies to the exponential distribution.Evans et al. (2000, p. 71) write the distribution using the variables and .

### Doob's theorem

A theorem proved by Doob (1942) which states that any random process which is both normal and Markov has the following forms for its correlation function , spectral density , and probability densities and :(1)(2)(3)(4)where is the mean, the standard deviation, and the relaxation time.

### Difference of successes

If and are the observed proportions from standard normally distributed samples with proportion of success , then the probability that(1)will be as great as observed is(2)where(3)(4)(5)Here, is the unbiased estimator. The skewness and kurtosis excess of this distribution are(6)(7)

### Standard normal distribution

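A standard normal distribution is a normal distribution with zero mean ($\mu=0$) and unit variance ($\sigma^2=1$), given by the probability density function and distribution function

$$P(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2},\qquad D(x)=\frac12\left[\operatorname{erf}\left(\frac{x}{\sqrt2}\right)+1\right]$$

over the domain $x\in(-\infty,\infty)$.

It has mean, variance, skewness, and kurtosis excess given by

$$\mu=0,\qquad\sigma^2=1,\qquad\gamma_1=0,\qquad\gamma_2=0.$$

The first quartile of the standard normal distribution occurs when $D(x)=1/4$, which is

$$x=-\sqrt2\,\operatorname{erf}^{-1}\left(\tfrac12\right)\approx-0.67449$$

(OEIS A092678; Kenney and Keeping 1962, p. 134), where $\operatorname{erf}^{-1}(x)$ is the inverse erf function. The absolute value of this is known as the probable error.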
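The first quartile, and hence the probable error, can be reproduced with the Python standard library. This is only a numerical check; the variable names are chosen for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Solve D(x) = 1/4 for x using the inverse distribution function.
first_quartile = standard_normal.inv_cdf(0.25)
probable_error = abs(first_quartile)

print(first_quartile, probable_error)
```

By symmetry, half of the probability mass lies within one probable error of the mean.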

### Logarithmic distribution

The logarithmic distribution is a continuous distribution for a variate with probability function(1)and distribution function(2)It therefore applies to a variable distributed as , and has appropriate normalization.Note that the log-series distribution is sometimes also known as the logarithmic distribution, and the distribution arising in Benford's law is also "a" logarithmic distribution.The raw moments are given by(3)The mean is therefore(4)The variance, skewness,and kurtosis excess are slightly complicated expressions.

### S distribution

The distribution is defined in terms of its distribution function as the solution to the initial value problemwhere (Savageau 1982, Aksenov and Savageau 2001). It has four free parameters: , , , and .The distribution is capable of approximating many central and noncentral unimodal univariate distributions rather well (Voit 1991), but also includes the exponential, logistic, uniform and linear distributions as special cases. The S distribution derives its name from the fact that it is based on the theory of S-systems (Savageau 1976, Voit 1991, Aksenov and Savageau 2001).

### Continuity correction

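A correction to a discrete binomial distribution to approximate a continuous distribution.

$$P(a\le X\le b)\approx P\left(\frac{a-\frac12-np}{\sqrt{npq}}\le z\le\frac{b+\frac12-np}{\sqrt{npq}}\right),$$

where $z\equiv(x-\mu)/\sigma$ is a continuous variate with a normal distribution, $X$ is a variate of a binomial distribution with $n$ trials and success probability $p$, and $q\equiv1-p$.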
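The effect of the half-unit shift can be seen by comparing the exact binomial probability of an interval with its normal approximation, with and without the correction. The sketch below uses made-up function names and an arbitrary example (100 trials, success probability 1/2, interval 45 to 55).

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_interval_exact(n, p, a, b):
    """Exact P(a <= X <= b) for a binomial variate X with n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))

def binomial_interval_normal(n, p, a, b, correct=True):
    """Normal approximation to P(a <= X <= b), optionally continuity-corrected."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    shift = 0.5 if correct else 0.0
    z = NormalDist()
    return z.cdf((b + shift - mu) / sigma) - z.cdf((a - shift - mu) / sigma)

exact = binomial_interval_exact(100, 0.5, 45, 55)
with_cc = binomial_interval_normal(100, 0.5, 45, 55, correct=True)
without_cc = binomial_interval_normal(100, 0.5, 45, 55, correct=False)

print(exact, with_cc, without_cc)
```

In this example the corrected approximation is much closer to the exact value than the uncorrected one.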

### Probable error

The probability that a random sample from an infinite normally distributed universe will have a mean within a distance of the mean of the universe is(1)where is the normal distribution function and is the observed value of(2)The probable error is then defined as the value of such that , i.e.,(3)which is given by(4)(5)(OEIS A092678; Kenney and Keeping 1962, p. 134). Here, is the inverse erf function. The probability of a deviation from the true population value at least as great as the probable error is therefore 1/2.

### Price's theorem

Consider a bivariate normal distribution in variables and with covariance(1)and an arbitrary function . Then the expected value of the random variable (2)satisfies(3)

### Gibrat's distribution

Gibrat's distribution is a continuous distribution in which the logarithm of a variable has a normal distribution,(1)defined over the interval . It is a special case of the log normal distribution(2)with and , and so has distribution function(3)The mean, variance, skewness,and kurtosis excess are then given by(4)(5)(6)(7)

### Pearson type III distribution

A skewed distribution which is similar to the binomial distribution when (Abramowitz and Stegun 1972, p. 930).(1)for where(2)(3) is the gamma function, and is a standardized variate. Another form is(4)For this distribution, the characteristicfunction is(5)and the mean, variance, skewness, and kurtosis excess are(6)(7)(8)(9)

### Pearson system

A system of equation types obtained by generalizing the differential equation forthe normal distribution(1)which has solution(2)to(3)which has solution(4)Let , be the roots of . Then the possible types of curves are 0. , . E.g., normal distribution. I. , . E.g., beta distribution. II. , , where . III. , , where . E.g., gamma distribution. This case is intermediate to cases I and VI. IV. , . V. , where . Intermediate to cases IV and VI. VI. , where is the larger root. E.g., beta prime distribution. VII. , , . E.g., Student's t-distribution. Classes IX-XII are discussed in Pearson (1916). See also Craig (in Kenney and Keeping 1951).If a Pearson curve possesses a mode, it will be at . Let at and , where these may be or . If also vanishes at , , then the th moment and th moments exist.(5)giving(6)(7)Now define the raw th moment by(8)so combining (7) with (8) gives(9)For ,(10)so(11)and for ,(12)so(13)Combining (11), (13), and the definitions(14)(15)obtained..

### Gaussian joint variable theorem

The Gaussian joint variable theorem, also called the multivariate theorem, states that given an even number of variates from a normal distribution with means all 0,(1)etc. Given an odd number of variates,(2)(3)etc.

### Beta prime distribution

A distribution with probability functionwhere is a beta function. The mode of a variate distributed as isIf is a variate, then is a variate. If is a variate, then and are and variates. If and are and variates, then is a variate. If and are variates, then is a variate.

### Normal sum distribution

Amazingly, the distribution of a sum of two normally distributed independent variates and with means and variances and , respectively is another normal distribution(1)which has mean(2)and variance(3)By induction, analogous results hold for the sum of normally distributed variates.An alternate derivation proceeds by noting that(4)(5)where is the characteristic function and is the inverse Fourier transform, taken with parameters .More generally, if is normally distributed with mean and variance , then a linear function of ,(6)is also normally distributed. The new distribution has mean and variance , as can be derived using the moment-generating function(7)(8)(9)(10)(11)which is of the standard form with(12)(13)For a weighted sum of independent variables(14)the expectation is given by(15)(16)(17)(18)(19)Setting this equal to(20)gives(21)(22)Therefore, the mean and variance of the weighted sums of random variables..

### Nested hypothesis

Let $S$ be the set of all possibilities that satisfy hypothesis $H$, and let $S'$ be the set of all possibilities that satisfy hypothesis $H'$. Then $H'$ is a nested hypothesis within $H$ iff $S'\subset S$, where $\subset$ denotes the proper subset.

### Bonferroni correction

The Bonferroni correction is a multiple-comparison correction used when several dependent or independent statistical tests are being performed simultaneously (since while a given alpha value $\alpha$ may be appropriate for each individual comparison, it is not for the set of all comparisons). In order to avoid a lot of spurious positives, the alpha value needs to be lowered to account for the number of comparisons being performed.

The simplest and most conservative approach is the Bonferroni correction, which sets the alpha value for the entire set of $n$ comparisons equal to $\alpha$ by taking the alpha value for each comparison equal to $\alpha/n$. Explicitly, given $n$ tests $T_i$ for hypotheses $H_i$ ($1\le i\le n$) under the assumption $H_0$ that all hypotheses $H_i$ are false, and if the individual test critical values are $\le\alpha/n$, then the experiment-wide critical value is $\le\alpha$. In equation form, if

$$P(T_i\text{ passes}\mid H_0)\le\frac{\alpha}{n}$$

for $1\le i\le n$, then

$$P(\text{some }T_i\text{ passes}\mid H_0)\le\alpha,$$

which follows from the Bonferroni inequalities...
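In practice the correction amounts to comparing each P-value against the overall alpha value divided by the number of tests. The following is a minimal sketch; the function name and the P-values are invented for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if it passes the Bonferroni threshold."""
    n = len(p_values)
    threshold = alpha / n  # per-comparison alpha value
    return [p <= threshold for p in p_values]

# Four hypothetical P-values tested at an experiment-wide alpha of 0.05.
decisions = bonferroni_reject([0.001, 0.02, 0.04, 0.30], alpha=0.05)
print(decisions)
```

Three of the four values would pass an uncorrected 0.05 threshold, but only the first is below the corrected threshold of 0.0125.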

### Bessel's statistical formula

Let and be the observed mean and variance of a sample of drawn from a normal universe with unknown mean and let and be the observed mean and variance of a sample of drawn from a normal universe with unknown mean . Assume the two universes have a common variance , and define(1)(2)(3)Then(4)is distributed as Student's t-distribution with .

### Significance

Let $\delta\equiv z\ge z_{\text{observed}}$. A value $0\le\alpha\le1$ such that $P(\delta)\le\alpha$ is considered "significant" (i.e., is not simply due to chance) is known as an alpha value. The probability that a variate would assume a value greater than or equal to the observed value strictly by chance, $P(\delta)$, is known as a P-value.

Depending on the type of data and conventional practices of a given field of study, a variety of different alpha values may be used. One commonly used terminology takes $P(\delta)\ge5\%$ as "not significant," $1\%<P(\delta)<5\%$ as "significant" (sometimes denoted *), and $P(\delta)<1\%$ as "highly significant" (sometimes denoted **). Some authors use the term "almost significant" to refer to $5\%<P(\delta)<10\%$, although this practice is not recommended.

### Weibull distribution

The Weibull distribution is given by(1)(2)for , and is implemented in the Wolfram Language as WeibullDistribution[alpha, beta]. The raw moments of the distribution are(3)(4)(5)(6)and the mean, variance, skewness, and kurtosis excess of are(7)(8)(9)(10)where is the gamma function and(11)A slightly different form of the distribution is defined by(12)(13)(Mendenhall and Sincich 1995). This has raw moments(14)(15)(16)(17)so the mean and variance forthis form are(18)(19)The Weibull distribution gives the distribution of lifetimes of objects. It was originally proposed to quantify fatigue data, but it is also used in analysis of systems involving a "weakest link."

### Uncorrelated

Variables $x_i$ and $x_j$ are said to be uncorrelated if their covariance is zero:

$$\operatorname{cov}(x_i,x_j)=0.$$

Independent statistics are always uncorrelated, but the converse is not necessarily true.
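A standard counterexample shows why the converse fails. In the sketch below (a constructed example, not taken from the source), $X$ is uniform on $\{-1,0,1\}$ and $Y=X^2$, so $Y$ is completely determined by $X$ and yet the covariance vanishes.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 depends deterministically on X.
outcomes = [(x, x * x) for x in (-1, 0, 1)]
prob = Fraction(1, len(outcomes))

mean_x = sum(prob * x for x, _ in outcomes)
mean_y = sum(prob * y for _, y in outcomes)
covariance = sum(prob * (x - mean_x) * (y - mean_y) for x, y in outcomes)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = sum(prob for x, y in outcomes if x == 0 and y == 0)
p_x0 = sum(prob for x, _ in outcomes if x == 0)
p_y0 = sum(prob for _, y in outcomes if y == 0)

print(covariance, p_joint, p_x0 * p_y0)
```

The covariance is exactly zero, while the joint probability differs from the product of the marginals, so the variables are uncorrelated but not independent.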

### Gamma statistic

$$\gamma_r\equiv\frac{\kappa_{r+2}}{\sigma^{r+2}},$$

where $\kappa_r$ are cumulants and $\sigma$ is the standard deviation.

### Robbin's inequality

If the fourth moment , thenwhere is the variance.

### Relative deviation

Let $\bar u$ denote the mean of a set of quantities $u_i$, then the relative deviation is defined by

$$\frac{\Delta u_i}{\bar u}\equiv\frac{|u_i-\bar u|}{\bar u}.$$

### Absolute deviation

Let $\bar u$ denote the mean of a set of quantities $u_i$, then the absolute deviation is defined by

$$\Delta u_i\equiv|u_i-\bar u|.$$

### Fisher's exact test

Fisher's exact test is a statistical test used to determine if there are nonrandom associations between two categorical variables.

Let there exist two such variables $X$ and $Y$, with $m$ and $n$ observed states, respectively. Now form an $m\times n$ matrix in which the entries $a_{ij}$ represent the number of observations in which $x=i$ and $y=j$. Calculate the row and column sums $R_i$ and $C_j$, respectively, and the total sum

$$N=\sum_iR_i=\sum_jC_j \tag{1}$$

of the matrix. Then calculate the conditional probability of getting the actual matrix given the particular row and column sums, given by

$$P_{\text{cutoff}}=\frac{(R_1!R_2!\cdots R_m!)(C_1!C_2!\cdots C_n!)}{N!\prod_{i,j}a_{ij}!}, \tag{2}$$

which is a multivariate generalization of the hypergeometric probability function. Now find all possible matrices of nonnegative integers consistent with the row and column sums $R_i$ and $C_j$. For each one, calculate the associated conditional probability using (2), where the sum of these probabilities must be 1.

To compute the P-value of the test, the tables must then be ordered by some criterion that measures dependence, and those tables..
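The conditional probability of a table given its margins is a ratio of factorials: the product of the factorials of all row and column sums, divided by the factorial of the total times the product of the factorials of the entries. The sketch below evaluates it exactly for a small 2 by 2 example and checks that the probabilities of all tables sharing the same margins sum to 1. The function name and the table are made up, and the ordering step needed for a full P-value is deliberately left out.

```python
from fractions import Fraction
from math import factorial, prod

def table_probability(table):
    """Conditional probability of `table` given its row and column sums."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    numerator = prod(factorial(r) for r in row_sums) * prod(factorial(c) for c in col_sums)
    denominator = factorial(total) * prod(factorial(a) for row in table for a in row)
    return Fraction(numerator, denominator)

observed = [[1, 3], [3, 1]]  # hypothetical counts; every margin equals 4
p_observed = table_probability(observed)

# All 2 x 2 tables of nonnegative integers with the same margins.
same_margins = [[[a, 4 - a], [4 - a, a]] for a in range(5)]
p_total = sum(table_probability(t) for t in same_margins)

print(p_observed, p_total)
```

For a 2 by 2 table the result reduces to the ordinary hypergeometric probability.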

### Statistical test

A test used to determine the statistical significance of an observation. Two main types of error can occur:

1. A type I error occurs when a false negative result is obtained in terms of the null hypothesis by obtaining a false positive measurement.
2. A type II error occurs when a false positive result is obtained in terms of the null hypothesis by obtaining a false negative measurement.

The probability that a statistical test will be positive for a true statistic is sometimes called the test's sensitivity, and the probability that a test will be negative for a negative statistic is sometimes called the specificity. The following table summarizes the names given to the various combinations of the actual state of affairs and observed test results.

| result | name |
|---|---|
| true positive result | sensitivity |
| false negative result | 1-sensitivity |
| true negative result | specificity |
| false positive result | 1-specificity |

Multiple-comparison corrections to statistical..

### Estimate

An estimate is an educated guess for an unknown quantity or outcome based on known information. The making of estimates is an important part of statistics, since care is needed to provide as accurate an estimate as possible using as little input data as possible. Often, an estimate for the uncertainty of an estimate can also be determined statistically. A rule that tells how to calculate an estimate based on the measurements contained in a sample is called an estimator.

### Total probability theorem

Given $n$ mutually exclusive events $A_1$, ..., $A_n$ whose probabilities sum to unity, then

$$P(B)=P(B|A_1)P(A_1)+\cdots+P(B|A_n)P(A_n),$$

where $B$ is an arbitrary event, and $P(B|A_i)$ is the conditional probability of $B$ assuming $A_i$.
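A small worked case makes the weighted sum concrete. The sketch below assumes three mutually exclusive events with made-up probabilities and made-up conditional probabilities of an event given each of them; the function name is likewise illustrative.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Sum of P(B | A_i) * P(A_i) over mutually exclusive, exhaustive events A_i."""
    if sum(priors) != 1:
        raise ValueError("the probabilities of the events A_i must sum to unity")
    return sum(p * c for p, c in zip(priors, conditionals))

# Hypothetical P(A_i) and P(B | A_i) for three events.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
conditionals = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

p_b = total_probability(priors, conditionals)
print(p_b)
```

The explicit check on the priors mirrors the requirement that the probabilities of the events sum to unity.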

### Temporal point process

A temporal point process is a random process whose realizations consist of the times of isolated events.Note that in some literature, the values are assumed to be arbitrary real numbers while the index set is assumed to be the set of integers (Schoenberg 2002); on the other hand, some authors view temporal point processes as binary events so that takes values in a two-element set for each , and further assume that the index set is some finite set of points (Liam 2013). The prior perspective corresponds to viewing temporal point processes as how long events occur where the events themselves are spaced according to a discrete set of time parameters; the latter view corresponds to viewing temporal point processes as indications of whether or not a finite number of events has occurred.The behavior of a simple temporal point process is typically modeled by specifying its conditional intensity . Indeed, a number of specific examples of temporal point..

### Tail probability

Define as the set of all points with probabilities such that or , where is a point probability (often, the likelihood of an observed event). Then the associated tail probability is given by .

### Point process

A point process is a probabilistic model for random scatterings of points on some space $X$, often assumed to be a subset of $\mathbb{R}^d$ for some $d$. Oftentimes, point processes describe the occurrence over time of random events in which the occurrences are revealed one-by-one as time evolves; in this case, any collection of occurrences is said to be a realization of the point process.

Poisson processes are regarded as archetypal examples of point processes (Daley and Vere-Jones 2002).

Point processes are sometimes known as counting processes or random scatters.

### Statistics

The mathematical study of the likelihood and probability of events occurring based on known information and inferred by taking a limited number of samples. Statistics plays an extremely important role in many aspects of economics and science, allowing educated guesses to be made with a minimum of expensive or difficult-to-obtain data.

A joke told about statistics (or, more precisely, about statisticians), runs as follows. Two statisticians are out hunting when one of them sees a duck. The first takes aim and shoots, but the bullet goes sailing past six inches too high. The second statistician also takes aim and shoots, but this time the bullet goes sailing past six inches too low. The two statisticians then give one another high fives and exclaim, "Got him!" (This joke plays on the fact that the mean of $-6$ and 6 is 0, so "on average," the two shots hit the duck.)

Approximately 73.8474% of extant statistical jokes are maintained..

### Stationary point process

There are at least two distinct notions of when a pointprocess is stationary.The most commonly utilized terminology is as follows: Intuitively, a point process defined on a subset of is said to be stationary if the number of points lying in depends on the size of but not its location. On the real line, this is expressed in terms of intervals: A point process on is stationary if for all and for ,depends on the length of but not on the location .Stationary point processes of this kind were originally called simple stationary, though several authors call it crudely stationary instead. In light of the notion of crude stationarity, a different definition of stationary may be stated in which a point process is stationary whenever for every and for all bounded Borel subsets of , the joint distribution of does not depend on . This distinction also gives rise to a related notion known as interval stationarity.Some authors use the alternative definition of an intensity..

### Mutually exclusive events

$n$ events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others. Therefore, for events $E_1$, ..., $E_n$, the conditional probability is $P(E_i|E_j)=0$ for all $i\ne j$.

### Multidimensional point process

A multidimensional point process is a measurable function from a probability space into where is the set of all finite or countable subsets of not containing an accumulation point and where is the sigma-algebra generated over by the setsfor all bounded Borel subsets . Here, denotes the cardinality or order of the set .A multidimensional point process is sometimes abbreviated MPP, though care should be exhibited not to confuse the notion with that of a marked point process.Despite a number of apparent differences, one can show that multidimensional point processes are a special case of a random closed set on (Baudin 1984).

### De Méré's problem

The probability of getting at least one "6" in four rolls of a single 6-sided die is

$$1-\left(\frac56\right)^4=\frac{671}{1296}\approx0.518, \tag{1}$$

which is slightly higher than the probability of at least one double-six in 24 throws of two dice,

$$1-\left(\frac{35}{36}\right)^{24}\approx0.491. \tag{2}$$

The French nobleman and gambler Chevalier de Méré suspected that (1) was higher than (2), but his mathematical skills were not great enough to demonstrate why this should be so. He posed the question to Pascal, who solved the problem and proved de Méré correct. In fact, de Méré's observation remains true even if two dice are thrown 25 times, since the probability of throwing at least one double-six is then

$$1-\left(\frac{35}{36}\right)^{25}\approx0.506. \tag{3}$$
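The three probabilities can be verified with exact rational arithmetic. The helper name `at_least_one` is chosen for this sketch; it applies the complement rule to independent trials.

```python
from fractions import Fraction

def at_least_one(success_prob, trials):
    """Probability of at least one success in `trials` independent trials."""
    return 1 - (1 - success_prob) ** trials

p_single_six = at_least_one(Fraction(1, 6), 4)       # one die, four rolls
p_double_six_24 = at_least_one(Fraction(1, 36), 24)  # two dice, 24 throws
p_double_six_25 = at_least_one(Fraction(1, 36), 25)  # two dice, 25 throws

print(p_single_six, float(p_single_six))
print(float(p_double_six_24), float(p_double_six_25))
```

The 25-throw probability crosses 1/2 but still falls short of the single-die value, which is why de Méré's ordering survives the extra throw.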

### Mills ratio

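The Mills ratio is defined as

$$m(x)\equiv\frac{1}{h(x)}=\frac{S(x)}{P(x)}=\frac{1-D(x)}{P(x)},$$

where $h(x)$ is the hazard function, $S(x)$ is the survival function, $P(x)$ is the probability density function, and $D(x)$ is the distribution function.

For example, for the normal distribution,

$$m(x)=\sigma\sqrt{\frac{\pi}{2}}\,e^{(x-\mu)^2/(2\sigma^2)}\operatorname{erfc}\left(\frac{x-\mu}{\sqrt2\,\sigma}\right),$$

which simplifies to

$$m(x)=\sqrt{\frac{\pi}{2}}\,e^{x^2/2}\operatorname{erfc}\left(\frac{x}{\sqrt2}\right)$$

for the standard normal distribution. The latter function has the particularly simple continued fraction representation

$$m(x)=\cfrac{1}{x+\cfrac{1}{x+\cfrac{2}{x+\cfrac{3}{x+\cdots}}}}$$

(Cuyt et al. 2010, p. 376).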
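For the standard normal distribution, the ratio of the survival function to the density can be written with the complementary error function, and the continued fraction $1/(x+1/(x+2/(x+3/(x+\cdots))))$ can be compared against it numerically. The sketch below assumes $x>0$ and truncates the continued fraction at a fixed depth, an arbitrary choice made for the example; both function names are illustrative.

```python
from math import erfc, exp, pi, sqrt

def mills_ratio(x):
    """Mills ratio (1 - D(x)) / P(x) of the standard normal distribution."""
    return exp(x * x / 2) * sqrt(pi / 2) * erfc(x / sqrt(2))

def mills_ratio_cf(x, depth=500):
    """Continued fraction 1/(x + 1/(x + 2/(x + ...))) truncated at `depth`, for x > 0."""
    tail = x
    # Evaluate from the innermost retained term outward.
    for k in range(depth, 0, -1):
        tail = x + k / tail
    return 1 / tail

x = 2.0
print(mills_ratio(x), mills_ratio_cf(x))
```

The truncation depth matters more for small $x$, where the continued fraction converges slowly.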

### Simple point process

A simple point process (or SPP) is an almost surely increasing sequence of strictly positive, possibly infinite random variables which are strictly increasing as long as they are finite and whose almost sure limit is . Symbolically, then, an SPP is a sequence of -valued random variables defined on a probability space such that 1. , 2. , 3. . Here, and for each , can be interpreted as either the time point at which the th recording of an event takes place or as an indication that fewer than events occurred altogether if or if , respectively (Jacobsen 2006).

### Marked point process

A marked point process with mark space is a double sequenceof -valued random variables and -valued random variables defined on a probability space such that is a simple point process (SPP) and: 1. for ; 2. for . Here, denotes probability, denotes the so-called irrelevant mark which is used to describe the mark of an event that never occurs, and .This definition is similar to the definition of an SPP in that it describes a sequence of time points marking the occurrence of events. The difference is that these events may be of different types where the type (i.e., the mark) of the th event is denoted by . Note that, because of the inclusion of the irrelevant mark , marking will assign values for all --even when , i.e., when the th event never occurs (Jacobsen 2006).

### Mark space

Given a marked point process of the formthe space is said to be the mark space of .

### Conditional probability

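The conditional probability of an event $A$ assuming that $B$ has occurred, denoted $P(A|B)$, equals

$$P(A|B)=\frac{P(A\cap B)}{P(B)}, \tag{1}$$

which can be proven directly using a Venn diagram.

Multiplying through, this becomes

$$P(A|B)P(B)=P(A\cap B), \tag{2}$$

which can be generalized to

$$P(A\cap B\cap C)=P(A)P(B|A)P(C|A\cap B). \tag{3}$$

Rearranging (1) gives

$$P(B|A)=\frac{P(B\cap A)}{P(A)}. \tag{4}$$

Solving (4) for $P(B\cap A)=P(A\cap B)$ and plugging in to (1) gives

$$P(A|B)=\frac{P(A)P(B|A)}{P(B)}. \tag{5}$$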
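The final relation, which expresses one conditional probability in terms of the reverse one, can be illustrated with a short calculation. All numbers below are hypothetical, and the function name `bayes` is chosen for the example.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b):
    """P(A | B) = P(A) * P(B | A) / P(B)."""
    return p_a * p_b_given_a / p_b

# Hypothetical inputs: A is rare, and B is much more likely when A holds.
p_a = Fraction(1, 100)
p_b_given_a = Fraction(9, 10)
p_b_given_not_a = Fraction(1, 10)

# P(B) obtained by splitting on whether A occurs.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = bayes(p_a, p_b_given_a, p_b)

print(p_b, p_a_given_b)
```

Even though $B$ is nine times more likely under $A$, the small prior keeps the resulting conditional probability low.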

### Saint Petersburg paradox

Consider a game, first proposed by Nicolaus Bernoulli, in which a player bets on how many tosses of a coin will be needed before it first turns up heads. The player pays a fixed amount initially, and then receives $2^n$ dollars if the coin comes up heads on the $n$th toss. The expectation value of the gain is then

$$\frac12(2)+\frac14(4)+\frac18(8)+\cdots=1+1+1+\cdots=\infty \tag{1}$$

dollars, so any finite amount of money can be wagered and the player will still come out ahead on average.

Feller (1968) discusses a modified version of the game in which the player receives nothing if a trial takes more than a fixed number $N$ of tosses. The classical theory of this modified game concluded that is a fair entrance fee, but Feller notes that "the modern student will hardly understand the mysterious discussions of this 'paradox.'"

In another modified version of the game, the player bets \$2 that heads will turn up on the first throw, \$4 that heads will turn up on the second throw (if it did not turn up on the first), \$8 that heads will turn..
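The divergence of the expectation can be made tangible by truncating the game. The sketch below assumes the standard formulation in which the payout is $2^n$ dollars when the first head appears on toss $n$, and pays nothing if no head appears within `max_tosses` tosses; the function name is made up for the example.

```python
from fractions import Fraction

def truncated_expectation(max_tosses):
    """Expected payout when the game stops paying after `max_tosses` tosses.

    Assumes a payout of 2**n dollars with probability 2**-n for toss n.
    """
    return sum(Fraction(1, 2**n) * 2**n for n in range(1, max_tosses + 1))

for cap in (1, 10, 50):
    print(cap, truncated_expectation(cap))
```

Each allowed toss contributes exactly one dollar to the expectation, so the truncated value grows without bound as the cap is raised.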

### Coin tossing

An idealized coin consists of a circular disk of zero thickness which, when thrown in the air and allowed to fall, will rest with either side face up ("heads" H or "tails" T) with equal probability. A coin is therefore a two-sided die. Despite slight differences between the sides and nonzero thickness of actual coins, the distribution of their tosses makes a good approximation to a Bernoulli distribution.

There are, however, some rather counterintuitive properties of coin tossing. For example, it is twice as likely that the triple TTH will be encountered before THT than after it, and three times as likely that THH will precede HHT. Furthermore, it is six times as likely that HTT will be the first of HTT, TTH, and TTT to occur than either of the others (Honsberger 1979). There are also strings of Hs and Ts that have the property that the expected wait to see string is less than the expected wait to see , but the probability of seeing before..

### Russian roulette

Russian roulette is a game of chance in which one or more of the six chambers of a revolver are filled with cartridges, the chamber is rotated at random, and the gun is fired. The shooter bets on whether the chamber which rotates into place will be loaded. If it is, he loses not only his bet but his life. In the case of a revolver with six chambers (revolvers with 5, 7, or 8 chambers are also common), the shooter has a 1/6 chance of dying (ignoring the fact that the probability of firing the round is always somewhat less than $1/n$ for an $n$-shot revolver because the mass of the round in the cylinder causes an imbalance, and the cylinder will tend to stop rotating with its heavy side at or close to the bottom, while the firing pin is opposite the top chamber).

A modified version is considered by Blom et al. (1996) and Blom (1989). In this variant, the revolver is loaded with a single cartridge, and two duelists alternately spin the chamber and fire at themselves until one is killed...

### Random closed set

A random closed set (RACS) in is a measurable function from a probability space into where is the collection of all closed subsets of and where denotes the sigma-algebra generated over the by setsfor all compact subsets .Originally, RACS were defined not on but in the more general setting of locally compact and separable (LCS) topological spaces (Baudin 1984) which may or may not be T2. In this case, the above definition is modified so that is defined to be the collection of closed subsets of some ambient LCS space (Molchanov 2005).Despite a number of apparent differences, one can show that multidimensional point processes are a special case of RACS when talking about (Baudin 1984).

### Quantile function

Given a random variable $X$ with continuous and strictly monotonic probability density function $f(X)$, a quantile function $Q_f$ assigns to each probability $p$ attained by $f$ the value $x$ for which $\Pr(X \le x) = p$. Symbolically,

$$Q_f(p) = \{x : \Pr(X \le x) = p\}.$$

Defining quantile functions for discrete rather than continuous distributions requires a bit more work since the discrete nature of such a distribution means that there may be gaps between values in the domain of the distribution function and/or "plateaus" in its range. Therefore, one often defines the associated quantile function to be

$$Q_f(p) = \inf\{x \in R(X) : p \le \Pr(X \le x)\},$$

where $R(X)$ denotes the range of $X$.
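
For the discrete case, a minimal sketch may help. It assumes the common convention of returning the smallest value whose cumulative probability reaches $p$; the function name `discrete_quantile` and the fair-die distribution are hypothetical illustrations.

```python
from fractions import Fraction

def discrete_quantile(pmf, p):
    """Smallest x in the support with Pr(X <= x) >= p (generalized inverse)."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= p:
            return x
    raise ValueError("p exceeds the total probability mass")

# Hypothetical distribution: one fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
median_face = discrete_quantile(die, Fraction(1, 2))
print(median_face)  # 3
```

Taking the smallest qualifying value resolves both the gaps and the plateaus: every $p$ in $(0, 1]$ maps to exactly one point of the support.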

### Proofreading mistakes

If proofreader $A$ finds $a$ mistakes and proofreader $B$ finds $b$ mistakes, $c$ of which were also found by $A$, how many mistakes were missed by both $A$ and $B$? Assume there are a total of $N$ mistakes, so proofreader $A$ finds a fraction $a/N$ of all mistakes, and also a fraction $c/b$ of the mistakes found by $B$. Assuming these fractions are the same, then solving for $N$ gives

$$N = \frac{ab}{c}.$$

The number of mistakes missed by both is therefore approximately

$$N - a - b + c = \frac{(a - c)(b - c)}{c}.$$
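
The estimate is easy to compute directly. In the sketch below, `a` and `b` are the counts found by the two proofreaders and `c` is the count found by both; the function name and the sample numbers are hypothetical.

```python
from fractions import Fraction

def estimate_missed(a, b, c):
    """Estimate the total number of mistakes and those missed by both readers."""
    if c == 0:
        raise ValueError("no overlap: the estimate is undefined")
    total = Fraction(a * b, c)      # equal-fraction assumption: a/N = c/b
    missed = total - (a + b - c)    # subtract the distinct mistakes found
    return total, missed

total, missed = estimate_missed(a=30, b=24, c=20)
print(total, missed)  # 36 2
```

The estimate is undefined when the two proofreaders share no mistakes, which the sketch reports as an error rather than guessing.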

### Interval stationary point process

A point process on is said to be interval stationary if for every and for all integers , the joint distribution ofdoes not depend on , . Here, is an interval for all .As pointed out in a variety of literature (e.g., Daley and Vere-Jones 2002, pp 45-46), the notion of an interval stationary point process is intimately connected to (though fundamentally different from) the idea of a stationary point process in the Borel set sense of the term. Worth noting, too, is the difference between interval stationarity and other notions such as simple/crude stationarity.Though it has been done, it is more difficult to extend to the notion of interval stationarity; doing so requires a significant amount of additional machinery and reflects, overall, the significantly-increased structural complexity of higher-dimensional Euclidean spaces (Daley and Vere-Jones 2007)...

### Probability space

A triple $(S, \mathbb{S}, P)$ on the domain $S$, where $(S, \mathbb{S})$ is a measurable space, $\mathbb{S}$ are the measurable subsets of $S$, and $P$ is a measure on $\mathbb{S}$ with $P(S) = 1$.

### Intensity measure

The intensity measure $\mu$ of a point process $X$ relative to a Borel set $B$ is defined to be the expected number of points of $X$ falling in $B$. Symbolically,

$$\mu(B) = E[X(B)],$$

where here, $E$ denotes the expected value.

The notion of an intensity measure is intimately connected to one oft-discussed notion of intensity function (Pawlas 2008).

### Probability measure

Consider a probability space specified by the triple $(S, \mathbb{S}, P)$, where $(S, \mathbb{S})$ is a measurable space, with $S$ the domain and $\mathbb{S}$ its measurable subsets, and $P$ is a measure on $\mathbb{S}$ with $P(S) = 1$. Then the measure $P$ is said to be a probability measure. Equivalently, $P$ is said to be normalized.

### Intensity function

There are at least two distinct notions of an intensity function related to the theoryof point processes.In some literature, the intensity of a point process is defined to be the quantity(1)provided it exists. Here, denotes probability. In particular, it makes sense to talk about point processes having infinite intensity, though when finite, allows to be rewritten so that(2)as where here, denotes little-O notation (Daley and Vere-Jones 2007).Other authors define the function to be an intensity function of a point process provided that is a density of the intensity measure associated to relative to Lebesgue measure, i.e.,if for all Borel sets in ,(3)where denotes Lebesgue measure (Pawlas 2008).

### Independent statistics

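Two variates $A$ and $B$ are statistically independent iff the conditional probability $P(A|B)$ of $A$ given $B$ satisfies

$$P(A|B) = P(A), \tag{1}$$

in which case the probability of $A$ and $B$ is just

$$P(A \cap B) = P(A)\,P(B). \tag{2}$$

If $n$ events $A_1$, $A_2$, ..., $A_n$ are independent, then

$$P\left(\bigcap_{i=1}^n A_i\right) = \prod_{i=1}^n P(A_i). \tag{3}$$

Statistically independent variables are always uncorrelated, but the converse is not necessarily true.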

### Bonferroni inequalities

Let $P(E_i)$ be the probability that $E_i$ is true, and $P\left(\bigcup_{i=1}^n E_i\right)$ be the probability that at least one of $E_1$, $E_2$, ..., $E_n$ is true. Then "the" Bonferroni inequality, also known as Boole's inequality, states that

$$P\left(\bigcup_{i=1}^n E_i\right) \le \sum_{i=1}^n P(E_i),$$

where $\cup$ denotes the union. If $E_i$ and $E_j$ are disjoint sets for all $i \ne j$, then the inequality becomes an equality. A beautiful theorem that expresses the exact relationship between the probability of unions and probabilities of individual events is known as the inclusion-exclusion principle.

A slightly wider class of inequalities are also known as "Bonferroni inequalities."
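
A quick numerical check of the inequality, and of the equality case for disjoint events, can be sketched as follows. The events are hypothetical sets of faces on one roll of a fair die.

```python
from fractions import Fraction

def prob(faces):
    """Probability that one roll of a fair die lands in the given set of faces."""
    return Fraction(len(faces), 6)

# Overlapping events: the union is strictly less likely than the sum suggests.
events = [{1, 2}, {2, 3, 4}, {4, 6}]
p_union = prob(set().union(*events))
sum_of_probs = sum(prob(e) for e in events)
assert p_union <= sum_of_probs

# Pairwise disjoint events: the bound holds with equality.
disjoint = [{1}, {2, 3}, {6}]
p_union_disjoint = prob(set().union(*disjoint))
sum_disjoint = sum(prob(e) for e in disjoint)
assert p_union_disjoint == sum_disjoint
```

Note that the sum on the right-hand side can exceed 1, as it does for the overlapping events here, so the bound is informative only when the individual probabilities are small.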

### Probability domain

Evans et al. (2000, p. 6) use the unfortunate term "probability domain" to refer to the range of the distribution function of a probability density function. For a continuous distribution, the probability domain is simply the interval $[0, 1]$, whereas for a discrete distribution, it is a subset of that interval.

### Probability density function

The probability density function (PDF) of a continuous distribution is defined as the derivative of the (cumulative) distribution function ,(1)(2)(3)so(4)(5)A probability function satisfies(6)and is constrained by the normalization condition,(7)(8)Special cases are(9)(10)(11)(12)(13)To find the probability function in a set of transformed variables, find the Jacobian. For example, If , then(14)so(15)Similarly, if and , then(16)Given probability functions , , ..., , the sum distribution has probability function(17)where is a delta function. Similarly, the probability function for the distribution of is given by(18)The difference distribution has probability function(19)and the ratio distribution has probability function(20)Given the moments of a distribution (, , and the gamma statistics ), the asymptotic probability function is given by(21)where(22)is the normal distribution, and(23)for (with cumulants and..

### Bayes' theorem

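Let $A$ and $B_j$ be sets. Conditional probability requires that

$$P(A \cap B_j) = P(A)\,P(B_j|A), \tag{1}$$

where $\cap$ denotes intersection ("and"), and also that

$$P(A \cap B_j) = P(B_j \cap A) = P(B_j)\,P(A|B_j). \tag{2}$$

Therefore,

$$P(B_j|A) = \frac{P(B_j)\,P(A|B_j)}{P(A)}. \tag{3}$$

Now, let

$$S \equiv \bigcup_{i=1}^N A_i, \tag{4}$$

so $A_i$ is an event in $S$ and $A_i \cap A_j = \varnothing$ for $i \ne j$, then

$$A = A \cap S = A \cap \left(\bigcup_{i=1}^N A_i\right) = \bigcup_{i=1}^N (A \cap A_i) \tag{5}$$

$$P(A) = P\left(\bigcup_{i=1}^N (A \cap A_i)\right) = \sum_{i=1}^N P(A \cap A_i). \tag{6}$$

But this can be written

$$P(A) = \sum_{i=1}^N P(A_i)\,P(A|A_i), \tag{7}$$

so

$$P(A_i|A) = \frac{P(A_i)\,P(A|A_i)}{\sum_{j=1}^N P(A_j)\,P(A|A_j)} \tag{8}$$

(Papoulis 1984, pp. 38-39).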
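
The final formula can be sketched in a few lines: multiply each prior by its likelihood, then normalize by the total probability. The function name `posterior` and the three-urn numbers are hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_i | A) for each i, given P(A_i) and P(A | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability P(A), summed over the partition
    return [j / evidence for j in joint]

# Hypothetical setup: three urns chosen with equal probability, containing
# red balls in proportions 1/2, 1/4, and 3/4; the event A is "a red ball is drawn".
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]
result = posterior(priors, likelihoods)
print(result)
```

The sketch assumes the $A_i$ are mutually exclusive and exhaustive, as in the derivation; the returned posteriors then sum to 1.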

### Cauchy distribution

The Cauchy distribution, also called the Lorentzian distribution or Lorentz distribution, is a continuous distribution describing resonance behavior. It also describes the distribution of horizontal distances at which a line segment tilted at a random angle cuts the x-axis.

Let represent the angle that a line, with fixed point of rotation, makes with the vertical axis, as shown above. Then(1)(2)(3)(4)so the distribution of angle is given by(5)This is normalized over all angles, since(6)and(7)(8)(9)The general Cauchy distribution and its cumulative distribution can be written as(10)(11)where is the half width at half maximum and is the statistical median. In the illustration above, .

The Cauchy distribution is implemented in the Wolfram Language as CauchyDistribution[m, Gamma/2].

The characteristic function is(12)(13)The moments of the distribution are undefined since the integrals(14)diverge for . If and are variates with..

### Class

The word "class" has many specialized meanings in mathematics in which it refers to a group of objects with some common property (e.g., characteristic class or conjugacy class).

In statistics, a class is a grouping of values by which data is binned for computation of a frequency distribution (Kenney and Keeping 1962, p. 14). The range of values of a given class is called a class interval, the boundaries of an interval are called class limits, and the middle of a class interval is called the class mark.

The following table summarizes the classes for an example data set.

| class interval | class mark | absolute frequency | relative frequency | cumulative absolute frequency | relative cumulative frequency |
|---|---|---|---|---|---|
| 0.00-9.99 | 5 | 1 | 0.01 | 1 | 0.01 |
| 10.00-19.99 | 15 | 3 | 0.03 | 4 | 0.04 |
| 20.00-29.99 | 25 | 8 | 0.08 | 12 | 0.12 |
| 30.00-39.99 | 35 | 18 | 0.18 | 30 | 0.30 |
| 40.00-49.99 | 45 | 24 | 0.24 | 54 | 0.54 |
| 50.00-59.99 | 55 | 22 | 0.22 | 76 | 0.76 |
| 60.00-69.99 | 65 | 15 | 0.15 | 91 | 0.91 |
| 70.00-79.99 | 75 | 8 | 0.08 | 99 | 0.99 |
| 80.00-89.99 | 85 | 0 | 0.00 | 99 | 0.99 |
| 90.00-99.99 | 95 | 1 | 0.01 | 100 | 1.00 |
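
The derived columns of such a table follow mechanically from the absolute frequencies. The sketch below recomputes them from the frequencies in the example, assuming ten classes of width 10 starting at 0; the variable names are illustrative.

```python
from itertools import accumulate

# Absolute frequencies per class, taken from the example table.
frequencies = [1, 3, 8, 18, 24, 22, 15, 8, 0, 1]
width, start = 10, 0  # assumed class interval and lower limit of the first class

total = sum(frequencies)
class_marks = [start + width * i + width / 2 for i in range(len(frequencies))]
relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))
relative_cumulative = [c / total for c in cumulative]

for row in zip(class_marks, frequencies, relative, cumulative, relative_cumulative):
    print(row)
```

The last cumulative value equals the sample size, which is a convenient sanity check on a hand-built table.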

### Sample

A sample is a subset of a population that is obtained through some process, possibly random selection or selection based on a certain set of criteria, for the purposes of investigating the properties of the underlying parent population. In particular, statistical quantities determined directly from the sample (such as sample central moments, sample raw moments, sample mean, sample variance, etc.) can be used as estimators for the corresponding properties of the underlying distribution.

The process of obtaining a sample is known as sampling, and the number of members in a sample is called the sample size.

### Lexis trials

sets of trials each, with the probability of success constant in each set.where is the variance of .

### Lexis ratio

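The Lexis ratio is defined as

$$L \equiv \frac{\sigma}{\sigma_B},$$

where $\sigma$ is the variance in a set of Lexis trials and $\sigma_B$ is the variance assuming Bernoulli trials. If $L < 1$, the trials are said to be subnormal, and if $L > 1$, the trials are said to be supernormal.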

### Supernormal

Trials for which the Lexis ratio $L \equiv \sigma/\sigma_B$ satisfies $L > 1$, where $\sigma$ is the variance in a set of Lexis trials and $\sigma_B$ is the variance assuming Bernoulli trials.

### Poisson trials

A number of trials in which the probability of success varies from trial to trial. Let be the number of successes, then(1)where is the variance of and . Uspensky has shown that(2)where(3)(4)(5)(6)and . The probability that the number of successes is at least is given by(7)Uspensky gives the true probability that there are at least successes in trials as(8)where(9)(10)

### Experiment

An experiment is defined (Papoulis 1984, p. 30) as a mathematical object consisting of the following elements.

1. A set $S$ (the probability space) of elements.
2. A Borel field $F$ consisting of certain subsets of $S$ called events.
3. A number $P(A)$ satisfying the probability axioms, called the probability, that is assigned to every event $A$.

### Sample proportion

Let there be successes out of Bernoulli trials. The sample proportion is the fraction of samples which were successes, so(1)For large , has an approximately normal distribution. Let RE be the relative error and SE the standard error, then(2)(3)(4)where CI is the confidence interval and is the erf function. The number of tries needed to determine with relative error RE and confidence interval CI is(5)

### Run

A run is a sequence of more than one consecutive identical outcomes, also known as a clump.

Let be the probability that a run of $r$ or more consecutive heads appears in $n$ independent tosses of a coin (i.e., $n$ Bernoulli trials). This is equivalent to repeated picking from an urn containing two distinguishable objects with replacement after each pick. Let the probability of obtaining a head be $p$. Then there is a beautiful formula for given in terms of the coefficients of the generating function(1)(Feller 1968, p. 300). Then(2)

The following table gives the triangle of numbers for $n = 1$, 2, ... and $r = 1$, 2, ..., $n$ (OEIS A050227).

| $n$ | $r=1$ | $r=2$ | $r=3$ | $r=4$ | $r=5$ | $r=6$ | $r=7$ | $r=8$ |
|---|---|---|---|---|---|---|---|---|
| Sloane | A000225 | A008466 | A050231 | A050233 | | | | |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 7 | 3 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 15 | 8 | 3 | 1 | 0 | 0 | 0 | 0 |
| 5 | 31 | 19 | 8 | 3 | 1 | 0 | 0 | 0 |
| 6 | 63 | 43 | 20 | 8 | 3 | 1 | 0 | 0 |
| 7 | 127 | 94 | 47 | 20 | 8 | 3 | 1 | 0 |
| 8 | 255 | 201 | 107 | 48 | 20 | 8 | 3 | 1 |

The special case gives the sequence(3)where is a Fibonacci number. Similarly, the probability that no consecutive tails will occur in tosses is given by , where is a Fibonacci k-step..
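
The tabulated triangle can be cross-checked by brute force. The sketch below assumes a fair coin and that each entry counts the head/tail sequences of a given length that contain a run of at least `r` heads; it enumerates all sequences rather than using the generating function, and the function names are hypothetical.

```python
from itertools import product

def count_with_run(n, r):
    """Number of length-n H/T sequences containing a run of at least r heads."""
    return sum(1 for seq in product("HT", repeat=n) if "H" * r in "".join(seq))

def run_probability(n, r):
    """Probability of such a run in n tosses of a fair coin."""
    return count_with_run(n, r) / 2 ** n

print(count_with_run(8, 3))   # 107
print(run_probability(4, 2))  # 0.5
```

Enumeration costs $2^n$ steps, so this is only a check for small $n$, not a substitute for the closed-form result.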

### Trivariate normal distribution

A multivariate normal distribution in three variables. It has probability density function(1)where(2)The standardized trivariate normal distribution takes unit variances and . The quadrant probability in this special case is then given analytically by(3)(Rose and Smith 1996; Stuart and Ord 1998; Rose and Smith 2002, p. 231).

### Wiener numbers

A sequence of uncorrelated numbers developed by Wiener (1926-1927). The numbers are constructed by beginning with(1)then forming the outer product with to obtain(2)This row is repeated twice, and its outer product is then taken to give(3)This is then repeated four times. The procedure is repeated, and the result repeated eight times, and so on. The sequences from each stage are then concatenated to form the sequence 1, , 1, 1, 1, , , 1, , , 1, 1, 1, , , 1, , , ....

### Redundancy

where is the entropy and is the joint entropy. Linear redundancy is defined aswhere are eigenvalues of the correlation matrix.

### Predictability

Predictability at a time in the future is defined byand linear predictability bywhere and are the redundancy and linear redundancy, and is the entropy.

### Nonstationary time series

A time series $x_1$, $x_2$, ... is nonstationary if, for some $m$, the joint probability distribution of $x_i$, $x_{i+1}$, ..., $x_{i+m-1}$ is dependent on the time index $i$.

### Statistical correlation

For two random variates and , the correlation is defined by(1)where denotes standard deviation and is the covariance of these two variables. For the general case of variables and , where , 2, ..., ,(2)where are elements of the covariance matrix. In general, a correlation gives the strength of the relationship between variables. For ,(3)The variance of any quantity is always nonnegative by definition, so(4)From a property of variances, the sum can be expanded(5)(6)(7)Therefore,(8)Similarly,(9)(10)(11)(12)Therefore,(13)so .

For a linear combination of two variables,(14)(15)(16)(17)Examine the cases where ,(18)(19)The variance will be zero if , which requires that the argument of the variance is a constant. Therefore, , so . If , is either perfectly correlated () or perfectly anticorrelated () with ...

### Least squares fitting--exponential

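To fit a functional form

$$y = A e^{Bx}, \tag{1}$$

take the logarithm of both sides

$$\ln y = \ln A + Bx. \tag{2}$$

The best-fit values are then

$$a = \frac{\sum_{i=1}^n \ln y_i \sum_{i=1}^n x_i^2 - \sum_{i=1}^n x_i \sum_{i=1}^n x_i \ln y_i}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} \tag{3}$$

$$b = \frac{n \sum_{i=1}^n x_i \ln y_i - \sum_{i=1}^n x_i \sum_{i=1}^n \ln y_i}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \tag{4}$$

where $B \equiv b$ and $A \equiv \exp(a)$.

This fit gives greater weights to small $y$ values so, in order to weight the points equally, it is often better to minimize the function

$$\sum_{i=1}^n y_i (\ln y_i - a - b x_i)^2. \tag{5}$$

Applying least squares fitting gives

$$a \sum_{i=1}^n y_i + b \sum_{i=1}^n x_i y_i = \sum_{i=1}^n y_i \ln y_i \tag{6}$$

$$a \sum_{i=1}^n x_i y_i + b \sum_{i=1}^n x_i^2 y_i = \sum_{i=1}^n x_i y_i \ln y_i \tag{7}$$

$$\begin{bmatrix} \sum y_i & \sum x_i y_i \\ \sum x_i y_i & \sum x_i^2 y_i \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum y_i \ln y_i \\ \sum x_i y_i \ln y_i \end{bmatrix}. \tag{8}$$

Solving for $a$ and $b$,

$$a = \frac{\sum (x_i^2 y_i) \sum (y_i \ln y_i) - \sum (x_i y_i) \sum (x_i y_i \ln y_i)}{\sum y_i \sum (x_i^2 y_i) - \left(\sum x_i y_i\right)^2} \tag{9}$$

$$b = \frac{\sum y_i \sum (x_i y_i \ln y_i) - \sum (x_i y_i) \sum (y_i \ln y_i)}{\sum y_i \sum (x_i^2 y_i) - \left(\sum x_i y_i\right)^2}. \tag{10}$$

In the plot above, the short-dashed curve is the fit computed from (3) and (4) and the long-dashed curve is the fit computed from (9) and (10).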
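
A minimal sketch of the unweighted log-linear fit follows, assuming all $y$ values are positive. It applies ordinary linear least squares to the pairs $(x, \ln y)$; the function name `fit_exponential` and the synthetic data are hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A*exp(B*x) by linear least squares on (x, ln y); requires y > 0."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    sx, sl = sum(xs), sum(logs)
    sxx = sum(x * x for x in xs)
    sxl = sum(x * l for x, l in zip(xs, logs))
    denom = n * sxx - sx ** 2
    a = (sl * sxx - sx * sxl) / denom  # intercept of the log-linear fit
    b = (n * sxl - sx * sl) / denom    # slope of the log-linear fit
    return math.exp(a), b              # A = exp(a), B = b

# Synthetic, noise-free data generated from y = 2 exp(0.5 x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
A, B = fit_exponential(xs, ys)
print(A, B)
```

With noise-free data the fit recovers the generating parameters up to rounding. With noisy data the $y$-weighted variant described in the text generally gives a different, and often preferable, answer.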

### Least squares fitting

A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.

In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a fitting function for the independent variable $x$ that estimates $y$ for a given $x$ (most often what an experimenter wants), allows uncertainties of the data points along the $x$- and $y$-axes to be incorporated simply, and also provides a much..

### Sheppard's correction

A correction which must be applied to the measured moments obtained from normally distributed data which have been binned in order to obtain correct estimators for the population moments . The corrected versions of the second, third, and fourth moments are then(1)(2)(3)where is the class interval.If is the th cumulant of an ungrouped distribution and the th cumulant of the grouped distribution with class interval , the corrected cumulants (under rather restrictive conditions) are(4)where is the th Bernoulli number, giving(5)(6)(7)(8)(9)(10)For a proof, see Kendall et al. (1998).

### Cumulant

Let be the characteristic function, defined as the Fourier transform of the probability density function using Fourier transform parameters ,(1)(2)The cumulants are then defined by(3)(Abramowitz and Stegun 1972, p. 928). Taking the Maclaurinseries gives(4)where are raw moments, so(5)(6)(7)(8)(9)These transformations can be given by CumulantToRaw[n] in the Mathematica application package mathStatica.In terms of the central moments ,(10)(11)(12)(13)(14)where is the mean and is the variance. These transformations can be given by CumulantToCentral[n].Multivariate cumulants can be expressed in terms of raw moments, e.g.,(15)(16)and central moments, e.g.,(17)(18)(19)(20)(21)using CumulantToRaw[m, n, ...] and CumulantToCentral[m, n, ...], respectively.The k-statistics are unbiasedestimators of the cumulants...

### Sample variance distribution

Let samples be taken from a population with central moments . The sample variance is then given by(1)where is the sample mean.The expected value of for a sample size is then given by(2)Similarly, the expected variance of the sample varianceis given by(3)(4)(Kenney and Keeping 1951, p. 164; Rose and Smith 2002, p. 264).The algebra of deriving equation (4) by hand is rather tedious,but can be performed as follows. Begin by noting that(5)so(6)The value of is already known from equation (◇), so it remains only to find . The algebra is simplified considerably by immediately transforming variables to and performing computations with respect to these central variables. Since the variance does not depend on the mean of the underlying distribution, the result obtained using the transformed variables will give an identical result while immediately eliminating expectation values of sums of terms containing odd powers of (which..

### Moment problem

The moment problem, also called "Hausdorff's moment problem" or the "little moment problem," may be stated as follows. Given a sequence of numbers $\{\mu_n\}_{n=0}^\infty$, under what conditions is it possible to determine a function $\alpha(t)$ of bounded variation in the interval $(0, 1)$ such that

$$\mu_n = \int_0^1 t^n \, d\alpha(t)$$

for $n = 0$, 1, .... Such a sequence is called a moment sequence, and Hausdorff (1921ab) was the first to obtain necessary and sufficient conditions for a sequence to be a moment sequence.

### Covariance

Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates and , each with sample size , is defined by the expectation value(1)(2)where and are the respective means, which can be written out explicitly as(3)For uncorrelated variates,(4)so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if , then tends to increase as increases, and if , then tends to decrease as increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.In the special case of ,(5)(6)so the covariance reduces to the usual variance . This motivates the use of the symbol , which then provides a consistent way of denoting the variance as , where is the standard deviation.The derived quantity(7)(8)is called statistical correlation of and .The covariance..

### Sample variance computation

When computing the sample variance numerically, the mean must be computed before can be determined. This requires storing the set of sample values. However, it is possible to calculate using a recursion relationship involving only the last sample as follows. This means itself need not be precomputed, and only a running set of values need be stored at each step.In the following, use the somewhat less than optimal notation to denote calculated from the first samples (i.e., not the th moment)(1)and let denotes the value for the bias-corrected sample variance calculated from the first samples. The first few values calculated for the mean are(2)(3)(4)Therefore, for , 3 it is true that(5)Therefore, by induction,(6)(7)(8)(9)By the definition of the sample variance,(10)for . Defining , can then be computed using the recurrence equation(11)(12)(13)(14)Working on the first term,(15)(16)Use (◇) to write(17)so(18)Now work on the second..
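
A one-pass update of this kind can be sketched as follows. The sketch uses Welford's well-known formulation, which may differ in form from the recurrence derived in this entry but likewise stores only a few running values; the function name and data are hypothetical.

```python
def running_mean_variance(samples):
    """Yield (count, mean, bias-corrected variance) after each new sample."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # running sum of squared deviations
        variance = m2 / (count - 1) if count > 1 else 0.0
        yield count, mean, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
*_, (n, mean, var) = running_mean_variance(data)
print(n, mean, var)
```

Only `count`, `mean`, and `m2` are kept between samples, so the data set itself never needs to be stored. The variance is reported as 0.0 for a single sample purely as a convention of this sketch.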

### Charlier's check

A check which can be used to verify correct computations in a table of grouped classes. For example, consider the following table with specified class limits and frequencies . The class marks are then computed as well as the rescaled frequencies , which are given by(1)where the class mark is taken as and the class interval is . The remaining quantities are then computed as follows.class limits30-3934.52321840-4944.53271250-5954.511441160-6964.52020070-7974.5320003280-8984.5251252510090-9994.572142863total100176236In order to compute the variance, note that(2)(3)(4)so the variance of the original data is(5)Charlier's check makes use of the additional column added to the right side of the table. By noting that the identity(6)(7)connects columns five through seven, it can be checked that the computations have been done correctly. In the example above,(8)so the computations pass Charlier's check...

### Moment

The th raw moment (i.e., moment about zero) of a distribution is defined by(1)where(2), the mean, is usually simply denoted . If the moment is instead taken about a point ,(3)A statistical distribution is not uniquely specified by its moments, although it is by its characteristic function.The moments are most commonly taken about the mean. These so-called central moments are denoted and are defined by(4)(5)with . The second moment about the mean is equal to the variance(6)where is called the standard deviation.The related characteristic function isdefined by(7)(8)The moments may be simply computed using the moment-generatingfunction,(9)

### Sample raw moment

The $r$th sample raw moment $m_r'$ of a sample $X_1$, ..., $X_n$ with sample size $n$ is defined as

$$m_r' = \frac{1}{n} \sum_{k=1}^n X_k^r. \tag{1}$$

The sample raw moments are unbiased estimators of the population raw moments,

$$\langle m_r' \rangle = \mu_r' \tag{2}$$

(Rose and Smith 2002, p. 253). The sample raw moment is related to power sums $s_r$ by

$$m_r' = \frac{1}{n} s_r. \tag{3}$$

This relationship can be given by SampleRawToPowerSum[r] in the Mathematica application package mathStatica.

### Central moment

A moment of a univariate probability density function taken about the mean ,(1)(2)where denotes the expectation value. The central moments can be expressed as terms of the raw moments (i.e., those taken about zero) using the binomial transform(3)with (Papoulis 1984, p. 146). The first few central moments expressed in terms of the raw moments are therefore(4)(5)(6)(7)(8)These transformations can be obtained using CentralToRaw[n] in the Mathematica application package mathStatica.The central moments can also be expressed in terms of the cumulants , with the first few cases given by(9)(10)(11)(12)These transformations can be obtained using CentralToCumulant[n] in the Mathematica application package mathStatica.The central moment of a multivariate probability density function can be similarly defined as(13)Therefore,(14)For example,(15)(16)Similarly, the multivariate central moments can be expressed in terms..

### Bessel's correction

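Bessel's correction is the factor $(N-1)/N$ in the relationship between the variance $\sigma^2$ and the expectation values of the sample variance,

$$\langle s^2 \rangle = \frac{N-1}{N} \sigma^2, \tag{1}$$

where

$$s^2 \equiv \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2. \tag{2}$$

As noted by Kenney and Keeping (1951, p. 161), the correction factor is probably more properly attributed to Gauss, who used it in this connection as early as 1823 (Gauss 1823).

For two samples,

$$\hat{\sigma}^2 = \frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2} \tag{3}$$

(Kenney and Keeping 1951, p. 162).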
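
The correction factor can be verified exactly on a small example by averaging the uncorrected sample variance over every possible sample. The sketch assumes independent draws with replacement from a hypothetical population (the faces of a fair die) and a sample size of 3.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical: faces of one fair die
N = 3                            # sample size

def mean(values):
    return Fraction(sum(values), len(values))

mu = mean(population)
sigma2 = mean([(x - mu) ** 2 for x in population])  # population variance

# Average the uncorrected (divide-by-N) sample variance over all
# equally likely ordered samples of size N.
samples = list(product(population, repeat=N))
expected_s2 = mean([mean([(x - mean(s)) ** 2 for x in s]) for s in samples])

assert expected_s2 == Fraction(N - 1, N) * sigma2
print(sigma2, expected_s2)  # 35/12 35/18
```

Because the enumeration is exhaustive and uses exact fractions, the equality holds exactly rather than approximately, unlike in a Monte Carlo check.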

### Hypothesis testing

Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps.

1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the $P$-value, which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the null hypothesis were true. The smaller the $P$-value, the stronger the evidence against the null hypothesis.
4. Compare the $P$-value to an acceptable significance value $\alpha$ (sometimes called an alpha value). If $P \le \alpha$, the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis..
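
The four steps can be sketched for a simple case. The example below is hypothetical: the null hypothesis is that a coin is fair, the test statistic is the number of heads, and the $P$-value is the one-sided probability of at least that many heads; the significance value 0.05 is an assumed choice.

```python
from math import comb

def binomial_p_value(heads, tosses):
    """One-sided P-value: Pr(at least `heads` heads) under a fair coin."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2 ** tosses

alpha = 0.05                               # assumed significance value
p = binomial_p_value(heads=9, tosses=10)   # steps 2 and 3
significant = p <= alpha                   # step 4
print(p, significant)
```

Here 9 heads in 10 tosses gives $P = 11/1024$, which is below the assumed $\alpha$, so the fair-coin hypothesis would be rejected at that level.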

### Dot plot

A dot plot, also called a dot chart, is a type of simple histogram-like chart used in statistics for relatively small data sets where values fall into a number of discrete bins. To draw a dot plot, count the number of data points falling in each bin and draw a stack of dots that number high for each bin. The illustration above shows such a plot for a random sample of 100 integers chosen between 1 and 25 inclusively.

Simple code for drawing a dot plot in the Wolfram Language with some appropriate labeling of bin heights can be given as `DotPlot[data_] := Module[{m = Tally[Sort[data]]}, ListPlot[Flatten[Table[{#1, n}, {n, #2}]& @@@ m, 1], Ticks -> {Automatic, Range[0, Max[m[[All, 2]]]]}]]`

### Arbitrary precision

In most computer programs and computing environments, the precision of any calculation (even including addition) is limited by the word size of the computer, that is, by the largest number that can be stored in one of the processor's registers. As of mid-2002, the most common processor word size is 32 bits, corresponding to the integer $2^{32} = 4294967296$. General integer arithmetic on a 32-bit machine therefore allows addition of two 32-bit numbers to get 33 bits (one word plus an overflow bit), multiplication of two 32-bit numbers to get 64 bits (although the most prevalent programming language, C, cannot access the higher word directly and depends on the programmer to either create a machine language function or write a much slower function in C at a final overhead of about nine multiplies more), and division of a 64-bit number by a 32-bit number creating a 32-bit quotient and a 32-bit remainder/modulus.

Arbitrary-precision arithmetic consists of a set of algorithms,..

### Quantum stochastic calculus

Let , , be one-dimensional Brownian motion. Integration with respect to was defined by Itô (1951). A basic result of the theory is that stochastic integral equations of the form(1)can be interpreted as stochastic differential equations of the form(2)where differentials are handled with the use of Itô's formula(3)(4)Hudson and Parthasarathy (1984) obtained a Fock space representation of Brownian motion and Poisson processes. The boson Fock space over is the Hilbert space completion of the linear span of the exponential vectors under the inner product(5)where and and is the complex conjugate of .The annihilation, creation and conservation operators , and respectively, are defined on the exponential vectors of as follows,(6)(7)(8)The basic quantum stochastic differentials , , and are defined as follows,(9)(10)(11)Hudson and Parthasarathy (1984) defined stochastic integration with respect to the noise differentials..

### Correlation ratio

Let there be observations of the th phenomenon, where , ..., and(1)(2)(3)Then the sample correlation ratio is defined by(4)Let be the population correlation ratio. If for , then(5)where(6)(7)(8)and is the confluent hypergeometric limit function. If , then(9)(Kenney and Keeping 1951, pp. 323-324).

### Normal equation

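Given a matrix equation

$$\mathsf{A}\mathbf{x} = \mathbf{b},$$

the normal equation is that which minimizes the sum of the square differences between the left and right sides:

$$\mathsf{A}^{\mathrm{T}}\mathsf{A}\mathbf{x} = \mathsf{A}^{\mathrm{T}}\mathbf{b}.$$

It is called a normal equation because $\mathbf{b} - \mathsf{A}\mathbf{x}$ is normal to the range of $\mathsf{A}$.

Here, $\mathsf{A}^{\mathrm{T}}\mathsf{A}$ is a normal matrix.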
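
For a matrix with two columns the normal equation is a 2-by-2 system that can be solved by hand. The sketch below does this with Cramer's rule and then checks that the residual is orthogonal to both columns; the function name and the data (a straight-line fit through four points) are hypothetical.

```python
def solve_normal_equations(A, b):
    """Least-squares x for a two-column matrix A via (A^T A) x = A^T b."""
    # Entries of the symmetric 2x2 matrix A^T A and of the vector A^T b.
    s00 = sum(r[0] * r[0] for r in A)
    s01 = sum(r[0] * r[1] for r in A)
    s11 = sum(r[1] * r[1] for r in A)
    t0 = sum(r[0] * bi for r, bi in zip(A, b))
    t1 = sum(r[1] * bi for r, bi in zip(A, b))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Cramer's rule for the 2x2 system.
    return ((t0 * s11 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)

# Overdetermined system: fit b ~ x0 + x1 * t at t = 0, 1, 2, 3.
A = [[1, 0], [1, 1], [1, 2], [1, 3]]
b = [1, 3, 4, 8]
x = solve_normal_equations(A, b)
residual = [bi - (r[0] * x[0] + r[1] * x[1]) for r, bi in zip(A, b)]
# The residual is normal to each column of A, hence to the range of A.
dots = [sum(r[j] * ri for r, ri in zip(A, residual)) for j in range(2)]
print(x, dots)
```

For larger or ill-conditioned problems, forming the product of the transpose with the matrix explicitly can lose accuracy, so library routines typically use other factorizations; this sketch is only meant to make the definition concrete.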

### Least squares fitting--power law

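Given a function of the form

$$y = A x^B, \tag{1}$$

least squares fitting gives the coefficients as

$$b = \frac{n \sum_{i=1}^n (\ln x_i \ln y_i) - \sum_{i=1}^n \ln x_i \sum_{i=1}^n \ln y_i}{n \sum_{i=1}^n (\ln x_i)^2 - \left(\sum_{i=1}^n \ln x_i\right)^2} \tag{2}$$

$$a = \frac{\sum_{i=1}^n \ln y_i - b \sum_{i=1}^n \ln x_i}{n}, \tag{3}$$

where $B \equiv b$ and $A \equiv \exp(a)$.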

### Correlation coefficient--bivariate normal distribution

For a bivariate normal distribution, the distribution of correlation coefficients is given by(1)(2)(3)where is the population correlation coefficient, is a hypergeometric function, and is the gamma function (Kenney and Keeping 1951, pp. 217-221). The moments are(4)(5)(6)(7)where . If the variates are uncorrelated, then and(8)(9)so(10)(11)But from the Legendre duplication formula,(12)so(13)(14)(15)(16)The uncorrelated case can be derived more simply by letting be the true slope, so that . Then(17)is distributed as Student's t with degrees of freedom. Let the population regression coefficient be 0, then , so(18)and the distribution is(19)Plugging in for and using(20)(21)(22)gives(23)(24)(25)(26)so(27)as before. See Bevington (1969, pp. 122-123) or Pugh and Winslow (1966, §12-8). If we are interested instead in the probability that a correlation coefficient would be obtained , where is the observed..

### Nonlinear least squares fitting

Given a function of a variable tabulated at values , ..., , assume the function is of known analytic form depending on parameters , and consider the overdetermined set of equations(1)(2)We desire to solve these equations to obtain the values , ..., which best satisfy this system of equations. Pick an initial guess for the and then define(3)Now obtain a linearized estimate for the changes needed to reduce to 0,(4)for , ..., , where . This can be written in component form as(5)where is the matrix(6)In more concise matrix form,(7)where is an -vector and is an -vector.Applying the transpose of to both sides gives(8)Defining(9)(10)in terms of the known quantities and then gives the matrix equation(11)which can be solved for using standard matrix techniques such as Gaussian elimination. This offset is then applied to and a new is calculated. By iteratively applying this procedure until the elements of become smaller than some prescribed limit, a solution..

### Least squares fitting--polynomial

Generalizing from a straight line (i.e., first degree polynomial) to a th degree polynomial(1)the residual is given by(2)The partial derivatives (again dropping superscripts)are(3)(4)(5)These lead to the equations(6)(7)(8)or, in matrix form(9)This is a Vandermonde matrix. We can also obtainthe matrix for a least squares fit by writing(10)Premultiplying both sides by the transpose of the firstmatrix then gives(11)so(12)As before, given points and fitting with polynomial coefficients , ..., gives(13)In matrix notation, the equation for a polynomial fitis given by(14)This can be solved by premultiplying by the transpose ,(15)This matrix equation can be solved numerically,or can be inverted directly if it is well formed, to yield the solution vector(16)Setting in the above equations reproduces the linear solution...

### Correlation coefficient

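The correlation coefficient, sometimes also called the cross-correlation coefficient, Pearson correlation coefficient (PCC), Pearson's $r$, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a quantity that gives the quality of a least squares fitting to the original data. To define the correlation coefficient, first consider the sum of squared values $\mathrm{ss}_{xx}$, $\mathrm{ss}_{xy}$, and $\mathrm{ss}_{yy}$ of a set of $n$ data points $(x_i, y_i)$ about their respective means,

$$\begin{align}
\mathrm{ss}_{xx} &\equiv \sum (x_i - \bar{x})^2 \tag{1} \\
&= \sum x_i^2 - 2\bar{x} \sum x_i + \sum \bar{x}^2 \tag{2} \\
&= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \tag{3} \\
&= \sum x_i^2 - n\bar{x}^2 \tag{4} \\
\mathrm{ss}_{yy} &\equiv \sum (y_i - \bar{y})^2 \tag{5} \\
&= \sum y_i^2 - 2\bar{y} \sum y_i + \sum \bar{y}^2 \tag{6} \\
&= \sum y_i^2 - 2n\bar{y}^2 + n\bar{y}^2 \tag{7} \\
&= \sum y_i^2 - n\bar{y}^2 \tag{8} \\
\mathrm{ss}_{xy} &\equiv \sum (x_i - \bar{x})(y_i - \bar{y}) \tag{9} \\
&= \sum (x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x}\bar{y}) \tag{10} \\
&= \sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \tag{11} \\
&= \sum x_i y_i - n\bar{x}\bar{y}. \tag{12}
\end{align}$$

These quantities are simply unnormalized forms of the variances and covariance of $x$ and $y$ given by

$$\begin{align}
\mathrm{ss}_{xx} &= n\,\sigma_x^2 \tag{13} \\
\mathrm{ss}_{yy} &= n\,\sigma_y^2 \tag{14} \\
\mathrm{ss}_{xy} &= n\,\operatorname{cov}(x, y). \tag{15}
\end{align}$$

For linear least squares fitting, the coefficient $b$ in

$$y = a + b x \tag{16}$$

is given by

$$\begin{align}
b &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{17} \\
&= \frac{\mathrm{ss}_{xy}}{\mathrm{ss}_{xx}}, \tag{18}
\end{align}$$

and the coefficient $b'$ in

$$x = a' + b' y \tag{19}$$

is given by

$$b' = \frac{\mathrm{ss}_{xy}}{\mathrm{ss}_{yy}}. \tag{20}$$

The correlation coefficient $r^2$ (sometimes also denoted $R^2$) is then defined by

$$\begin{align}
r^2 &\equiv b\,b' \tag{21} \\
&= \frac{\mathrm{ss}_{xy}^2}{\mathrm{ss}_{xx}\,\mathrm{ss}_{yy}}. \tag{22}
\end{align}$$

The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. The correlation..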
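The relationship between the sums of squares, the two regression slopes, and the correlation coefficient can be checked numerically. The sketch below assumes the usual definitions (sums of squared deviations about the means, slope of $y$ on $x$ equal to $\mathrm{ss}_{xy}/\mathrm{ss}_{xx}$, slope of $x$ on $y$ equal to $\mathrm{ss}_{xy}/\mathrm{ss}_{yy}$). The function name `correlation_summary` and the sample data are made up for illustration.

```python
import math

def correlation_summary(xs, ys):
    """Return (r, b, b_prime) from the sums of squares about the means."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    ss_yy = sum((y - ybar) ** 2 for y in ys)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx          # slope of y = a + b x
    b_prime = ss_xy / ss_yy    # slope of x = a' + b' y
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r, b, b_prime

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r, b, b_prime = correlation_summary(xs, ys)
```

For this data set the product `b * b_prime` equals `r ** 2` (0.6 up to rounding), and the sign of `r` follows the sign of the covariance term.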

### Least squares fitting--perpendicular offsets

In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a fitting function for the independent variable $X$ that estimates $y$ for a given $x$ (most often what an experimenter wants), allows uncertainties of the data points along the $x$- and $y$-axes to be incorporated simply, and also provides a much simpler analytic form for the fitting parameters than would be obtained using a fit based on perpendicular offsets.

The residuals of the best-fit line for a set of $n$ points using unsquared perpendicular distances $d_i$ of points $(x_i, y_i)$ are given by

$$R_\perp \equiv \sum_{i=1}^{n} d_i. \tag{1}$$

Since the perpendicular distance from a line $y = a + b x$ to point $i$ is given by

$$d_i = \frac{|y_i - (a + b x_i)|}{\sqrt{1 + b^2}}, \tag{2}$$

the function to be minimized is

$$R_\perp \equiv \sum_{i=1}^{n} \frac{|y_i - (a + b x_i)|}{\sqrt{1 + b^2}}. \tag{3}$$

Unfortunately, because the absolute value function does not have continuous derivatives, minimizing $R_\perp$ is not amenable to analytic solution. However, if the square of the perpendicular distances

$$R_\perp^2 \equiv \sum_{i=1}^{n} \frac{[y_i - (a + b x_i)]^2}{1 + b^2} \tag{4}$$

is minimized instead,..
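For the squared perpendicular distances, a closed-form minimizer exists. The sketch below uses the standard orthogonal-regression result, which is not spelled out in the truncated text above and is assumed here: the line passes through the centroid, and the slope is $b = \left[s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}\right] / (2 s_{xy})$, where $s_{xx}$, $s_{yy}$, $s_{xy}$ are sums of squared deviations about the means. The function names are hypothetical.

```python
import math

def fit_perpendicular(xs, ys):
    """Fit y = a + b x by minimizing the sum of squared perpendicular distances."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxy == 0:
        # Slope is 0, vertical, or undetermined; not handled in this sketch.
        raise ValueError("zero covariance: slope not given by this formula")
    d = syy - sxx
    b = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    a = ybar - b * xbar   # best-fit line passes through the centroid
    return a, b

def perpendicular_residual(xs, ys, a, b):
    """Sum of squared perpendicular distances to the line y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (1.0 + b * b)

a, b = fit_perpendicular([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

For points lying exactly on $y = 1 + 2x$, the fit returns that line and the residual is zero. For noisy data, `perpendicular_residual` can be used to compare the result against a vertical-offset fit.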

### Least squares fitting--logarithmic

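Given a function of the form

$$y = a + b \ln x, \tag{1}$$

the coefficients can be found from least squares fitting as

$$b = \frac{n \sum_{i=1}^{n} (y_i \ln x_i) - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} \ln x_i}{n \sum_{i=1}^{n} (\ln x_i)^2 - \left(\sum_{i=1}^{n} \ln x_i\right)^2} \tag{2}$$

$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} \ln x_i}{n}. \tag{3}$$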
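A logarithmic fit of the form $y = a + b \ln x$ is an ordinary straight-line fit in the transformed variable $\ln x$. The following sketch applies the usual linear least squares sums to $\ln x_i$. The function name `fit_logarithmic` is an assumption, and all $x_i$ must be positive.

```python
import math

def fit_logarithmic(xs, ys):
    """Least squares fit of y = a + b * ln(x); returns (a, b)."""
    n = len(xs)
    lx = [math.log(x) for x in xs]          # requires x > 0
    s_l = sum(lx)
    s_y = sum(ys)
    s_ll = sum(l * l for l in lx)
    s_yl = sum(y * l for y, l in zip(ys, lx))
    b = (n * s_yl - s_y * s_l) / (n * s_ll - s_l ** 2)
    a = (s_y - b * s_l) / n
    return a, b

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 + 2.0 * math.log(x) for x in xs]   # exact data with a = 3, b = 2
a, b = fit_logarithmic(xs, ys)
```

On exact data the generating coefficients are recovered up to rounding.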

### Raw moment

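A moment $\mu_n'$ of a probability function $P(x)$ taken about 0,

$$\begin{align}
\mu_n' &= \langle x^n \rangle \tag{1} \\
&= \int x^n P(x)\, dx. \tag{2}
\end{align}$$

The raw moments $\mu_n'$ (sometimes also called "crude moments") can be expressed in terms of the central moments $\mu_n$ (i.e., those taken about the mean $\mu = \mu_1'$) using the inverse binomial transform

$$\mu_n' = \sum_{k=0}^{n} \binom{n}{k} \mu_k\, \mu_1'^{\,n-k}, \tag{3}$$

with $\mu_0 = 1$ and $\mu_1 = 0$ (Papoulis 1984, p. 146). The first few values are therefore

$$\begin{align}
\mu_2' &= \mu_2 + \mu_1'^2 \tag{4} \\
\mu_3' &= \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 \tag{5} \\
\mu_4' &= \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 \tag{6} \\
\mu_5' &= \mu_5 + 5\mu_4\mu_1' + 10\mu_3\mu_1'^2 + 10\mu_2\mu_1'^3 + \mu_1'^5. \tag{7}
\end{align}$$

The raw moments can also be expressed in terms of the cumulants $\kappa_n$ by exponentiating both sides of the series

$$\ln \phi(t) = \sum_{n=1}^{\infty} \kappa_n \frac{(it)^n}{n!}, \tag{8}$$

where $\phi(t)$ is the characteristic function, to obtain

$$\sum_{n=0}^{\infty} \mu_n' \frac{(it)^n}{n!} = \exp\left[\sum_{n=1}^{\infty} \kappa_n \frac{(it)^n}{n!}\right]. \tag{9}$$

The first few terms are then given by

$$\begin{align}
\mu_1' &= \kappa_1 \tag{10} \\
\mu_2' &= \kappa_1^2 + \kappa_2 \tag{11} \\
\mu_3' &= \kappa_1^3 + 3\kappa_1\kappa_2 + \kappa_3 \tag{12} \\
\mu_4' &= \kappa_1^4 + 6\kappa_1^2\kappa_2 + 3\kappa_2^2 + 4\kappa_1\kappa_3 + \kappa_4 \tag{13} \\
\mu_5' &= \kappa_1^5 + 10\kappa_1^3\kappa_2 + 15\kappa_1\kappa_2^2 + 10\kappa_1^2\kappa_3 + 10\kappa_2\kappa_3 + 5\kappa_1\kappa_4 + \kappa_5. \tag{14}
\end{align}$$

These transformations can be obtained using RawToCumulant[n] in the Mathematica application package mathStatica.

The raw moment of a multivariate probability function can be similarly defined as (15). Therefore, (16). The multivariate raw moments can be expressed in terms of the multivariate cumulants. For example, (17)(18). These transformations can be obtained using RawToCumulant[m,..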
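The conversion from central moments to raw moments via the binomial sum can be illustrated with a short function. This is a sketch assuming the standard identity $\mu_n' = \sum_{k=0}^{n} \binom{n}{k} \mu_k\, \mu^{n-k}$ with $\mu_0 = 1$ and $\mu_1 = 0$. The function name `raw_from_central` and the example distribution are chosen for illustration only.

```python
from math import comb

def raw_from_central(mean, central, n):
    """Return the n-th raw moment from the mean and central moments.

    central[k] must hold the k-th central moment for k = 0..n,
    with central[0] == 1 and central[1] == 0.
    """
    return sum(comb(n, k) * central[k] * mean ** (n - k) for k in range(n + 1))

# Example: normal distribution with mean 1 and variance 1,
# whose central moments are mu_0..mu_4 = 1, 0, 1, 0, 3.
central = [1, 0, 1, 0, 3]
raw = [raw_from_central(1, central, n) for n in range(5)]
```

For this example the list `raw` holds the raw moments of orders 0 through 4, which can be cross-checked against the known values $1, 1, 2, 4, 10$ for a normal distribution with unit mean and unit variance.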

### Kendall operator

The operator that can be used to derive multivariate formulas for moments and cumulants from corresponding univariate formulas.For example, to derive the expression for the multivariate central moments in terms of multivariate cumulants, begin with(1)Now rewrite each variable as to obtain(2)Now differentiate each side with respect to , where(3)and wherever there is a term with a derivative , remove the derivative and replace the argument with times itself, so(4)Now set any s appearing as coefficients to 1, so(5)Dividing through by 4 gives(6)Finally, set any coefficients powers of appearing as term coefficients to 1 and interpret the resulting terms as , so that the above gives(7)This procedure can be repeated up to times, where is the subscript of the univariate case.Iterating the above procedure gives(8)(9)(10)(11)(12)giving the identities(13)(14)(15)(16)(17)..

### Variation coefficient

If $s$ is the standard deviation of a set of samples $x_i$ and $\bar{x}$ their mean, then the variation coefficient is defined as

$$V \equiv \frac{s}{\bar{x}}.$$
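A minimal sketch of this ratio in Python follows, assuming the usual definition (standard deviation divided by mean). Whether the sample ($n - 1$) or population ($n$) standard deviation is intended is not stated in the text, so the hypothetical `population` flag below exposes both.

```python
import statistics

def variation_coefficient(samples, population=False):
    """Return standard deviation divided by mean for a list of samples."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("variation coefficient is undefined for zero mean")
    # Choice of estimator is an assumption; the text does not specify it.
    sd = statistics.pstdev(samples) if population else statistics.stdev(samples)
    return sd / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]
v_pop = variation_coefficient(data, population=True)
v_sample = variation_coefficient(data)
```

For `data` the mean is 5 and the population standard deviation is 2, so `v_pop` is 0.4. The sample-based value `v_sample` is slightly larger.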
