 # Probability

## Probability Topics

### Probability axioms

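Given an event $E$ in a sample space $S$ which is either finite with $N$ elements or countably infinite with $N=\infty$ elements, we can write

$$S = \bigcup_{i=1}^{N} E_i,$$

and a quantity $P(E_i)$, called the probability of event $E_i$, is defined such that

1. $0 \leq P(E_i) \leq 1$.
2. $P(S) = 1$.
3. Additivity: $P(E_1 \cup E_2) = P(E_1) + P(E_2)$, where $E_1$ and $E_2$ are mutually exclusive.
4. Countable additivity: $P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$ for $n = 1$, 2, ..., $N$, where $E_1$, $E_2$, ... are mutually exclusive (i.e., $E_1 \cap E_2 = \emptyset$).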

### Hawkes process

There are a number of point processes which are called Hawkes processes and, while many of these notions are similar, some are rather different. There are also different formulations for univariate and multivariate point processes.

In some literature, a univariate Hawkes process is defined to be a self-exciting temporal point process $N$ whose conditional intensity function $\lambda$ is defined to be

$$\lambda(t) = \mu + \sum_{i:\,\tau_i < t} \nu(t - \tau_i) \qquad (1)$$

where $\mu$ is the background rate of the process $N$, where $\tau_i$ are the points in time occurring prior to time $t$, and where $\nu$ is a function which governs the clustering density of $N$. The function $\nu$ is sometimes called the exciting function or the excitation function of $N$. Similarly, some authors (Merhdad and Zhu 2014) denote the conditional intensity function by $\lambda^*$ and rewrite the sum in (1) as

$$\sum_{i:\,\tau_i < t} \nu(t - \tau_i) = \int_{-\infty}^{t} \nu(t - s)\, dN(s). \qquad (2)$$

The processes upon which Hawkes himself made the most progress were univariate self-exciting temporal point processes whose conditional intensity function is linear (Hawkes 1971)...
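As a rough illustration of a self-exciting intensity, the sketch below evaluates a background rate plus a sum of excitation terms over past event times. The exponential excitation function $\nu(s) = \alpha e^{-\beta s}$ and the names `hawkes_intensity`, `mu`, `alpha`, and `beta` are assumptions chosen for this example; the entry above does not fix a particular form for the excitation function.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Background rate mu plus one excitation term per event strictly before t.

    Assumes an exponential excitation function nu(s) = alpha * exp(-beta * s),
    which is an illustrative choice, not the only possible one.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti))
        for ti in event_times
        if ti < t
    )

# Hypothetical event times used only for this example.
past_events = [1.0, 2.5, 2.7]
lam_before = hawkes_intensity(0.5, past_events, mu=0.5, alpha=0.8, beta=1.2)
lam_after = hawkes_intensity(3.0, past_events, mu=0.5, alpha=0.8, beta=1.2)
print(lam_before, lam_after)
```

Before any event has occurred the intensity equals the background rate; each past event raises it by an amount that decays with elapsed time.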

### Total probability theorem

Given $n$ mutually exclusive events $A_1$, ..., $A_n$ whose probabilities sum to unity, then

$$P(B) = P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n),$$

where $B$ is an arbitrary event, and $P(B|A_i)$ is the conditional probability of $B$ assuming $A_i$.
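A minimal sketch of the theorem in code, assuming the events are supplied as a list of priors $P(A_i)$ and a matching list of conditional probabilities $P(B|A_i)$. The function name `total_probability` and the numbers are hypothetical.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Return P(B) = sum_i P(B|A_i) P(A_i) for mutually exclusive A_i."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if sum(priors) != 1:
        raise ValueError("priors must sum to unity")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Hypothetical numbers: three exclusive causes and the chance of B under each.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]
p_b = total_probability(priors, likelihoods)
print(p_b)
```

Exact fractions are used so that the check that the priors sum to unity is not affected by floating-point rounding.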

### Temporal point process

A temporal point process is a random process whose realizations consist of the times $\{\tau_j\}_{j\in J}$ of isolated events.

Note that in some literature, the values $\tau_j$ are assumed to be arbitrary real numbers while the index set $J$ is assumed to be the set $\mathbb{Z}$ of integers (Schoenberg 2002); on the other hand, some authors view temporal point processes as binary events so that $\tau_j$ takes values in a two-element set for each $j$, and further assume that the index set $J$ is some finite set of points (Liam 2013). The former perspective corresponds to viewing temporal point processes as how long events occur where the events themselves are spaced according to a discrete set of time parameters; the latter view corresponds to viewing temporal point processes as indications of whether or not a finite number of events has occurred.

The behavior of a simple temporal point process is typically modeled by specifying its conditional intensity $\lambda(t)$. Indeed, a number of specific examples of temporal point..

### Tail probability

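Define $T$ as the set of all points $t$ with probabilities $P(x)$ such that $a > t \Rightarrow P(a) \leq P_0$ or $a < t \Rightarrow P(a) \leq P_0$, where $P_0$ is a point probability (often, the likelihood of an observed event). Then the associated tail probability is given by $\int_T P(x)\,dx$.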

### Point process

A point process is a probabilistic model for random scatterings of points on some space $X$, often assumed to be a subset of $\mathbb{R}^d$ for some $d$. Oftentimes, point processes describe the occurrence over time of random events in which the occurrences are revealed one-by-one as time evolves; in this case, any collection of occurrences is said to be a realization of the point process.

Poisson processes are regarded as archetypal examples of point processes (Daley and Vere-Jones 2002).

Point processes are sometimes known as counting processes or random scatters.

### Statistics

The mathematical study of the likelihood and probability of events occurring based on known information and inferred by taking a limited number of samples. Statistics plays an extremely important role in many aspects of economics and science, allowing educated guesses to be made with a minimum of expensive or difficult-to-obtain data.

A joke told about statistics (or, more precisely, about statisticians) runs as follows. Two statisticians are out hunting when one of them sees a duck. The first takes aim and shoots, but the bullet goes sailing past six inches too high. The second statistician also takes aim and shoots, but this time the bullet goes sailing past six inches too low. The two statisticians then give one another high fives and exclaim, "Got him!" (This joke plays on the fact that the mean of $-6$ and 6 is 0, so "on average," the two shots hit the duck.)

Approximately 73.8474% of extant statistical jokes are maintained..

### Stationary point process

There are at least two distinct notions of when a point process is stationary.

The most commonly utilized terminology is as follows: Intuitively, a point process $X$ defined on a subset $A$ of $\mathbb{R}^d$ is said to be stationary if the number of points lying in $A$ depends on the size of $A$ but not its location. On the real line, this is expressed in terms of intervals: A point process $N$ on $\mathbb{R}$ is stationary if for all $x > 0$ and for $k = 0, 1, 2, \ldots$,

$$\Pr\{N(t, t+x] = k\}$$

depends on the length $x$ of the interval but not on the location $t$.

Stationary point processes of this kind were originally called simple stationary, though several authors call them crudely stationary instead. In light of the notion of crude stationarity, a different definition of stationary may be stated in which a point process $N$ is stationary whenever for every $r = 1, 2, \ldots$ and for all bounded Borel subsets $A_1, \ldots, A_r$ of $\mathbb{R}$, the joint distribution of $\{N(A_1 + t), \ldots, N(A_r + t)\}$ does not depend on $t \in \mathbb{R}$. This distinction also gives rise to a related notion known as interval stationarity.

Some authors use the alternative definition of an intensity..

### Mutually exclusive events

$n$ events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others. Therefore, for events $E_1$, ..., $E_n$, the conditional probability is $P(E_i|E_j) = 0$ for all $j \neq i$.

### Multidimensional point process

A multidimensional point process is a measurable function from a probability space $(\Omega, \mathcal{A}, P)$ into $(X, \Sigma)$, where $X$ is the set of all finite or countable subsets of $\mathbb{R}^d$ not containing an accumulation point and where $\Sigma$ is the sigma-algebra generated over $X$ by the sets

$$\{x \in X : |x \cap B| = m\}$$

for all bounded Borel subsets $B \subset \mathbb{R}^d$ and all nonnegative integers $m$. Here, $|A|$ denotes the cardinality or order of the set $A$.

A multidimensional point process is sometimes abbreviated MPP, though care should be exercised not to confuse the notion with that of a marked point process.

Despite a number of apparent differences, one can show that multidimensional point processes are a special case of a random closed set on $\mathbb{R}^d$ (Baudin 1984).

### De Méré's problem

The probability of getting at least one "6" in four rolls of a single 6-sided die is

$$1 - \left(\frac{5}{6}\right)^4 = \frac{671}{1296} \approx 0.5177, \qquad (1)$$

which is slightly higher than the probability of at least one double-six in 24 throws of two dice,

$$1 - \left(\frac{35}{36}\right)^{24} \approx 0.4914. \qquad (2)$$

The French nobleman and gambler Chevalier de Méré suspected that (1) was higher than (2), but his mathematical skills were not great enough to demonstrate why this should be so. He posed the question to Pascal, who solved the problem and proved de Méré correct. In fact, de Méré's observation remains true even if two dice are thrown 25 times, since the probability of throwing at least one double-six is then

$$1 - \left(\frac{35}{36}\right)^{25} \approx 0.5055. \qquad (3)$$
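The three probabilities in this problem all follow from the complement rule: the chance of at least one success in $n$ independent trials is one minus the chance of $n$ failures. A short sketch, with the helper name `at_least_one` chosen for this example:

```python
from fractions import Fraction

def at_least_one(p_single, trials):
    """Probability of at least one success in independent trials."""
    return 1 - (1 - p_single) ** trials

p_one_six = at_least_one(Fraction(1, 6), 4)       # a six in four rolls of one die
p_double_24 = at_least_one(Fraction(1, 36), 24)   # a double-six in 24 throws of two dice
p_double_25 = at_least_one(Fraction(1, 36), 25)   # a double-six in 25 throws of two dice

for label, p in [("one six in 4 rolls", p_one_six),
                 ("double-six in 24 throws", p_double_24),
                 ("double-six in 25 throws", p_double_25)]:
    print(f"{label}: {float(p):.4f}")
```

Working in exact fractions makes the ordering of the three values unambiguous: the 24-throw bet falls just below one half, while the other two lie above it.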

### Mills ratio

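The Mills ratio is defined as

$$m(x) \equiv \frac{1}{h(x)} \qquad (1)$$

$$= \frac{S(x)}{P(x)} \qquad (2)$$

$$= \frac{1 - D(x)}{P(x)}, \qquad (3)$$

where $h(x)$ is the hazard function, $S(x)$ is the survival function, $P(x)$ is the probability density function, and $D(x)$ is the distribution function.

For example, for the normal distribution,

$$m(x) = \sigma\sqrt{\frac{\pi}{2}}\, e^{(x-\mu)^2/(2\sigma^2)} \operatorname{erfc}\left(\frac{x - \mu}{\sqrt{2}\,\sigma}\right), \qquad (4)$$

which simplifies to

$$m(x) = \sqrt{\frac{\pi}{2}}\, e^{x^2/2} \operatorname{erfc}\left(\frac{x}{\sqrt{2}}\right) \qquad (5)$$

for the standard normal distribution. The latter function has the particularly simple continued fraction representation

$$m(x) = \cfrac{1}{x + \cfrac{1}{x + \cfrac{2}{x + \cfrac{3}{x + \cdots}}}} \qquad (6)$$

(Cuyt et al. 2010, p. 376).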
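For the standard normal distribution, the closed form and the continued fraction can be compared numerically. The sketch below assumes the closed form $m(x) = \sqrt{\pi/2}\, e^{x^2/2} \operatorname{erfc}(x/\sqrt{2})$ and the continued fraction $1/(x + 1/(x + 2/(x + 3/(x + \cdots))))$ for $x > 0$; the function names and the truncation depth of 200 are arbitrary choices for this example.

```python
import math

def mills_ratio_std_normal(x):
    """Survival function over density for the standard normal distribution."""
    return math.sqrt(math.pi / 2) * math.exp(x * x / 2) * math.erfc(x / math.sqrt(2))

def mills_ratio_cf(x, depth=200):
    """Truncated continued fraction 1/(x + 1/(x + 2/(x + ...))), for x > 0."""
    if x <= 0:
        raise ValueError("this continued fraction is used here only for x > 0")
    tail = x
    # Build the fraction from the innermost retained term outward.
    for k in range(depth, 0, -1):
        tail = x + k / tail
    return 1 / tail

for x in (1.0, 2.0):
    print(x, mills_ratio_std_normal(x), mills_ratio_cf(x))
```

The continued fraction converges more slowly for small $x$, so a deeper truncation may be needed near zero.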

### Simple point process

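A simple point process (or SPP) is an almost surely increasing sequence of strictly positive, possibly infinite random variables which are strictly increasing as long as they are finite and whose almost sure limit is $\infty$. Symbolically, then, an SPP is a sequence $T = (T_n)_{n \geq 1}$ of $\overline{\mathbb{R}}_+$-valued random variables defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that

1. $\mathbb{P}(0 < T_1 \leq T_2 \leq \cdots) = 1$,
2. $\mathbb{P}(T_n < T_{n+1},\ T_n < \infty) = \mathbb{P}(T_n < \infty)$ for $n \geq 1$,
3. $\mathbb{P}(\lim_{n \to \infty} T_n = \infty) = 1$.

Here, $\overline{\mathbb{R}}_+ = (0, \infty]$ and for each $n$, $T_n$ can be interpreted as either the time point at which the $n$th recording of an event takes place or as an indication that fewer than $n$ events occurred altogether if $T_n < \infty$ or if $T_n = \infty$, respectively (Jacobsen 2006).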

### Marked point process

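A marked point process with mark space $E$ is a double sequence

$$(T, Y) = \left((T_n)_{n \geq 1}, (Y_n)_{n \geq 1}\right)$$

of $\overline{\mathbb{R}}_+$-valued random variables $T_n$ and $\overline{E}$-valued random variables $Y_n$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $T = (T_n)$ is a simple point process (SPP) and:

1. $\mathbb{P}(Y_n \in E,\ T_n < \infty) = \mathbb{P}(T_n < \infty)$ for $n \geq 1$;
2. $\mathbb{P}(Y_n = \nabla,\ T_n = \infty) = \mathbb{P}(T_n = \infty)$ for $n \geq 1$.

Here, $\mathbb{P}$ denotes probability, $\nabla$ denotes the so-called irrelevant mark which is used to describe the mark of an event that never occurs, and $\overline{E} = E \cup \{\nabla\}$.

This definition is similar to the definition of an SPP in that it describes a sequence of time points marking the occurrence of events. The difference is that these events may be of different types, where the type (i.e., the mark) of the $n$th event is denoted by $Y_n$. Note that, because of the inclusion of the irrelevant mark $\nabla$, marking will assign values $Y_n$ for all $n$, even when $T_n = \infty$, i.e., when the $n$th event never occurs (Jacobsen 2006).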

### Mark space

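Given a marked point process of the form

$$(T, Y) = \left((T_n)_{n \geq 1}, (Y_n)_{n \geq 1}\right),$$

where the $Y_n$ take values in $E \cup \{\nabla\}$, the space $E$ is said to be the mark space of $(T, Y)$.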

### Conditional probability

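The conditional probability of an event $A$ assuming that $B$ has occurred, denoted $P(A|B)$, equals

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad (1)$$

which can be proven directly using a Venn diagram. Multiplying through, this becomes

$$P(A|B)\,P(B) = P(A \cap B), \qquad (2)$$

which can be generalized to

$$P(A \cap B \cap C) = P(A)\,P(B|A)\,P(C|A \cap B). \qquad (3)$$

Rearranging (1) gives

$$P(B|A) = \frac{P(B \cap A)}{P(A)}. \qquad (4)$$

Solving (4) for $P(B \cap A) = P(A \cap B)$ and plugging in to (1) gives

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(B)}. \qquad (5)$$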

### Saint Petersburg paradox

Consider a game, first proposed by Nicolaus Bernoulli, in which a player bets on how many tosses of a coin will be needed before it first turns up heads. The player pays a fixed amount initially, and then receives $2^n$ dollars if the coin comes up heads on the $n$th toss. The expectation value of the gain is then

$$\frac{1}{2}(2) + \frac{1}{4}(4) + \frac{1}{8}(8) + \cdots = 1 + 1 + 1 + \cdots = \infty \qquad (1)$$

dollars, so any finite amount of money can be wagered and the player will still come out ahead on average.

Feller (1968) discusses a modified version of the game in which the player receives nothing if a trial takes more than a fixed number $N$ of tosses. The classical theory of this modified game concluded that $N$ is a fair entrance fee, but Feller notes that "the modern student will hardly understand the mysterious discussions of this 'paradox.'"

In another modified version of the game, the player bets \$2 that heads will turn up on the first throw, \$4 that heads will turn up on the second throw (if it did not turn up on the first), \$8 that heads will turn..
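The divergence of the expectation can be seen by capping the game at a fixed number of tosses. The sketch below assumes the standard payoff of $2^n$ dollars when the first head appears on toss $n$ (probability $2^{-n}$) and no payoff beyond the cap; `truncated_expected_payoff` is a name chosen for this example.

```python
from fractions import Fraction

def truncated_expected_payoff(max_tosses):
    """Expected payoff when games longer than max_tosses pay nothing.

    Assumes a payoff of 2**n dollars for a first head on toss n,
    which happens with probability 1 / 2**n.
    """
    return sum(Fraction(1, 2 ** n) * 2 ** n for n in range(1, max_tosses + 1))

for cap in (1, 10, 100):
    print(cap, truncated_expected_payoff(cap))
```

Each permitted toss contributes exactly one dollar to the expectation, so the capped value equals the cap and grows without bound as the cap is lifted.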

### Coin tossing

An idealized coin consists of a circular disk of zero thickness which, when thrown in the air and allowed to fall, will rest with either side face up ("heads" H or "tails" T) with equal probability. A coin is therefore a two-sided die. Despite slight differences between the sides and nonzero thickness of actual coins, the distribution of their tosses makes a good approximation to a Bernoulli distribution.

There are, however, some rather counterintuitive properties of coin tossing. For example, it is twice as likely that the triple TTH will be encountered before THT than after it, and three times as likely that THH will precede HHT. Furthermore, it is six times as likely that HTT will be the first of HTT, TTH, and TTT to occur than either of the others (Honsberger 1979). There are also strings $S_1$ and $S_2$ of Hs and Ts that have the property that the expected wait to see string $S_1$ is less than the expected wait to see $S_2$, but the probability of seeing $S_1$ before..

### Russian roulette

Russian roulette is a game of chance in which one or more of the six chambers of a revolver are filled with cartridges, the chamber is rotated at random, and the gun is fired. The shooter bets on whether the chamber which rotates into place will be loaded. If it is, he loses not only his bet but his life. In the case of a revolver with six chambers (revolvers with 5, 7, or 8 chambers are also common), the shooter has a 1/6 chance of dying (ignoring the fact that the probability of firing the round is always somewhat less than $1/n$ for an $n$-shot revolver because the mass of the round in the cylinder causes an imbalance, and the cylinder will tend to stop rotating with its heavy side at or close to the bottom, while the firing pin is opposite the top chamber).

A modified version is considered by Blom et al. (1996) and Blom (1989). In this variant, the revolver is loaded with a single cartridge, and two duelists alternately spin the chamber and fire at themselves until one is killed...

### Random closed set

A random closed set (RACS) in $\mathbb{R}^d$ is a measurable function from a probability space $(\Omega, \mathcal{A}, P)$ into $(\mathcal{F}, \Sigma)$, where $\mathcal{F}$ is the collection of all closed subsets of $\mathbb{R}^d$ and where $\Sigma$ denotes the sigma-algebra generated over $\mathcal{F}$ by the sets

$$\mathcal{F}_K = \{F \in \mathcal{F} : F \cap K \neq \emptyset\}$$

for all compact subsets $K \subset \mathbb{R}^d$.

Originally, RACS were defined not on $\mathbb{R}^d$ but in the more general setting of locally compact and separable (LCS) topological spaces (Baudin 1984) which may or may not be T2. In this case, the above definition is modified so that $\mathcal{F}$ is defined to be the collection of closed subsets of some ambient LCS space (Molchanov 2005).

Despite a number of apparent differences, one can show that multidimensional point processes are a special case of RACS when talking about $\mathbb{R}^d$ (Baudin 1984).

### Quantile function

Given a random variable $X$ with continuous and strictly monotonic probability density function $f(X)$, a quantile function $Q_f$ assigns to each probability $p$ attained by $f$ the value $x$ for which $\Pr(X \leq x) = p$. Symbolically,

$$Q_f(p) = \{x : \Pr(X \leq x) = p\}.$$

Defining quantile functions for discrete rather than continuous distributions requires a bit more work since the discrete nature of such a distribution means that there may be gaps between values in the domain of the distribution function and/or "plateaus" in its range. Therefore, one often defines the associated quantile function to be

$$Q_f(p) = \inf\{x \in R(X) : p \leq \Pr(X \leq x)\},$$

where $R(X)$ denotes the range of $X$.

### Proofreading mistakes

If proofreader $A$ finds $a$ mistakes and proofreader $B$ finds $b$ mistakes, $c$ of which were also found by $A$, how many mistakes were missed by both $A$ and $B$? Assume there are a total of $m$ mistakes, so proofreader $A$ finds a fraction $a/m$ of all mistakes, and also a fraction $c/b$ of the mistakes found by $B$. Assuming these fractions are the same, then solving for $m$ gives

$$m = \frac{ab}{c}.$$

The number of mistakes missed by both is therefore approximately

$$N = m - a - b + c = \frac{(a - c)(b - c)}{c}.$$
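The estimate can be sketched as a small function, assuming the proportionality argument above: with $a$ and $b$ mistakes found by the two proofreaders and $c$ found by both, the total is estimated as $ab/c$. The function name and the sample counts are hypothetical.

```python
from fractions import Fraction

def estimate_missed(found_a, found_b, found_both):
    """Estimate total mistakes and those missed by both proofreaders.

    Assumes proofreader A finds the same fraction of all mistakes as of
    the mistakes found by B, so total = found_a * found_b / found_both.
    """
    if found_both <= 0:
        raise ValueError("the estimate needs at least one mistake found by both")
    total = Fraction(found_a * found_b, found_both)
    missed = total - found_a - found_b + found_both
    return total, missed

# Hypothetical counts: A finds 30, B finds 24, and 20 are found by both.
total, missed = estimate_missed(30, 24, 20)
print(total, missed)
```

With these counts the estimated total is 36 mistakes, of which 2 are estimated to have escaped both readers, matching $(a-c)(b-c)/c = 10 \cdot 4 / 20$.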

### Interval stationary point process

A point process $N$ on $\mathbb{R}$ is said to be interval stationary if for every $r = 1, 2, 3, \ldots$ and for all integers $i_1, \ldots, i_r$, the joint distribution of

$$\{\tau_{i_1 + k}, \ldots, \tau_{i_r + k}\}$$

does not depend on $k$, $k = 0, \pm 1, \pm 2, \ldots$. Here, $\tau_i$ is an interval for all $i$.

As pointed out in a variety of literature (e.g., Daley and Vere-Jones 2002, pp. 45-46), the notion of an interval stationary point process is intimately connected to (though fundamentally different from) the idea of a stationary point process in the Borel set sense of the term. Worth noting, too, is the difference between interval stationarity and other notions such as simple/crude stationarity.

Though it has been done, it is more difficult to extend the notion of interval stationarity to $\mathbb{R}^d$; doing so requires a significant amount of additional machinery and reflects, overall, the significantly increased structural complexity of higher-dimensional Euclidean spaces (Daley and Vere-Jones 2007)...

### Probability space

A triple $(S, \mathbb{S}, P)$ on the domain $S$, where $(S, \mathbb{S})$ is a measurable space, $\mathbb{S}$ are the measurable subsets of $S$, and $P$ is a measure on $\mathbb{S}$ with $P(S) = 1$.

### Intensity measure

The intensity measure $\mu$ of a point process $X$ relative to a Borel set $B \subset \mathbb{R}^d$ is defined to be the expected number of points of $X$ falling in $B$. Symbolically,

$$\mu(B) = E[X(B)],$$

where $E$ denotes the expected value.

The notion of an intensity measure is intimately connected to one oft-discussed notion of intensity function (Pawlas 2008).

### Probability measure

Consider a probability space specified by the triple $(S, \mathbb{S}, P)$, where $(S, \mathbb{S})$ is a measurable space with domain $S$, $\mathbb{S}$ is the collection of its measurable subsets, and $P$ is a measure on $\mathbb{S}$ with $P(S) = 1$. Then the measure $P$ is said to be a probability measure. Equivalently, $P$ is said to be normalized.

### Intensity function

There are at least two distinct notions of an intensity function related to the theory of point processes.

In some literature, the intensity $\lambda$ of a point process $N$ is defined to be the quantity

$$\lambda = \lim_{h \to 0^+} \frac{\Pr\{N(0, h] > 0\}}{h} \qquad (1)$$

provided it exists. Here, $\Pr$ denotes probability. In particular, it makes sense to talk about point processes having infinite intensity, though when finite, $\lambda$ allows $\Pr$ to be rewritten so that

$$\Pr\{N(x, x + h] > 0\} = \lambda h + o(h) \qquad (2)$$

as $h \to 0^+$, where $o(h)$ denotes little-O notation (Daley and Vere-Jones 2007).

Other authors define the function $\lambda$ to be an intensity function of a point process $X$ provided that $\lambda$ is a density of the intensity measure $\mu$ associated to $X$ relative to Lebesgue measure, i.e., if for all Borel sets $B$ in $\mathbb{R}^d$,

$$\mu(B) = \int_B \lambda(x)\, d\nu(x), \qquad (3)$$

where $\nu$ denotes Lebesgue measure (Pawlas 2008).

### Independent statistics

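Two variates $A$ and $B$ are statistically independent iff the conditional probability $P(A|B)$ of $A$ given $B$ satisfies

$$P(A|B) = P(A), \qquad (1)$$

in which case the probability of $A$ and $B$ is just

$$P(A \cap B) = P(A)\,P(B). \qquad (2)$$

If $n$ events $A_1$, $A_2$, ..., $A_n$ are independent, then

$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i). \qquad (3)$$

Statistically independent variables are always uncorrelated, but the converse is not necessarily true.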

### Bonferroni inequalities

Let $P(E_i)$ be the probability that $E_i$ is true, and $P\left(\bigcup_{i=1}^{n} E_i\right)$ be the probability that at least one of $E_1$, $E_2$, ..., $E_n$ is true. Then "the" Bonferroni inequality, also known as Boole's inequality, states that

$$P\left(\bigcup_{i=1}^{n} E_i\right) \leq \sum_{i=1}^{n} P(E_i),$$

where $\cup$ denotes the union. If $E_i$ and $E_j$ are disjoint sets for all $i \neq j$, then the inequality becomes an equality. A beautiful theorem that expresses the exact relationship between the probability of unions and probabilities of individual events is known as the inclusion-exclusion principle.

A slightly wider class of inequalities are also known as "Bonferroni inequalities."

### Probability domain

Evans et al. (2000, p. 6) use the unfortunate term "probability domain" to refer to the range of the distribution function of a probability density function. For a continuous distribution, the probability domain is simply the interval $[0, 1]$, whereas for a discrete distribution, it is a subset of that interval.

### Probability density function

The probability density function (PDF) of a continuous distribution is defined as the derivative of the (cumulative) distribution function, (1)(2)(3) so (4)(5). A probability function satisfies (6) and is constrained by the normalization condition, (7)(8). Special cases are (9)(10)(11)(12)(13).

To find the probability function in a set of transformed variables, find the Jacobian. For example, if , then (14) so (15). Similarly, if and , then (16).

Given probability functions , , ..., , the sum distribution has probability function (17), where is a delta function. Similarly, the probability function for the distribution of is given by (18). The difference distribution has probability function (19) and the ratio distribution has probability function (20).

Given the moments of a distribution (, , and the gamma statistics ), the asymptotic probability function is given by (21), where (22) is the normal distribution, and (23) for (with cumulants and..

### Bayes' theorem

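Let $A$ and $B_j$ be sets. Conditional probability requires that

$$P(A \cap B_j) = P(A)\,P(B_j|A), \qquad (1)$$

where $\cap$ denotes intersection ("and"), and also that

$$P(A \cap B_j) = P(B_j \cap A) = P(B_j)\,P(A|B_j). \qquad (2)$$

Therefore,

$$P(B_j|A) = \frac{P(B_j)\,P(A|B_j)}{P(A)}. \qquad (3)$$

Now, let

$$S \equiv \bigcup_{i=1}^{N} A_i, \qquad (4)$$

so $A_i$ is an event in $S$ and $A_i \cap A_j = \emptyset$ for $i \neq j$, then

$$A = A \cap S = A \cap \left(\bigcup_{i=1}^{N} A_i\right) = \bigcup_{i=1}^{N} (A \cap A_i) \qquad (5)$$

$$P(A) = P\left(\bigcup_{i=1}^{N} (A \cap A_i)\right) = \sum_{i=1}^{N} P(A \cap A_i). \qquad (6)$$

But this can be written

$$P(A) = \sum_{i=1}^{N} P(A_i)\,P(A|A_i), \qquad (7)$$

so

$$P(A_i|A) = \frac{P(A_i)\,P(A|A_i)}{\sum_{j=1}^{N} P(A_j)\,P(A|A_j)} \qquad (8)$$

(Papoulis 1984, pp. 38-39).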
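A small numerical sketch of the final form of the theorem, in which the posterior of each hypothesis is its prior times its likelihood, divided by the sum of those products over all hypotheses. The function name `posterior` and the numbers (a rare condition with a fairly accurate test) are hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_i | A) for mutually exclusive, exhaustive hypotheses A_i.

    priors[i] is P(A_i) and likelihoods[i] is P(A | A_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(A), by the total probability theorem
    if evidence == 0:
        raise ValueError("the observed event has probability zero")
    return [j / evidence for j in joint]

# Hypothetical example: 1% prevalence, 95% detection rate, 5% false positives.
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]
post = posterior(priors, likelihoods)
print(post)
```

Even with an accurate test, the posterior probability of the rare hypothesis stays well below one half here, because the denominator is dominated by false positives from the common hypothesis.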

### Probability

Probability is the branch of mathematics that studies the possible outcomes of given events together with the outcomes' relative likelihoods and distributions. In common usage, the word "probability" is used to mean the chance that a particular event (or set of events) will occur expressed on a linear scale from 0 (impossibility) to 1 (certainty), also expressed as a percentage between 0 and 100%. The analysis of events governed by probability is called statistics.

There are several competing interpretations of the actual "meaning" of probabilities. Frequentists view probability simply as a measure of the frequency of outcomes (the more conventional interpretation), while Bayesians treat probability more subjectively as a statistical procedure that endeavors to estimate parameters of an underlying distribution based on the observed distribution.

A properly normalized function that assigns a probability..

### Hazard function

The hazard function (also known as the failure rate, hazard rate, or force of mortality) $h(x)$ is the ratio of the probability density function $P(x)$ to the survival function $S(x)$, given by

$$h(x) = \frac{P(x)}{S(x)} \qquad (1)$$

$$= \frac{P(x)}{1 - D(x)}, \qquad (2)$$

where $D(x)$ is the distribution function (Evans et al. 2000, p. 13).

### Score function

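The score function $u(\theta)$ is the partial derivative of the log-likelihood function $F(\theta) = \ln L(\theta)$, where $L(\theta)$ is the standard likelihood function.

Defining the likelihood function

$$L(\theta) = \prod_{i=1}^{n} f_\theta(y_i) \qquad (1)$$

shows that

$$F(\theta) = \sum_{i=1}^{n} \ln f_\theta(y_i) \qquad (2)$$

and thus that

$$u(\theta) = \frac{\partial F}{\partial \theta} \qquad (3)$$

$$= \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \ln f_\theta(y_i) \qquad (4)$$

$$= \sum_{i=1}^{n} \frac{1}{f_\theta(y_i)} \frac{\partial f_\theta(y_i)}{\partial \theta}. \qquad (5)$$

Using the above formulation of $u(\theta)$, one can easily compute various statistical measurements associated with $u$. For example, the mean can be shown to equal zero while the variance is precisely the Fisher information matrix. The score function has extensive uses in many areas of mathematics, both pure and applied, and is a key component of the field of likelihood theory.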

### Markov's inequality

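If $x$ takes only nonnegative values, then

$$P(x \geq a) \leq \frac{\langle x \rangle}{a}. \qquad (1)$$

To prove the theorem, write

$$\langle x \rangle = \int_0^\infty x P(x)\,dx \qquad (2)$$

$$= \int_0^a x P(x)\,dx + \int_a^\infty x P(x)\,dx. \qquad (3)$$

Since $P(x)$ is a probability density, it must be $\geq 0$. We have stipulated that $x \geq 0$, so

$$\langle x \rangle = \int_0^a x P(x)\,dx + \int_a^\infty x P(x)\,dx \qquad (4)$$

$$\geq \int_a^\infty x P(x)\,dx \qquad (5)$$

$$\geq \int_a^\infty a P(x)\,dx \qquad (6)$$

$$= a \int_a^\infty P(x)\,dx \qquad (7)$$

$$= a\,P(x \geq a). \qquad (8)$$

Q.E.D.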
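The inequality also holds for the empirical distribution of any nonnegative sample, which gives a quick sanity check: the fraction of values at or above a threshold $a > 0$ never exceeds the sample mean divided by $a$. The function names and the sample values below are illustrative.

```python
def empirical_tail(samples, a):
    """Fraction of samples that are at least a."""
    return sum(1 for x in samples if x >= a) / len(samples)

def markov_bound(samples, a):
    """Sample mean divided by a, valid as a bound for nonnegative samples."""
    if a <= 0:
        raise ValueError("the threshold a must be positive")
    if any(x < 0 for x in samples):
        raise ValueError("Markov's inequality requires nonnegative values")
    return (sum(samples) / len(samples)) / a

# Hypothetical nonnegative sample.
samples = [0, 1, 1, 2, 3, 5, 8, 13]
checks = [(a, empirical_tail(samples, a), markov_bound(samples, a))
          for a in (1, 2, 5, 8, 13)]
for a, tail, bound in checks:
    print(f"a={a}: tail={tail:.3f} <= bound={bound:.3f}")
```

The bound is often loose, and for small thresholds it can exceed 1, in which case it carries no information.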