
MCMC Sampling

This method is used for estimating constrained distributions. In principle it is fairly simple and gives distributions very close to analytically calculated distributions with far less computation time and complexity in the calculations (see Buck et al 1992, Gilks et al 1996 and Gelfand and Smith 1990).

The Sampling Process

The sampling process used is a form of Markov Chain Monte Carlo (see Gilks et al 1996 for an overview of the techniques). This technique allows samples to be taken which properly reflect the probability distributions and constraints, thereby building up a histogram which can then be used as an estimate of the probability densities. If the initial probability distribution is unconstrained an approximation to the initial distribution is produced.

This program uses a mixture of the Metropolis-Hastings algorithm and the more specific Gibbs sampler.

The Metropolis-Hastings algorithm uses a set of proposal moves which can result in changes either to single elements of the model or to the duration and timing of whole groups. This provides much faster convergence for complex models than use of the Gibbs sampler on its own.
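
As an illustration only (not the program's actual code), a sketch of such a mixture of proposal moves might look as follows, assuming a function log_post giving the log of the combined distribution p(t)c(t), a list of index groups whose members can be shifted together, and a proposal scale step (all of these names are assumptions made for the sketch):

    import math
    import random

    def metropolis_hastings_step(t, log_post, groups, step=5.0):
        # One illustrative Metropolis-Hastings update over the event times t.
        proposal = list(t)
        if groups and random.random() < 0.5:
            # Group move: shift the timing of a whole group by a common offset.
            shift = random.gauss(0.0, step)
            for i in random.choice(groups):
                proposal[i] += shift
        else:
            # Single-element move: perturb one event time.
            i = random.randrange(len(t))
            proposal[i] += random.gauss(0.0, step)
        # Accept with the usual Metropolis probability (symmetric proposals assumed).
        log_ratio = log_post(proposal) - log_post(t)
        if random.random() < math.exp(min(0.0, log_ratio)):
            return proposal
        return t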

There are several different methods of implementing the Gibbs sampler; the one employed by this program is as follows: a value t is selected and p(t) is then compared to another randomly chosen value r which lies between 0 and max(p(t)); if p(t) > r the value is accepted as a sample; if p(t) < r the process is repeated until it is successful. These sampled values are then collected and a sample distribution generated.
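
A minimal sketch of this acceptance scheme (illustrative only) might be as follows, assuming p is the conditional density for the element being re-sampled and t_range is the interval allowed by the current constraints (both names are assumptions made for the sketch):

    import random

    def gibbs_rejection_sample(p, t_range, p_max):
        # p       : conditional density for the element, given the latest
        #           samples of all the other elements
        # t_range : (lower, upper) limits imposed by the current constraints
        # p_max   : max(p(t)), or any upper bound on it, over t_range
        while True:
            t = random.uniform(*t_range)      # select a value t
            r = random.uniform(0.0, p_max)    # r lies between 0 and max(p(t))
            if p(t) > r:                      # accept t as a sample
                return t                      # otherwise repeat until successful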

This program is initially set up to do 30,000 iterations, which gives fairly smooth distributions for most purposes; reasonable results are achieved much more quickly than this, however, and the process can be stopped after 3,000 iterations. The first 100 iterations are discarded to allow the sampling process to converge.

Every 3000 iterations (called a `pass') the sampled distributions are saved (and can be plotted by redrawing the window) and checked for convergence. The results of the convergence tests are saved in a file Converg.14L. Every 6000 iterations any boundary conditions are relaxed to allow the system to find a new starting point (this is followed by a new burn-in period of 100 iterations, the results of which are discarded). A full run therefore consists of five sub-runs, each with a new starting point. The convergence tests will indicate if convergence is slow or if different starting points have a significant effect on the result. The convergence test consists of comparing the distribution from the preceding pass, p(t), with the accumulated distribution, P(t) - thus no real indication of the convergence is given until after two passes. The function used is an overlap integral of the form:
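
(The precise expression is not reproduced here; a normalised overlap of this general kind, assuming both p(t) and P(t) are normalised, would be

    C = ( integral of p(t)P(t) dt )^2 / ( integral of p(t)^2 dt * integral of P(t)^2 dt )

which equals 1, i.e. 100%, when the two distributions are identical.)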

If the convergence is poor (less than 95%) the pass interval (initially 3000) will be increased by a factor of two. This is repeated until the convergence is satisfactory. The sampling can, however, be abandoned if necessary, but in this case the results should not be used as the model is clearly not stable.

See also [Program Operation] and the section on [Convergence].


Constraints

The reason for using the MCMC method is that it enables constraints to be imposed by defining a constraint distribution c(t); the sampling is then performed from the combined distribution p(t)c(t) (an application of Bayes' theorem). The c(t) term is usually defined by the latest samples taken from the other distributions.

The whole operation consists of finding samples from each distribution which are consistent with the constraints (sometimes this is not possible, in which case the message `cannot resolve order' is displayed). Each distribution is then sampled, always calculating the constraints from the latest samples of the other values. In this way, once the constraints have been satisfied they will remain satisfied for all subsequent sampling iterations.

Since the initial sample may be unrepresentative it is usual to ignore the first few iterations and in this program the first 100 are discarded.

Sequences

For a group of items in a sequence (Sequence) the constraints are fairly simple. If the sampled times are written as t_i for the ith member of the group the constraints are:

t_i < t_(i+1), for all i < n

If a phase (Phase) is contained within a sequence this becomes a little more complicated. If we treat each member of the sequence as a phase of one or more elements t_ij, the constraints for any two successive elements of the sequence will be:

t_ij < t_(i+1)k, for all j and k

These constraints provide a constraint function c(t) which is simply an upper and a lower limit, and this can easily be incorporated into the sampling method.
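
As an illustrative sketch only (not the program's code), these limits can be found from the latest samples of the neighbouring phases:

    def limits_for_element(phases, i):
        # phases : list of phases in sequence order, each a list of the
        #          latest sampled times t_ij for its elements
        # i      : index of the phase containing the element being re-sampled
        # Returns the (lower, upper) limits that define c(t) for that element.
        lower = max(phases[i - 1]) if i > 0 else float("-inf")
        upper = min(phases[i + 1]) if i + 1 < len(phases) else float("inf")
        return lower, upper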

Termini

The presence of termini within a sequence is dealt with by dispensing with one set of constraints. If the ith member of the sequence is a terminus ante quem (TAQ) and, for simplicity, assuming the elements on either side are simple elements the constraints imposed are:

t_(i-1) < t_i, t_(i-1) < t_(i+1)

Sequences with approximate gaps

These are treated rather differently (V_Sequence). Here a function is defined for c(t) which is then used in the sampling process. If we have three successive members of the sequence, t_(i-1), t_i and t_(i+1), the constraint distribution for t_i will be given by a function of the gaps specified on either side of it:
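
(A plausible form, writing the specified gaps as d_(i-1) +/- s_(i-1) and d_i +/- s_i - this notation is assumed here rather than taken from the original - is a product of normal terms:

    c(t_i) proportional to exp( -(t_i - t_(i-1) - d_(i-1))^2 / (2 s_(i-1)^2) ) * exp( -(t_(i+1) - t_i - d_i)^2 / (2 s_i^2) )

so that t_i is drawn towards positions consistent with both gaps.)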

It should be noted that if the error terms are too low the samples will always remain constrained close to the initial selection. Such an initial selection will also become increasingly difficult in these circumstances, so this method should only be used when the error terms are greater than the resolution defined (in fact the program forces you to do this) and may well fail if the sequence has a large number of elements. Such failure will result in very slow progress and the message `improbable value'.

See also [Archaeological Considerations] [Program Operation]


Boundaries

Boundaries are important to the creation of large models as they allow one to correct for the greater statistical weight of groups of events with larger spans. The theory behind the notion of uniform phases is outlined elsewhere (Buck et al 1992) but further insights have been provided by Geoff Nichols (private communication).

A justification for the technique can be outlined thus: the geological or archaeological events under study in any dating research are assumed to be Poisson distributed through a period of time. The dated events which we put into our model are also assumed to be Poisson distributed within the intervals between the archaeological or geological events. In this program the archaeological or geological events are denoted by the term Boundary. Ordinary events are put into the model using any of the date specification commands or the generic term Event, which has no dating information associated with it.

The boundaries will then divide the events up into a series of phases (in the most general sense - the events could be ordered within this as a sequence). We would like our prior density to be independent of the number of dated events within each phase and, ideally, the overall start and finish to be independent of the number of postulated internal Boundaries. All that follows is a result of these criteria.

The statistical weight of a single phase with a starting boundary b_i, a final boundary b_(i+1) and n_i events within it, is proportional to:

(b_(i+1) - b_i)^n_i

For this reason a prior probability is applied which is proportional to:

1/(b_(i+1) - b_i)^n_i

Looking at this in another way, for any boundary b_i there is a preceding phase with n_(i-1) items (starting with the boundary b_(i-1)) and one following with n_i items (ended by another boundary b_(i+1)). A prior probability function f(t) can then be calculated for the position of b_i:

f(t) proportional to 1/( (t - b_(i-1))^n_(i-1) (b_(i+1) - t)^n_i )

This is the method outlined by Buck et al 1992.
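
A minimal sketch of this density for a single boundary, using the 1/(span^n) factors reconstructed above (the exact expression used by the program may differ), might be:

    def boundary_density(t, b_prev, b_next, n_prev, n_next):
        # Unnormalised prior density f(t) for the position of boundary b_i.
        # b_prev, b_next : latest sampled positions of the neighbouring boundaries
        # n_prev, n_next : numbers of events in the preceding and following phases
        if not (b_prev < t < b_next):
            return 0.0
        return 1.0 / ((t - b_prev) ** n_prev * (b_next - t) ** n_next)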

Additional factors for uniform overall span

However this is not the end of the story. In version 3.2 or later of this program, if the 'Uniform span prior' option is set, two further functions are added to the prior in these cases.

The first of these is needed because we do not wish the prior for the length of the overall sequence of events to depend on the number of boundaries in the model. For this reason the prior is made proportional to:

1/(b_m - b_1)^(m-2)

where b_1 is the first boundary and b_m is the last (ie there are m boundaries in total).

The second addition takes account of the fact that, if an upper and a lower limit are independently applied to the boundaries in general, this will 'favour' shorter spans. This effect can be reversed by applying a prior factor of:

1/( (b_ulim - b_llim) - (b_m - b_1) )

where b_llim is the lower limit for the boundaries and b_ulim is the upper limit.

Together all of these factors give a uniform prior density for the span of the entire sequence of boundaries.
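
As an illustration, a sketch of the combined log prior over a set of boundaries is given below; the individual factors follow the reconstructions above and should be treated as assumptions rather than the program's definitive formulae.

    import math

    def log_boundary_prior(b, n, b_llim, b_ulim, uniform_span=True):
        # b : sorted list of the m boundary positions b_1 .. b_m
        # n : event counts n_i for the m-1 phases between the boundaries
        # b_llim, b_ulim : overall lower and upper limits on the boundaries
        m = len(b)
        log_p = 0.0
        # One factor 1/(b_(i+1) - b_i)^n_i per phase.
        for i in range(m - 1):
            log_p -= n[i] * math.log(b[i + 1] - b[i])
        if uniform_span:
            span = b[-1] - b[0]
            # Factor 1/(b_m - b_1)^(m-2): keeps the span prior independent
            # of the number of boundaries.
            log_p -= (m - 2) * math.log(span)
            # Factor 1/((b_ulim - b_llim) - span): corrects for limits
            # favouring shorter spans.
            log_p -= math.log((b_ulim - b_llim) - span)
        return log_p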

Nesting of structures

The program takes account of the depth of nesting of boundaries. All boundaries to be treated as comparable should be at the same nesting depth. This enables events of different classes to be treated in a way which makes sense. A boundary at an outer level is treated as an upper or lower limit as applicable. A sequence of boundaries at an inner level is treated like two events (the start and end of this sequence).

Having boundaries at too many different levels is liable to make the convergence very slow.

TAQ and TPQ

With these termini it can be ambiguous whether the events are within a particular phase or not. The decision as to whether they should be included is made dynamically. This method is not entirely rigorous, but it does achieve the desired effect of ensuring that the number of events specified within a TAQ or TPQ will not in itself greatly affect the prior probabilities for the boundaries.

See also [Archaeological Considerations] [Information from analysis] [Program Operation]


Additional Information

One benefit of the MCMC sampling method is that it is very easy to find out additional information after each iteration of the system.

First and last events

If requests have been made for First (or Last) a distribution is built up which contains the first (or last) sampled value within a group after each iteration.

Note that this is not the same as the estimate of a phase boundary assuming a model of uniform deposition of dated material.

Spans, intervals, differences and shifts

The requests Span, Interval and Difference all compare two sampled values and then generate a distribution for the difference between them. In the case of Span the first and last samples from the group are found and the difference between these calculated. Interval does the same thing for successive members of the group and Difference for specified items. Shift builds a distribution for the sum of the samples for two items.

Again note that the span calculated in this way does not make any assumptions about the deposition rate and will tend to give results which are too high unless the phase is properly constrained.
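
As a sketch (illustrative only) of how such derived quantities can be accumulated, assuming each iteration yields the list of sampled times for the members of a group (the item indices below are illustrative):

    def derived_values(samples, item_a=0, item_b=1):
        # samples        : sampled times for the members of the group at
        #                  this iteration
        # item_a, item_b : indices of two specified items (for Difference
        #                  and Shift)
        first = min(samples)                                        # First
        last = max(samples)                                         # Last
        span = last - first                                         # Span
        ordered = sorted(samples)
        intervals = [b - a for a, b in zip(ordered, ordered[1:])]   # Interval
        difference = samples[item_b] - samples[item_a]              # Difference
        shift = samples[item_a] + samples[item_b]                   # Shift
        return first, last, span, intervals, difference, shift

Each of the returned values would be added to the corresponding histogram after every iteration.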

Correlations

The request for a correlation plot (Correlate) builds a two dimensional histogram of the samples from the two requested items.

Ordering

The order command (Order) simply keeps track of the order of all of the elements after each iteration of the MCMC Sampling process. The probability of these orders is thereby determined (only the 50 most likely orders are stored).
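
A sketch (illustrative only) of how the orderings might be tallied:

    from collections import Counter

    def tally_order(order_counts, samples):
        # order_counts : Counter mapping an ordering (a tuple of element
        #                indices, earliest first) to the number of
        #                iterations in which it occurred
        # samples      : the latest sampled time for each element
        ordering = tuple(sorted(range(len(samples)), key=lambda i: samples[i]))
        order_counts[ordering] += 1

At the end of the run the relative frequencies give the probabilities of the orderings, and only the most likely (here the top 50) need be kept.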

See also [Archaeological Considerations] [Program Operation]


Probabilities

The MCMC sampling method can be used to generate probabilities for certain propositions. For example, if we question whether a distribution should lie at a particular position in a sequence (say between the elements i and i+1), we can perform the sampling process for the questioned item, giving values t_q, and for each iteration we can ask whether the following is true:

t_i < t_q < t_(i+1)

When all of the iterations have been completed we have a total number of iterations n and the number of times the above constraints were obeyed, n_TRUE, and so the probability can be calculated:

p = n_TRUE/n
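
A sketch (illustrative only) of this calculation over the stored samples:

    def probability_between(samples_q, samples_i, samples_next):
        # Each argument is the list of sampled values, one per iteration,
        # for the questioned item, element i and element i+1 respectively.
        n = len(samples_q)
        n_true = sum(1 for t_i, t_q, t_next in
                     zip(samples_i, samples_q, samples_next)
                     if t_i < t_q < t_next)
        return n_true / n                      # p = n_TRUE / n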

See also [Archaeological Considerations] [Program Operation]