OxCal > Analysis > Information

Specification of Information

Parameter types

The parameters used in models can be of three different types:

The program attempts to keep track of types through the calculation process. The functions Number() and Date() can be used to force parameters to be of a specific type.

Definition of time-scale

Because of the many different applications of OxCal, it is important that there is a well defined internal time-scale. For this purpose cal BP is not suitable as it is an integer time-scale and refers only to whole years - not to specific dates and times. The internal time-scale in OxCal is therefore based on the Gregorian calendar:

This gives a continuous real number time-scale (G), which can be directly related to BC/AD dates. Note however that G-1 corresponds to the start of 2 BC and G1 corresponds to the start of AD 1, since there is no year zero in the BC/AD scheme.

Given the widespread use of cal BP as a time-scale, this is defined here as a real time-scale from the middle of AD 1950 (i.e. G1950.5 using the internal date scale).

Special pre-processor functions are included for entering mid-year dates (see pre-processor calculations). The following examples show their use:

Further details and conversions between this format and other time-scales is given in the section on the calendar definitions.

On plots axes OxCal shows the start of the year. When reporting in BC/AD the transition between 1BC and 1AD is shown as '1BC/1AD'. The spacing between the labels is regular. Thus the axis on a plot with labels every century will show:

     .     .     .     .     .     .     .
    300   200   100 1BC/1AD 101   201   301

For a one year spacing in the labels the axis is shown as:

     .     .     .     .     .     .     .
     3     2     1  1BC/1AD  2     3     4

which effectively means:

     .     .     .     .     .     .     .
     | 3BC | 2BC | 1BC | AD1 | AD2 | AD3 |     

If the same were plotted using fractional Gregorian years (±CE or OxCal internal format) it would be shown as:

     .     .     .     .     .     .     .
    -2    -1     0     1     2     3     4

and when reporting the same dates as BP the centre of the year is given - thus the equivalent axis would be:

        .     .     .     .     .     .   
       1952  1951  1950  1949  1948  1947

When reporting dates and ranges in the BC/AD or BCE/CE format OxCal will give the year in which the event occurs (without a year zero). If ISO-8601 is chosen all dates will be reported CE but a year zero will be used (as is defined in this standard and as is the practice in astronomy) and negative year numbers before that. If the Gregorian year format is chosen fractional years are given in the internal format described above. All files and data-sets use the fractional format.

For most purposes involving radiocarbon this is irrelevant but future versions of OxCal may give alternative date format outputs and they will be based on the definitions laid out here and in the section on calendar definitions.


Parameters can be introduced into models by assigning them a parameter name. The main functions that are used are:

The type function Number() allows a general numerical parameter to be defined by an expression. This is the equivalent of the Date() function described in the next section. The other functions given here allow parameters to be assigned specific probability distribution functions, usually to describe the likelihood for specific functions: the N() function for normal distributions, the U() and Top_Hat() functions for uniform distributions and P() for more general functions. Finally Prior() allows distributions to be defined numerically using some prior information.

For any of the methods described here, or below, two different formats of parameter definition are allowed. OxCal will also try to work out the type from the context if you do not specify Date() or Number(). The following are all equivalent:


The format based on the equality operator is better suited to complex mathematical calculations but the parameter names have to be simple strings and cannot contain operators like + - / * ( ). Examples of parameter definitions are:

// Simple numbers
n1=200;                  n2=166.66*1.2;

// Normal likelihood with a mean of 200 and a sigma of 20
a1=N(200,20);            N("a2",200,20); 

// Uniform likelihood between 180 and 200
b1=U(180,220);           U("b2",180,220); 
b3=Top_Hat(200,20);      Top_Hat("b4",200,20); 

// Poisson distribution with a mean of 10
p1=Pois(10);             Pois("p2",10);

// Poisson distribution but scaled by a factor of 3
p3=Pois(10,3);           Pois("p4",10,3);

// log-normal distribution with a mean of about 1000
l1=LnN(ln(1000),ln(1.1));	LnN("l2",ln(1000),ln(1.1));

// Exponentially falling likelihood with time constant 10
d1=P(0,100,exp(-d1/10)); P("d2",0,100,exp(-d2/10)); 

// Rapidly falling distribution expressed as an inline array

The reason that you might wish to introduce general parameters of this form is that they can be used in subsequent calculations (see operations on probability distributions).

Maths ↓

The parameters of any Bayesian model are all treated in a similar way. Each parameter has a value ti and might also have observations or other information about it which are conceptually denoted as yi. Each parameter introduced is assumed to have a uniform uninformed prior p(ti). Where there is other information, this can be used to define a likelihood function p(yi|ti) for that parameter.

The functions N(), U(), Top_Hat(), P() and Prior(), can be used to directly assign such likelihoods:

Function Likelihood
N(ri,si) p(yi|ti) ∝ (1/(si√(2π))) exp(-(ti - ri)2/(2 si2))
U(ri,si) p(yi|ti) ∝ H(ti-ri)H(si-ti)
Top_Hat(ri,si) p(yi|ti) ∝ H(ti-ri+si)H(ri+si-ti)

where H(x) is the Heaviside step function which is 0 when x<0, 1/2 when x=0 and 1 when x>0. The P() function defines a likelihood function directly, either as an expression, or as a literal array. The Prior() function can be used to provide such a function in numerical form as saved in a file. Usually the Prior() function is used to define a non-uniform prior but it can equally be used to provide a particular functional form for a likelihood where the prior is defined in some other way. Assuming no other prior information, a likelihood can be viewed as an informed prior, since for constant p(t):

p(ti|yi) ∝ p(yi|ti)p(ti) ∝ p(yi|ti)

Date functions

Special functions are provided for specifying information about the time of events - these have the type of Date. The main functions currently implemented are:

The type function Date() is used to specify a date - the internal date format should be used (see calendar definition) for numerical values; the likelihood functions described above can also be used. The Age() also performs the same task but is for expressing dates before some specific year. That year is assumed to be the middle of AD1950 (G1950.5) if no other value is given. Using the Year= statement you can set this to some other date (again using the format described in calendar definition) either for a specific function or on a more general basis.

The R_Date(), R_Simulate() and R_Combine() functions will be covered in the next section.

C_Date() is a legacy from previous versions of OxCal and specifies a normally distributed likelihood; depending on the options set the C_Date() function can use either BP or the internal Gregorian fractional years (G). C_Simulate() can be used for simulating dates for a method that produces normally distributed likelihoods. The Sapwood() function will be dealt with below.

The function C_Combine() can be used to directly combine dates of the form C_Date() or C_Simulate().

The following examples illustrate the range of different methods of expressing dates.

    a1=Date(1066.5);       a2=Date(AD(1066));        a3=Age(884);       a4=Age(934){Year=AD(2000);};
    b1=Date(N(1066.5,10)); b2=Date(N(AD(1066),10));  b3=Age(N(884,10)); b4=C_Date(AD(1066),10);
    c1=Date(U(AD(1066),AD(1093))); Date("c2",U(AD(1066),AD(1093)));
    d2=C_Simulate(AD(1066),10);    d3=Date(N(AD(1066)+randN()*10,10));

Maths ↓

Date parameters are treated in exactly the same way as any other parameters and mathematically the Date() and Number() functions are identical. The Age() function assumes that any value or likelihood distribution relates to the time before present (defined by the Year attribute), Y.

The following show the effect of the Age() function on the likelihood:

Function Likelihood
Age(N(ri,si)) p(yi|ti) ∝ (1/(si√(2π))) exp(-(Y-ti - ri)2/(2 si2))
Age(U(ri,si)) p(yi|ti) ∝ H(Y-ti-ri)H(si-Y+ti)
Age(Top_Hat(ri,si)) p(yi|ti) ∝ H(Y-ti-ri+si)H(ri+si-Y+ti)

The function C_Simulate() calculates a likelihood as:

Function Likelihood
C_Simulate(ri,si) p(yi|ti) ∝ (1/(si√(2π))) exp(-(ti - ri-ε)2/(2 si2))
where ε is sampled from N(0,si)

The function C_Combine() function evaluates a combined error weighted mean and standard error following the method of Ward and Wilson 1978. If we have several determinations ri with standard errors si we calculate a combined determination rc with standard error sc:

rc = (Σ ri/si2)/(Σ 1/si2)
sc = (Σ 1/si2)-1/2
T = Σ (ri-rc)2/si2

The likelihood distribution is then as for the C_Date() function. The T value has a χ2 distribution on n-1 degrees of freedom as discussed in Ward and Wilson 1978.

If there is an additional component of uncertainty sa, this is added as:

sc = √[(Σ 1/si2)-1+sa2]

Radiocarbon calibration

The likelihood distribution for calibrated radiocarbon dates is more complicated than that for most dating methods because it is based on several different types of information:

OxCal provides functions for dealing with all of these scenarios. The functions are:

Calibration likelihoods

The R_Date() function provides the normal calibration against the default curve.

R_F14C() allows you to enter the radiocarbon concentration in terms of F14C as defined by Reimer et al. 2004. R_Simulate allows you to sample the date you would expect to get from a radiocarbon lab for a sample of a particular date; the 'Cal Date' parameter is either in Gregorian fractional years (default) or cal BP if that option is set.

R_Combine() allows you to combine dates and add in an additional component of systematic uncertainty (for example if you are dealing with short-lived material you may wish to add an additional uncertainty of about 8 14C years - see Stuiver et al. 1998). A χ2 test (Ward and Wilson 1978) will also be performed.

  R_Combine("Jar 1a")
  R_Simulate("Mid AD10",AD(10),20);

Here is an example of the use of the functions for post-bomb calibration.

  R_Combine("Jar 1b")
  R_Simulate("Mid 1980",AD(1980),20);

Maths ↓

If the curve function is defined as a radiocarbon concentration r(t) ± s(t) then the calibrated likelihood distribution for a radiocarbon determination of ri ± si is proportional to:

p(yi|ti) ∝ exp[-(ri-r(ti))2/(2(si2+s2(ti)))]/√(si2+s2(ti))

In practice this calculation is performed at increments defined by the resolution (which is every five years by default). If we have a ΔR defined to be rd ± sd the the likelihood for the reservoir offset is given by:

p(yd|td) ∝ exp[-(rd-td)2/(2 sd2))]/sd

and the likelihood for the calibrated date ri ± si is given by:

p(yi|ti) ∝ exp[-(ri-r(ti)-td)2/(2(si2+s2(ti)))]/√(si2+s2(ti))

This is the likelihood that is used in OxCal during all MCMC analysis in line with the recommendations of Jones and Nicholls 2002. If the ΔR applies to only one calibration we can combine the two distributions to arrive at:

p(yi|ti,yd) ∝ exp[-(ri-r(ti)-rd)2/(2(si2+s2(ti)+sd2))]/√(si2+s2(ti)+sd2)

This is the distribution calculated in the calculation stage of the program, and if no MCMC analysis is required.

The function R_Simulate(ri,si) calculates a likelihood as:

p(yi|ti) ∝ exp[-(r(ri)-r(ti)-ε)2/(2(si2+s2(ti)))]/√(si2+s2(ti))
where ε is sampled from N(0,si)

The function R_Combine() function evaluates a combined error weighted mean and standard error following the method of Ward and Wilson 1978. If we have several determinations ri with standard errors si we calculate a combined determination rc with standard error sc:

rc = (Σ ri/si2)/(Σ 1/si2)
sc = (Σ 1/si2)-1/2
T = Σ (ri-rc)2/si2

The likelihood distribution is then as for the R_Date() function. The T value has a χ2 distribution on n-1 degrees of freedom as discussed in Ward and Wilson 1978.

If there is an additional component of uncertainty sa, this is added as:

sc = √[(Σ 1/si2)-1+sa2]

Calibration curves

The Curve() command allows you to specify the curve to use. See the section on Calibration curves for details of the data-sets. This can either be a calibration curve or a comparison data-set (in which case a comparison curve will be generated). Each point in the data-set has three values:

The default resolution is the same as the resolution of the IntCal curves (5 years) and so no interpolation or binning is needed. However, if the resolution is set to less than 5 the curve will be interpolated by a cubic (or linear if that option is set) function - the same is true for comparison data-sets where the resolution is usually coarser. If the data-points are closer together than the resolution (as is the case with much of the bomb data) the data points are binned together before the curve is generated.

The Delta_R() function is primarily intended for marine applications (with the marine curve). What it does is to offset the measurements before calibration. When used with Bayesian models the ΔR offset is treated as a parameter (common to all relevant samples) and with a Normal likelihood distribution. Given this, the function can also be used to allow for correlated uncertainties between samples. If the parameter associated with the Delta_R() function is d, then the rm is offset so that the likelihood is defined by R_Date(rm-d,sm).

The Mix_Curves() function allows two curves to be mixed together with an uncertainty in the mixing ratio. This allows for mixed reservoir dating. In principle this function can be used to mix mixed curves allowing any number of reservoirs to be mixed together. The use of this function always results in an MCMC analysis - in this the proportion for the second curve is not allowed to go out of the range 0-100% (even if for example the proportion is set at 10±10%).

Note that all calculations are performed in terms of radiocarbon concentration (rather than BP date). This includes calibration. If you wish to override this you can do this by setting the option 'Use F14C space' to off. This only affects the calibration itself all other calculations, such as reservoir mixing etc will be performed in terms of radiocarbon concentration F14C.

For most purposes it is easiest to use the curve dialogue ([Options > Curve] for single calibrations or [Tools > Curves] for the main input window) to set the required curve. The following show some examples of use of these functions:

    Delta_R("Local Marine",200,30);
    Curve("Decadal Turnover","intcal20.14c")
    Mix_Curves("Mixed","Atmospheric","Local Marine",40,10);

For MCMC analysis a non-normal distribution can be used for Delta_R() in place of the usual mean and standard deviation. For example, to allow ΔR to take a value anywhere between 0 and 800 you could use the command:


The same is true for Mix_Curves() and so, for example, to allow for any mixture of marine and atmospheric diet you could use:

    Mix_Curves("Mixed","Atmospheric","Local Marine",U(0,100));

These uniform priors allow you to use unbiased priors for ΔR and diet allowing the model to find an unbiased estimate for these parameters.

Maths ↓

Curve construction

Where binning is required the method is as follows:

We need to take account of the fact theat there is often much real natural variability not reflected in the measurement uncertainty. We have the individual measurements rij with measurement uncertainties sij. The error weighted estimate of the mean is:

μ'i=(Σj rij/sij2)/(Σj (1/sij2))

And the average weighted variance (Bevington and Robinson 1992) is:

σi2= [(Σj rij2/sij2)/(Σj (1/sij2)) - μ'i2] [ni/(ni-1)]

The uncertainty of the mean is:

σμi2 = 1/(Σj (1/sij2))

However, given that we do not in general know if the uncertainty in the underlying measurement or reported points in the curve is correlated, it is not safe to use this. This is especially true where the points are not raw data-points but the points of a compiled calibration curve. It is safer to assume that the uncertainty is never lower than the lowest uncertainty in the points within the bin:

σmin i2 = minj(sij2)

So we take the variance si2 in the bin to be the larger of σmin i2 and σi2. Note that prior to v4.4, where the calibration curve was only supplied at 5 year increments (the default resolution), the minimum of σμi2 and σi2 was used, this was not suitable for IntCal20 which was reported at 1 year increments. Overall for each bin we have three values which are:

ti=(Σj tij/sij2)/(Σj (1/sij2))
ri=(Σj rij/sij2)/(Σj (1/sij2))
si= √max([(Σj rij2/sij2)/(Σj (1/sij2)) - ri2]/[ni/(ni-1)] , minj(sij2))

The revised data-set is then interpolated onto the required points as described below.

If the data-points are more widely spread than the resolution requires, or they do not lie on the correct points, interpolation is performed. If linear interpolation is chosen (or if the interpolation is only for one point) then the function r(t) between two points (ti,ri,si) and (ti+1,ri+1,si+1) is given by:

ai = (ri+1 - ri)/(ti+1-ti)
r(t) = ti + ai(t-ti)

If cubic interpolation is used then the function is defined to pass through all data points and have a continuous value and first derivative. The interpolation is given by:

ai = (ri+1 - ri-1)/(ti+1-ti-1)
ai+1 = (ri+2 - ri)/(ti+2-ti)
δri = (ri+1 - ri)
δti = (ti+1 - ti)
ci = (3 δri - δti(2ai + ai+1))/δti2
di = ((ai+1 - ai) - (2 ci δti))/(3 δti2)
r(t) = ti + ai(t-ti) + bi(t-ti)2+ ci(t-ti)3

s(t) is interpolated in exactly the same way.

Curve mixing

The mixing ratio is treated as a parameter tmin Bayesian models. Where the mixing ratio is defined as rm ± sm this parameter is given a likelihood which is:

p(ym|tm) ∝ H(tm)H(1-tm)exp(-(tm-rm)2/(2sm2))

in other words it is a normal distribution truncated at 0 and 1. A posterior distribution for the mixing ratio will be generated which is based on the global model. The combined curve is given by:

r'(t) = (1-tm) r1(t) + tm r2(t)
s'(t) = (1-tm) s1(t) + tm s2(t)

Curve reservoir

If a reservoir time constant τ is set it is assumed that the reservoir contains carbon from a range of ages exponentially distributed with an average age of τ. If we assume that the atmospheric radiocarbon curve is given by r(t) and the reservoir curve by r'(t), we can see that:

r'(t) = (1/τ) ∫t r(u) exp(-(t-u)/τ) du

We also define the same relationship for the uncertainty:

s'(t) = (1/τ) ∫t s(u) exp(-(t-u)/τ) du

By numerical integration it is possible to calculate these functions. We assume a linear extrapolation of the curve to arrive at a starting point but with errors that are ten times higher than the first point in the curve. This approximation means that the resultant curve should not be relied upon within a few time constants of the start of the curve r(t).

Where there is an uncertainty in τ of δτ the additional uncertainty is calculated by numerically determining dr'(t)/dτ.

In order to give a representative result even when the errors are non-gaussian (as is the case with long time constants in the post-bomb period) we numerically evaluate the greater of:

dr'(t)/dτ ≈ [r'(t,τ+δτ)-r'(t,τ-δτ)]/(2δτ)
dr'(t)/dτ ≈ [r'(t,τ)-r'(t,τ-δτ)]/δτ
dr'(t)/dτ ≈ [r'(t,τ+δτ)-r'(t,τ)]/δτ

and r''(t) is found by averaging:

r''(t) ≈ [r'(t,τ+δτ) + r'(t,τ) + r'(t,τ-δτ)]/3

and adding the additional uncertainty in quadrature to ds'(t) to give a revised curve r''(t) and s''(t) such that:

s''(t)= √((s'(t))2 + (δτdr'(t)/dτ)2)

Note that this a slightly different way of calculating the uncertainty than in versions of OxCal prior to version 4. The differences are significant where the curve has rapid changes (such as with the post-1950 calibration curves).

Sapwood estimates for dendrochronology

Methods are included in OxCal for the estimation of the number of sapwood rings (S) present in a tree where the heartwood/sapwood boundary is present but where the bark edge is not. The underlying model is based on research reported in the thesis of Dan Miles (Miles 2005). The factors on which the estimate are based are:

You can find a linear dependency of ln(S) on ln(M) and ln(R). Roughly speaking S increases with increasing R and S decreases with increasing M. The sapwood tool enables you to perform a linear regression to define a model for a region.

Calculating sapwood estimates

Once a model has been worked out, this can be used to estimate sapwood for wood samples where the number of sapwood rings are unknown, but where we still retain the heartwood/sapwood transition. The two OxCal commands required for this are:

The Sapwood_Model() function defines the parameters to be used for the model. The Sapwood function takes four parameters after the optional name. These are:

The program then calculates a likelihood distribution based on the information provided.

Dan Miles has calculated a model for post-Roman mainland Britain (Miles 2005), excluding Scotland, which can be used for dating oak in this area and period. It can be entered as in the example below:

  Sapwood_Model("Mainland Britain", 2.77292, 0.100001, -0.275445,0.314286377);
  Sapwood("wa21", 1329, 243, 0, 1.06);
  Sapwood("wa22", 1354, 58, 6, 2.74);

The resolution of the program (found in [Tools > Options]) should be set to 1 for this sort of analysis as five years (the default) is too coarse.

Maths ↓

We define variables based on the logarithms of S, M and R:

s=ln(S), m=ln(M) and r=ln(R)

You can find a linear dependency of s on m and r. The residuals are normally distributed in s. This linear regression can be performed using the sapwood tool provided with this package, or any statistical package. The likelihood for S is given by:

p(yi|Si,a,br,bm) ∝ H(S) exp(-(a + brln(Ri) + bmln(Mi) - ln(Si))2/(2σ2))/Si

or if we define qi as the heartwood/sapwood boundary date for a sample, the likelihood for the felling date is given by;

p(yi|ti,a,br,bm) ∝ H(ti-qi) exp(-(a + brln(Ri) + bmln(Mi) - ln(ti-qi))2/(2σ2))/(ti-qi)

Parameterisation of date information

Specific functions are provided for radiocarbon dating and sapwood estimates because the functional forms of the likelihood distributions are unusual. For many dating methods, however normally distributed errors are appropriate that the generic date functions can be used.

For some methods however it is worth introducing extra common parameters to ensure that correlated uncertainties are correctly treated in any Bayesian analysis. An obvious example of this kind is Luminescence dating.

If we consider a simple case we might have four samples and two environmental dose rate readings. we might also estimate that the dose in the past was lower due to variations in water content.

We then introduce seven parameters to cover this situation:

We can put all of this into a model within OxCal in the following way:

  // define the independent parameters and set the likelihoods
  // calculate the dependent parameters

This shows how the general parameters can be used to define chronological information.

Note that the independent variables (in this case DE1...DE4, R1, R2 and F) all have uniform priors. The dependent parameters will not necessarily have uniform priors - though in this particular case this will not be significant unless the uncertainties are very large.

Cross referencing

Versions of OxCal previous to v4, provided the XReference function which had limited capabilities, allowing an event to be in two phases. There are now much more general ways of dealing with cross references.

The previous section gives an example of a parameter being used explicitly more that once in a model. However, there are times when you need to be able to specify cross references in other ways. You might, for example wish to define that two events in a model are synchronous, or to use the same ΔR parameter in two different places within an analysis as in:

    Delta_R("Region 1",200,30);
    Delta_R("Region 2",100,30);
    Delta_R("=Region 1");
    Delta_R("=Region 2");

The general principle for cross referencing parameters is that the parameter should have one primary specification and then as many cross references to it as you wish. As in the example above the cross references are denoted by using the same name but starting with the '=' character. You can also use the &= operator as in the final example in this section. The following example shows one way of specifying a model for two phases that both start at the same time, but end at different times:

    // first instance of the parameter "Start 1"
    Boundary("Start 1");
     R_Date("a", 2000, 20);
     R_Date("b", 2010, 20);
     R_Date("c", 1920, 20);
     R_Date("d", 1900, 20);
     R_Date("e", 1903, 20);
    Boundary("End 1");
    // cross reference to the parameter "Start 1"
    Boundary("=Start 1");
     R_Date("f", 2040, 20);
     R_Date("g", 2020, 20);
     R_Date("h", 1960, 20);
     R_Date("i", 2000, 20);
     R_Date("j", 1933, 20);
    Boundary("End 2");

The following is equivalent (see the use of the s1=Boundary() to define the parameter and then the expression s1&=Boundary() to cross reference to it):

  // define the likelihoods
  a=R_Date(2000, 20);
  b=R_Date(2010, 20);
  c=R_Date(1920, 20);
  d=R_Date(1900, 20);
  e=R_Date(1903, 20);
  f=R_Date(2040, 20);
  g=R_Date(2020, 20);
  h=R_Date(1960, 20);
  i=R_Date(2000, 20);
  j=R_Date(1933, 20);
  // define the model
  (s1=Boundary()) < (a | b | c | d | e) < (e1=Boundary());
  (s1&=Boundary()) < (f | g | h | i | j) < (e2=Boundary());

By using cross referencing you ensure that only one independent parameter is created in a Bayesian model which ensures that correlated uncertainties are treated correctly. In some instances the program will generate a second dependent parameter which is equal to the first one.