Suscripción a Biblioteca: Guest
Portal Digitalde Biblioteca Digital eLibros Revistas Referencias y Libros de Ponencias Colecciones
International Journal for Uncertainty Quantification
Factor de Impacto: 4.911 Factor de Impacto de 5 años: 3.179 SJR: 1.008 SNIP: 0.983 CiteScore™: 5.2

ISSN Imprimir: 2152-5080
ISSN En Línea: 2152-5099

Acceso abierto

International Journal for Uncertainty Quantification

DOI: 10.1615/Int.J.UncertaintyQuantification.v1.i1.10
pages 1-17


Ville Kolehmainen

Department of Physics and Mathematics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland

Tanja Tarvainen

Department of Physics and Mathematics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland

Simon R. Arridge

Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK

Jari P. Kaipio

Department of Physics and Mathematics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland; Department of Mathematics, University of Auckland, Private Bag 92019, Auckland Mail Centre, Auckland 1142, New Zealand


With inverse problems there are often several unknown distributed parameters of which only one may be of interest. Since assigning incorrect fixed values to the uninteresting parameters usually leads to a severely erroneous model, one is forced to estimate all distributed parameters simultaneously. This may increase the computational complexity of the problem significantly. In the Bayesian framework, all unknowns are generally treated as random variables and estimated simultaneously and all uncertainties can be modeled systematically. Recently, the approximation error approach has been proposed for handling uncertainty and model-reduction-related errors in the models. In this approach approximate marginalization of these errors is carried out before the estimation of the interesting variables. In this paper we discuss the adaptation of the approximation error approach to the marginalization of uninteresting distributed parameters. As an example, we consider the marginalization of scattering coefficient in diffuse optical tomography.

KEYWORDS: inverse problems, Bayesian inference, parameter estimation, spatial uncertainty, diffuse optical tomography

1. Introduction

There are several inverse problems in which there are many unknown distributed parameters. Often, only one or a subset of the unknowns is of main interest. For example, in hydrogeophysics the unknown distributed parameters may include permittivity, capillarity, and diffusivity [1, 2]. In diffuse optical tomography (DOT), the most important distributed parameters are the scattering and absorption coefficients [3]. Of these, at least in biomedical applications, the absorption coefficFent is the one of interest since it is related to the oxygenization level of tissues [4]. In these applications, the scattering coefficient is considered as a nuisance parameter. The scattering coefficient, however, has to be estimated simultaneously due to the so-called crosstalk of the coefficients [3].

The simultaneous estimation of two distributed parameters is naturally a more unstable problem than estimating either of these if the other were known. In addition, with nonlinear problems the convergence of algorithms is a further problem. With applications which are eventually meant to be almost real time ones, such as biomedical optical tomography, it is of major interest to reduce the computational times as much as possible. Thus, in addition to estimating the interesting parameters only, there is usually pressure to also use otherwise heavily reduced computational models.

The approximation error approach was introduced in [5, 6] originally to handle pure model reduction errors. For example, in electrical impedance (resistance) tomography (EIT, ERT) and deconvolution problems, it was shown that significant model reduction is possible without essentially sacrificing the quality of estimates. With EIT, for example, this means that very low dimensional finite element approximations can be used. Later, the approach was also applied to handle other kinds of approximation and modeling errors as well as other inverse problems. Model reduction, domain truncation, and unknown anisotropy structures in diffuse optical tomography were treated in [7-10]. Missing boundary data in the case of image processing and geophysical ERT/EIT were considered in [11] and [12], respectively. In [13-15] the problem of recovery from simultaneous geometry errors and model reduction was found to be possible.

The approximation error approach was extended to nonstationary inverse problems in [16] in which linear nonstationary (heat transfer) problems were considered, and in [17] and [18] in which nonlinear problems and state space identification problems were considered, respectively. The earliest similar but partial treatment within the framework of nonstationary inverse problems was considered in [19], in which the the boundary data that is related to stochastic convection diffusion models was partially unknown. A modification in which the approximation error statistics can be updated with accumulating information was proposed in [20] and an application to hydrogeophysical monitoring in [21].

From pure model reduction and unknown (nondistributed) parameters or boundary data, a step forward was recently considered in [22] in which the physical forward model itself was replaced with a (computationally) much simpler model. In [22], the radiative transfer model (Boltzmann transfer equation), which is considered to be the most accurate model for light transfer in (turbid) media, was replaced with the diffusion approximation. It was found that also in this kind of case, the statistical structure of the approximation errors enabled the use of a significantly less complex model, again simultaneously with significant model reduction for the diffusion approximation. But also here, both the absorption and scattering coefficients were estimated simultaneously.

The approximation error approach relies on the Bayesian framework of inverse problems, in which all unknowns are explicitly modeled as random variables [5, 23, 24]. The uncertainty in the unknowns is given in the models and measurements is reflected in the posterior (probability) distribution. In the Bayesian framework, all unknowns are subject to inference simultaneously, which often results in excessively heavy computational loads. Generally, Markov chain Monte Carlo algorithms have to be used to obtain a representative set of samples from the posterior distribution. Then, after a set of samples has been computed, marginalization over the uninteresting unknowns is trivial. Only in a few special but important cases, such as the additive error model, some of the uninteresting unknowns can be eliminated before inference. We refer to such elimination as premarginalization.

In the present paper we consider the approximation error approach in the context of approximate premarginalization of uninteresting distributed parameters. Furthermore, we also consider the simultaneous treatment of the errors that are related to model reduction. As a computational example, we consider the approximate premarginalization of the scattering coefficient in diffuse optical tomography. This example shows that at least in this case it is possible to premarginalize over one distributed parameter and successfully estimate another.

The rest of the paper is structured as follows. In Section 2 we give a brief account of the approximation error approach and its formulation for the case of several distributed parameters. In Section 3 we describe the diffuse optical tomography problem. In Section 4, numerical examples of reconstructing the scattering coefficient in optical tomography with different degrees of severity are treated.

2. Approximation error approach

In the Bayesian framework for inverse problems, all unknowns are treated and modeled as random variables [5, 23, 24]. Once the probabilistic models for the unknowns and the measurement process have been constructed, the posterior distribution π(x|y) is accessed, which reflects the uncertainty of the interesting unknowns x given the measurements y. This distribution can then be explored to answer all questions which can be expressed in terms of probabilities. For general discussion of Bayesian inference (see, for example, [25, 26]).

Bayesian inverse problems are a special class of problems in Bayesian inference. Usually, the dimension of a feasible representation of the unknowns is significantly larger than the number of measurements. Thus, for example, a maximum likelihood estimate is impossible to compute. Even in cases in which the number of unknowns would be significantly smaller than the number of measurements, the structure of the forward problem is such that maximum likelihood estimates would still be unstable. In addition to the instability, the variances of the likelihood model are almost invariably much smaller than the variances of the prior models. The posterior density is often extremely narrow and, in addition, may be a nonlinear manifold.

2.1 Marginalization Over Additive Errors

In the approximation error approach, the modeling and other errors are treated as additive errors. Therefore, we review briefly how the additive errors are formally premarginalized [5]. Let the observation model be


where e are the additive errors and xA (x) is the deterministic forward model. With deterministic we mean that the model A does not contain any uncertainties or other model errors. Let the joint prior distribution of the unknowns x and e be π(x,e). Using the Bayes’ theorem repeatedly, we can decompose the joint distribution of all associated random variables as


In the case of the additive model (1), the conditional distribution π(y |x,e) is formally given by

which yields the likelihood distribution


and further,1 noting that once the measurements have been obtained, π(y) > 0 is a fixed normalization constant, we have the posterior distribution


In the quite common case of mutually independent x and e, we have πe|x(e|x) = πe(e). Furthermore, if e and x are normal, we can write π(e) = (e*e) and (x) = (x*x) and we have the familiar form


where LeTLe = Γe-1 and LxTLx = Γx-1, for the posterior distribution. In the above, the unknown (uninteresting) additive error e was premarginalized, that is, marginalized before the inference procedure, and is not present in (8) or (9).

2.2 Approximate Premarginalization Over Model Reduction Related Errors and Other Uncertainties

The problem that is generally related to uninteresting auxiliary unknowns ξ is that we usually cannot perform premarginalization such as in Eqs. (5) and (6). In most cases we have to estimate both x and ξ, which may be a considerably more demanding undertaking than estimating just x when ξ were known. For example, if a Markov chain Monte Carlo (MCMC) approach were used, the marginalization over ξ can only be done after running the chain for both x and ξ. Once this is carried out, however, the marginalization over ξ is trivial. For MCMC methods in general (see, for example, [27, 28]). For MCMC and inverse problems, see [29-31] for applications to EIT. In this section we discuss the computational procedure in more detail in the case in which there are two distributed parameters of which premarginalization over the other one is to be carried out.

Now let the unknowns be (x,z,ξ,e), where again e represents additive errors and ξ represents auxiliary uncertainties such as unknown boundary data, and (x,z) are two distributed parameters of which x is of interest only. The accurate forward model


is usually a nonlinear one. The uncertainties ξ can sometimes be modeled to be mutually dependent with (x,z), especially when ξ represents boundary data on the computational domain boundary and (x,z) are modeled as random fields. On the other hand, if ξ represents an unknown boundary shape, ξ might be modeled as mutually independent with (x,z). In the following we consider the case in which the noise e is additive and the unknowns (x,z,ξ) are not necessarily mutually independent.


denote an accurate model for the relation between the measurements and the unknowns,2 and let e be mutually independent with (x,z,ξ).

In the following we approximate the accurate representation of the primary unknown x by x = Px, where P is typically a projection operator. Let π(x,z,ξ,e) be a feasible model for the joint distribution of the unknowns. We identify x = Px with its coordinates in the associated basis when applicable.

In the approximation error approach, we proceed as follows. Instead of using the accurate forward model (x,z,ξ) A (x,z,ξ) with (x,z,ξ) as the unknowns, we fix the random variables (z,ξ)←(z00) and use a computationally (possibly drastically reduced) approximative model

Thus, we write the measurement model in the form


where we define the approximation error ε = φ(x,z,ξ) = A (x,z,ξ) - A(x,z00). Thus, the approximation error is the discrepancy of predictions of the measurements (given the unknowns) when using the accurate model A (x,z,ξ) and the approximate model A(x,z00). Note that (13) is exact.

Formally, after the models A and A are fixed, we have π(ε|x,z,ξ) = δ[ε - φ(x,z,ξ)]. We will later, however, employ approximative joint distributions and therefore consider π(ε,x,z,ξ) without any special structure. As the first approximation, we approximate φ(x,z,ξ) ≈ φ(Px,z,ξ) and thus π(ε|x,z,ξ) ≈ π(ε|Px,z,ξ). This means that we assume that the model predictions and thus the approximation error is essentially the same for x as x = Px. This assumption holds for inverse problems in general and for such projections in particular.

Proceeding as in Section 2.1, we use the Bayes’ formula repeatedly



since e and x are mutually independent, and (14) is a convolution integral with respect to ε. In particular, since e is mutually independent with (x,z,ξ), e and ε are also mutually independent.

At this stage, in the approximation error approach, both πe and πε|x are approximated with normal distributions. Let the normal approximation for the joint density π(ε,x) be


Thus we write



Define the normal random variable ν so that3 ν|x = e + ε|x



Thus, we obtain for the approximate likelihood distribution

Since we are after computational efficiency, a normal approximation for the prior model is also conventionally used:

Thus, we obtain the approximation for the posterior distribution

where V (x) is the posterior potential


where Γν |x-1 = Lν|xTLν|x and Γx-1 = LxTLx.

2.3 Computational Considerations

In Section 2.2, we wrote the normal approximation (15) for the joint distribution of (x,ε). Generally, this approximation is done to make an efficient computation of the maximum a posteriori (MAP) estimate feasible. If the actual prior model is normal, the marginal distribution of x induced by (15) coincides with the actual prior model. The prior model π(x,z,ξ) does not, however, have to be jointly normal and neither, in particular, does the marginal prior model π(x). In practice, whatever the prior model π(x,z,ξ) is, a set of samples (x(ℓ),z(ℓ)(ℓ)) is usually to be drawn and the approximation errors

are then to be computed, where nsamp is the number of draws. The normal approximation for π(ε,x) is then formed by setting x(ℓ) = Px(ℓ) and computing the mean and joint covariance as sample averages over the ensemble.

In the enhanced error model, one neglects the cross covariance and sets Γεx = 0 (see, for example, [5]). With the enhanced error model and nonlinear forward problems, we need to estimate the covariance Γε practically always by simulations and sample averages. If the prior model π(x) is Gaussian, however, the covariance Γx is available in the first place and in principle, would, not have be computed as a sample average.

Irrespective of what the (original) prior model for the primary unknown is, we note the following: When the cross covariances Γεx are employed, the sample average has to be used in practice also for Γx. Although the prior model covariance Γxx would yield a “better estimate” for the covariance of the reduced order covariance Γx than a sample covariance, we might be forced to use the latter because when the sample covariance is computed, the term Γεε - ΓεxΓx-1Γxε is guaranteed to be non-negative definite. If, in addition, Γe has full rank, Γν|x is guaranteed to be positive definite and the Cholesky factor Lν|x exists. But if the prior model covariance Γx = PΓxxPT is used, this condition is not generally met. By the law of large numbers, the condition is met asymptotically but it is impossible to specify a safe sample size. From the point of view of numerical stability, this is a problem especially when the approximation errors clearly dominate the additive errors, that is, the case for which the approximation error approach is targeted in the first place. It is thus advised to use the sample covariance estimate also for Γx.

It is difficult to predict how many draws are needed to compute the mean and joint covariance for (x,ε). Loosely speaking, this depends generally on (the covariances of) the model π(x,z,ξ) and the degree of nonlinearity of A . With relatively small covariances, few draws seem to be enough (see, for example, [12]). In the approximation error approach, the bottleneck is the computation of the solutions of the full accurate forward model A (x,z,ξ). As for using the full accurate forward model in the inversion with nonlinear problems, we, of course, have to compute this model along the iteration, but also typically compute the related Jacobian mapping. Thus, the overhead that is related to the computation of the approximation error statistics often corresponds to the computation of a few MAP estimates with the full accurate model.

3. Diffuse Optical Tomography

Diffuse optical tomography (DOT) is a noninvasive imaging modality in which images of the optical absorption and scattering within turbid media are derived based on measurements of near-infrared light on the surface of the body (for reviews see [3, 32]). The DOT problem is an exceptionally challenging inverse problem due to the diffuse nature of the forward model, and also since the measurements can span 10 orders of magnitude. Furthermore, there are several unknown distributed parameters involved, of which the absorption coefficient and the (reduced) scattering coefficient are usually reconstructed. Both coefficients are usually measured in mm-1.

The applications of DOT include the detection and classification of tumors from breast tissue, monitoring of infant brain tissue oxygenation level, and functional brain activation studies. For reviews on clinical applications (see [4, 33]). The absorption coefficient, which is related to the oxygenation level of blood, is usually the interesting parameter. In most applications, the scattering coefficient is considered a nuisance parameter. In this paper, our task is thus to employ the approximation error approach for approximate premarginalization over the inhomogeneous scattering coefficient z() and reconstruct only the inhomogeneous absorption coefficient x().

In the numerical examples below, we consider cases in which there are no auxiliary unknowns ξ of any type. Thus, we only have the uninteresting distributed parameter z and additive measurement noise to deal with.

3.1 Measurements in Optical Tomography

In the experimental setup of DOT, ms optical fibers are placed on the source positions (surface patches) Ωs,kΩ on the boundary of the body Ω. The measurements are obtained through md optical fibers that are placed in the detector positions (surface patches) Ωm,iΩ. A collection of measurements is formed by turning on the sources one at a time and measuring the light intensity at all measurement locations (for each source). The measurements in DOT may consist of direct intensity measurements, frequency-modulated amplitude and phase shift measurements, or time-resolved impulse response measurements.

In this paper, we consider the frequency domain measurement system. In the frequency domain measurements light from a sinusoidally modulated laser source is guided via the optic fibers to one of the source locations Ωs,k at a time, and the amplitudes and phase shifts of the transmitted light are measured on all the detector locations Ωm,i,i = 1,...,md. The measurement vector is thus ym with m = msmd. The inverse problem in DOT is to estimate a pair of functions (x,z), representing the (spatially inhomogeneous) optical absorption coefficient x() and the scattering coefficients z() of the tissues in Ω, given the measurements y and the forward model for the measurement process and noise.

3.2 Forward Model

In the context of inverse problems, the model xA(x) is referred to as the forward model. We consider DOT simulations in a diffusive regime where the body Ω consists of turbid, scattering dominated media. In such cases the light transport is commonly modeled with the diffusion approximation (DA) of the radiative transfer equation (RTE) [3]. The diffusion approximation is also used as transport model in this paper. For further details on the derivation and properties of the transport models and boundary conditions (see [3, 34, 35]).

We consider the frequency domain system below. Let the light source Ωs,k be on and Φk(,ω) be the induced photon density at , where ω is the modulation frequency of the light source. The frequency domain version of the diffusion approximation with the Robin boundary condition is of the form [3, 35]



where D (units [mm]) is the diffusion coefficient, c is the speed of light in the medium, ζ is a parameter that describes reflection on the boundary, ν is the outward normal vector at Ω, and gk(,ω) is the boundary source term for source at Ωs,k,


where I is the intensity of the source. The complex-valued flux ρi,k(ω) at the measurement site Ωm,i can be written as the surface integral


The collection {ρi,k}, k = 1,...,ms, i = 1,...,md is the raw data for the experiment. For numerical reasons, primarily for the range of measurements, transformed data and the correspondingly transformed forward model are typically used. Furthermore, the measurement systems are constructed according to these transformations. The experimental systems for frequency domain optical tomography export the log-amplitude and phase shift of the complex valued (demodulated) raw data. Thus, the measurements can be written


and the forward model is transformed accordingly. For different end uses, data with different frequencies ω may be acquired.

For the numerical realization of the diffusion approximation model (22)-(24), the finite element method (FEM) is typically used (see, for example, [3, 7]). In the FEM approximation, photon density is approximated in a finite dimensional basis as


where φi() are the nodal basis functions of the finite element mesh and Nn is the number of nodes in the FEM mesh, and h is mesh element size parameter.

The absorption and scattering coefficients are written as finite dimensional approximations


where χj denotes characteristic functions of disjoint elements in the reconstruction mesh. In the following, we identify the absorption and scattering coefficients and their representations as the coordinates

Thus, the solution of the forward problem amounts to the solution of ms complex valued Nn ×Nn systems of equations for one DOT experiment. The FEM-based forward model is thus of the form


where h refers to the discretization level parameter in (27).

3.3 Gaussian MRF Prior Model for Scattering and Absorption Coefficients

In the following, we model (x,z) as mutually independent. As the prior model for both π(x) and π(z), we used a proper Gaussian smoothness prior, constructed similarly as in [5, 7, 8]. In this construction, the distributed parameter, say x, is considered in the form

where xin() is a spatially inhomogeneous (absorption) coefficient4 with zero mean, and xbg() is a spatially homogeneous (background) absorption coefficient with nonzero mean. For the latter, we can write xbg() = q, where is a vector of ones and q is a scalar random variable with distribution q ~(x*bg,x2). With respect to the basis for x, we have the coordinates xinnp, np, and set xin ~(0,Γin). We model the spatial distributions xin and q as mutually independent, that is, the background is mutually independent with the inhomogeneities. An equivalent construction for z was considered.

Thus, we have the Γx = Γin,x + σbg,x2T, Γz = Γin,z + σbg,z2T and

In the construction of Γin,x and Γin,z, the approximate correlation lengths can be adjusted to match the size of the expected inhomogeneities. (See [5, 7, 8] for details.)

This prior model is a proper distribution, that is, the covariance exists. Traditional smoothness prior models are improper and samples cannot be drawn from such distributions. The approximation error approach, on the other hand, is based on computing the statistics of ε over the prior distribution. This is not possible with a prior of unbounded variances.

In [7] it was found that such a specific construction for the prior model for scattering and absorption coefficients was exceptionally suitable. This was the case even without the variable background. In [8], the prior described above with the variable background was shown to be feasible also for real data.

4. Numerical studies

We evaluate the approximate premarginalization by the approximation error approach with three two-dimensional numerical examples. In the first one, we study only the errors that are related to marginalization over the scattering coefficient using an otherwise accurate forward model A (x,z0). Thus, numerical model reduction errors are not present. The second example is similar, but the prior model for the scattering coefficient is somewhat off in the sense that the actual background of scattering differs from the modeled one in A (x,z0). In the third example, the numerical model reduction is included, that is, we use the model A(x,z0).

In the following we explain the common details for the numerical examples.

4.1 Computational Forward Models and Prior Model

In the numerical studies, the domain Ω ⊂ 2 is a circle with radius 25 mm. The measurement setup consists of ms = 32 sources and md = 32 detectors, located at equispaced intervals on the boundary Ω. With this setup, the number of log-amplitude and phase measurements is 2m = 2048.

The simulated measurement data is computed with the FEM approximation of the diffusion approximation model in a mesh 1 which is dense enough to consider the solution as the solution of the problem (22)-(24). Two other finite element meshes and models are constructed, the (relatively) accurate [2,A (x,z)] and the radically reduced one [3,A(x,z)] (see Table 1). The actual simulated measurement data was obtained by adding mutually independent (non-identically distributed) noise σe,j = δ|y*,j|/100 with δ = 0.5, that is, the error level was 0.5% of the respective noiseless measurement. Thus, the (instrument) noise is additive and independent, identically distributed, but the individual variances are not equal, that is, the covariance Γe is diagonal but the diagonal entries are not equal.

TABLE 1: The FEM meshes used in the simulations: Ne is the number of nodes and Nn is the number of triangular elements in the mesh.

Mesh Use Nn Ne
1 Simulation of measurements 12,853 25,326
2 Accurate forward model 11,329 22,302
3 Reduced forward model 703 1326

For the reconstruction basis in (28) for the coefficients (x,z) in the inverse problem, we divide the domain Ω into np = 1904 square pixels for both the accurate and approximate models. Thus we have P = I, x = Px = x, and the parameter vectors x,z1904. Having P : nm where m << n has an impact mainly when m is very small, and less with having m << n. An exception is poor (initial) modeling, that is, if the accurate representation x is a poor approximation for the reality.

The prior model was constructed as in Section 3.3. The parameters for the prior distribution are given in Table 2. Elementwise, this means that the values of absorption and scatter coefficients are expected to lie within the two standard deviation intervals x*± 2σx and z*± 2σz with probability of 95%, with respect to the marginal prior distributions.

TABLE 2: The parameters for the prior model: means and the standard deviations σ for the homogeneous background and inhomogeneities. The correlation length for both coefficients was set as 8 mm.

x* 10.0 · 10-3 mm-1
σbg,x 1.2 · 10-3 mm-1
σin,x 3.0 · 10-3 mm-1
z* 1000 · 10-3 mm-1
σbg,z 120 · 10-3 mm-1
σin,z 300 · 10-3 mm-1

Two draws from the prior models π(x) and π(z) are shown in Fig. 1. In the draws, the variation of the background that was part of the construction of the prior model is clearly visible.

FIG. 1: Two draws from the prior distribution: (left) absorption coefficients x() and (right) scattering coefficient z().

4.2 Estimates and Approximation Error Statistics

We compute MAP estimates only by minimizing the respective posterior potentials. The following particular estimates are computed in the three test cases:

  1. MAP-REF—Maximum a posteriori estimate for both parameters (x,z) with the conventional error model y = A (x,z) + e. This estimate is obtained by computing

    and can be considered as a reference estimate in which both distributed parameters are estimated simultaneously.

  2. MAP-CEM—Maximum a posteriori estimate for the primary unknown x using fixed z = z* and conventional error model y = A (x,z*) + e, corresponding to

  3. MAP-AEM—Maximum a posteriori estimate for x using fixed z = z* and the approximation error model, corresponding to

In the above functionals, when the reduced-order model is used, the model A (x,·) is to be substituted by A(x,·) [see the posterior potential V (x) in (21)].

In the following numerical examples, the realization z* is the mean of the prior model π(z). The estimates (30)-(32) are computed with the Gauss-Newton optimization method with an explicit line search [36].

For the construction of the approximation error statistics, we proceed as follows. The statistics were used only with MAP with the approximation error model (MAP-AEM) and were computed to correspond to the employed forward model. The means and covariances for (x,ε) in the approximation error model (15) were estimated by Monte Carlo simulation. For this, we draw the sets of samples {x(ℓ), ℓ = 1,...,nsamp} and {z(ℓ), ℓ = 1,...,nsamp} from the prior models π(x) and π(z), respectively. Using the sets of samples, the realizations of the approximation error are computed as

for the case where ε is due to using fixed z = z0 only, that is, no model reduction errors are present, and

for the cases in which both errors from model reduction and using fixed z = z0 are present. The means x* and ν*|x and the covariances Γx and Γν|x are then estimated as sample averages using the samples {x(ℓ)(ℓ), ℓ = 1...nsamp}. In the following examples, we use sample size nsamp = 20,000.

4.3 Reconstructions

We have three different parameters z: zbg,actual is the actual (unknown distributed) parameter, z* is the mean of the modeled prior distribution, and z0 is the fixed value of z used in the posterior model. Both z0 and z* can be chosen separately and we do not need to have z0z*. Technically, it is possible to optimize z0, but this is not considered here.

Three reconstruction cases are considered: case 1—marginalization over z only with z0 = z*, and furthermore, z0 = zbg,actual; case 2—as in Case 1 but with z0zbg,actual to assess the robustness toward the poor choice of z0; and case 3—both marginalization over z and numerical model reduction errors are present. The actual spatial distributions for x and z are blocky targets in a homogeneous background. The probability of the actual x and z with respect to the prior models is relatively low since they are discontinuous (see top row of any of Figs. 2-4). This is one of the ways to check the robustness of the approximation error approach against prior model design.

FIG. 2: Case 1. Rows from top to bottom: actual distributions, MAP-REF estimate (30) for (x,z), MAP-CEM estimate (31) for x using fixed z0 = z* and conventional noise model, and MAP-AEM estimate (32) using fixed z0 = z* and the approximation error model. The modeling error in MAP-CEM and MAP-AEM is caused solely by using the (incorrect) fixed value z0 = z*. Here zbg,actual = z* = z0.

FIG. 3: Case 2. Misspecification of z0. Rows as in Fig. 2. The modeling error in MAP-CEM and MAP-AEM is caused solely by using a fixed value z = z0. Here zbg,actual = z* + 1.5σz = z0 + 1.5σz.

FIG. 4: Case 3. Rows as in Fig. 2. The modeling error in MAP-CEM and MAP-AEM is caused by using a fixed value z = z0 and the reduced-order model A(x,z0) for the forward problem. Here zbg,actual = z* = z0.

Case 1: Modeling errors caused by using a fixed value for z.—The results for case 1 are shown in Fig. 2. The actual coefficients (x,z) are shown in the top row. The background values of the target distributions coincide with the prior means. In particular, zbg,actual = z0.

The reference estimates MAP-REF for (x,z), computed by minimization of (30), are shown in the second row. The estimates are computed using the accurate forward model A (x,z), meaning that there are no discretization-related errors present. The MAP estimate with the conventional error model (MAP-CEM) estimate (31) for x, using the fixed value z = z* for the scattering and the conventional noise model y = A (x,z*) + e, is shown in the third row. As can be seen, the use of incorrect value z* (which is here equivalent to the background scatter) has effectively destroyed the reconstruction of the absorption coefficient x, although the actual z differs from the modeled z0 only in three small subdomains.

The fourth row shows the MAP-AEM estimate (32) for x using the same fixed z = z* and the approximation error model y = A (x,z*) + ε + e. The estimate for the absorption is similar to the MAP-REF estimate, but the circular targets are slightly more blurred and the values of the inhomogeneous targets are slightly more off the actual values than with the MAP-REF. This is unavoidable in most cases, since the inclusion of the statistics of the approximation errors increases the variances of their likelihood, which in turn drives the estimates toward the mean of the prior model.

Case 2: Tolerance against poor choice of z0.— The results for case 2 are shown in Fig. 3. The estimates are computed and arranged as in Fig. 2, the only difference being that in Fig. 3 the actual background value of the scattering target in zbg,actual = 1.5z0, the discrepancy of which corresponds to 1.5σz with respect to the prior model.

The decrease of the quality of the MAP-REF is due to using a poor model for the prior (background) mean only. This is still a reasonable estimate showing all five objects of the actual spatial distributions. The MAP-CEM estimate is completely useless.

The reconstruction quality in the MAP-AEM estimate is similar to that in Fig. 2. This means that a 1.5σz underestimation of the scattering coefficient in the model A(x,z0) is tolerated well. We also tested a case where the background of the scattering target was -1.5σz and ±2σz away from the mean z0. The MAP-AEM estimate of x remained similar to the estimate shown in bottom row in Fig. 3, except in the case zbg,actual = z0 - 2σz. In this case, the quality of the MAP-AEM estimate decreased significantly, producing a nearly useless estimate. Such misspecification of the background is not expected in practice, since the estimation of the best spatially homogeneous estimates for x and z can be done readily if the approximate background values are not known (see [8]). Based on these numerical tests, the approximate premarginalization with respect to the unknown scattering parameter z is reasonably tolerant against a discrepancy between the prior model and the fixed z = z0.

Case 3: Combined approximation error caused by fixed z = z* and model reduction.— The results for case 3 are shown in Fig. 4. The estimates are computed from the same data that was used in Fig. 2. The only difference between case 1 and case 3 is that in case 3 the estimates are computed using the reduced forward model A(x,z0), with the number of nodes in the FEM approximation being Nn = 1326, instead of Nn = 22,302 in the accurate model A(x,z0).

The reference estimate MAP-REF for both unknowns (x,z) was destroyed by the unaccounted for approximation error caused by use of the reduced FEM model. As with case 2, the MAP-CEM estimate is again useless. On the other hand, the MAP-AEM estimate with the proposed approach remained similar to case 1, showing that simultaneous recovery from the use of an incorrect fixed value of z and model-reduction-related errors is feasible with the proposed approach.

We did not consider any auxiliary uncertainties ξ in the above example. In DOT, the principal candidate for ξ is poorly known exterior geometry, but there are also other topics. In clinical applications of optical tomography in particular, the actual optode locations might be slightly off the modeled locations. Furthermore, the channel amplifications can only be calibrated up to a constant with, which constant is difficult to estimate.

5. Conclusions

In this paper we applied the recently proposed approximation error approach for approximate premarginalization of uninteresting distributed parameters. The approximation error approach is based on the Bayesian framework for inverse problems. The approximation error approach has earlier been shown to be able to deal with diverse types of approximation and modeling errors and uncertainties, such as those related to pure model reduction, unknown boundary data, mismodeled geometry, and use of approximative physical models.

We considered the special case of approximate premarginalization over the inhomogeneous scattering coefficient in diffuse optical tomography. This is an example of a problem in which there are two or more unknown distributed parameters, of which only one is of interest. The results for this example problem suggest that the approximation error approach is able to compensate for using an incorrect fixed value for the uninteresting distributed parameter. In this particular example, the premarginalization was carried over the scattering coefficient, as well as the errors related to simultaneous reduction of the computational forward model. It was also shown that at least in the studied cases, the approach is tolerant to a reasonable misspecification of the fixed scattering coefficient.


This work was supported by the Academy of Finland, projects 119270, 122499, 218183, 140731 and 213476, the Finnish Centre of Excellence in Inverse Problems Research 2006-2011, and by EPSRC project EP/E034950/1.


1. Daily, W. D., Ramirez, A. L., LaBrecque, D. J., and Nitao, J., Electrical resistivity tomography of vadose water movement, Water Resour. Res., 28:1429-1442, 1992.

2. Daily, W. D. Ramirez, A. L., Electrical resistivity tomography, The Leading Edge, 23(1):438-442, 2004.

3. Arridge, S. R., Optical tomography in medical imaging, Inverse Probl., 15:R41-R93, 1999.

4. Gibson, A. P., Hebden, J. C., and Arridge, S. R., Recent advances in diffuse optical imaging, Phys. Med. Biol., 50:R1-R43, 2005.

5. Kaipio, J. and Somersalo, E., Statistical and Computational Inverse Problems, Springer, New York, 2005.

6. Kaipio, J. and Somersalo, E., Statistical inverse problems: Discretization, model reduction and inverse crimes, J. Comput. Appl. Math., 198:493-504, 2007.

7. Arridge, S., Kaipio, J., Kolehmainen, V., Schweiger, M., Somersalo, E., Tarvainen, T., and Vauhkonen, M., Approximation errors and model reduction with an application in optical diffusion tomography, Inverse Probl., 22:175-195, 2006.

8. Kolehmainen, V., Schweoger, M., Nissilä, I., Tarvainen, T., Arridge, S., and Kaipio, J., Approximation errors and model reduction in three-dimensional optical tomography, J. Opt. Soc. Am. A, 26:2257-2268, 2009.

9. Heino, J. and Somersalo, E., A modelling error approach for the estimation of optical absorption in the presence of anisotropies, Phys. Med. Biol., 49:4785-4798, 2004.

10. Heino, J., Somersalo, E., and Kaipio, J., Compensation for geometric mismodelling by anisotropies in optical tomography, Opt. Express, 13(1):296-308, 2005.

11. Calvetti, D., Kaipio, J. P., and Somersalo, E., Aristotelian prior boundary conditions, Int. J. Math., 1:63-81, 2006.

12. Lehikoinen, A., Finsterle, S., Voutilainen, A., Heikkinen, L., Vauhkonen, M., and Kaipio, J., Approximation errors and truncation of computational domains with application to geophysical tomography, Inverse Probl. Imag., 1:371-389, 2007.

13. Nissinen, A., Heikkinen, L., and Kaipio, J. P., Approximation errors in electrical impedance tomography—An experimental study, Meas. Sci. Technol., 19:015501, 2008.

14. Nissinen, A., Heikkinen, L., Kolehmainen, V., and Kaipio, J. P., Compensation of errors due to discretization, domain truncation and unknown contact impedances in electrical impedance tomography, Meas. Sci. Technol., 20:105504, 2009.

15. Nissinen, A., Kolehmainen, V., and Kaipio, J. P., Compensation of modelling errors due to unknown domain boundary in electrical impedance tomography, IEEE Trans. Med. Imag., 2010, in press.

16. Huttunen, J. and Kaipio, J., Approximation errors in nostationary inverse problems, Inverse Probl. Imag., 1(1):77-93, 2007.

17. Huttunen, J. and Kaipio, J., Approximation error analysis in nonlinear state estimation with an application to state-space identification, Inverse Probl., 23:2141-2157, 2007.

18. Huttunen, J. and Kaipio, J., Model reduction in state identification problems with an application to determination of thermal parameters, Appl. Numer. Math., 59:877-890, 2009.

19. Seppänen, A., Vauhkonen, M., Vauhkonen, P., Somersalo, E., and Kaipio, J., State estimation with fluid dynamical evolution models in process tomography — An application to impedance tomography, Inverse Probl., 17:467-484, 2001.

20. Huttunen, J., Lehikoinen, A., Hämäläinen, J., and Kaipio, J., Importance filtering approach for the nonstationary approximation error method, Inverse Probl., 2010, in press.

21. Lehikoinen, A., Huttunen, J., Voutilainen, A., Finsterle, S., Kowalsky, M., and Kaipio, J. P., Dynamic inversion for hydrological process monitoring with electrical resistance tomography under model uncertainties, Water Resour. Res., 46:W04513, 2010.

22. Tarvainen, T., Kolehmainen, V., Pulkkinen, A., Vauhkonen, M., Schweiger, M., Arridge, S. R., and Kaipio, J. P., Approximation error approach for compensating for modelling errors between the radiative transfer equation and the diffusion approximation in diffuse optical tomography, Inverse Probl., 26:015005, 2010.

23. Tarantola, A., Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, Philadelphia, 2004.

24. Calvetti, D. and Somersalo, E., An Introduction to Bayesian Scientific Computing, Ten Lectures on Subjective Computing, ISBN 978-0-387-73393-7, Springer, 2007.

25. Berger, J., Statistical Decision Theory and Bayesian Analysis, Springer, 2006.

26. Robert, C., The Bayesian Choice, Springer, 2007.

27. Chen, M.-H., Shao, Q.-M., and Ibrahim, J., Monte Carlo Methods in Bayesian Computation, Springer, 2000.

28. J. Liu, Monte Carlo Strategies in Scientific Computing, ISBN 0-387-95230-6 in Springer Series in Statistics, Springer, 2005.

29. Somersalo, E., Kaipio, J., Vauhkonen, M., Baroudi, D., and Järvenpää, S., Impedance Imaging and Markov Chain Monte Carlo Methods, In R. Barbour, M. Carvlin, and M. Fiddy (Eds.), Proc SPIE’s 42nd Annual Meeting, Computational, Experimental and Numerical Methods for Solving Ill-Posed Inverse Imaging Problems: Medical and Nonmedical Applications, pp. 175-185, San Diego, USA, June 27-Aug 1, 1997.

30. Fox, C. and Nicholls, G., Sampling conductivity images via MCMC, In K. V. Mardia, C. A. Gill, and R. G. Aykroyd (Eds.), The Art and Science of Bayesian Image Analysis, Proceedings of the Leeds Annual Statistics Research Workshop, pp. 91-100, Leeds University Press, Leeds, UK, 1-4 July 1997.

31. Kaipio, J. P., Kolehmainen, V., Somersalo, E., and Vauhkonen, M., Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography, Inverse Probl., 16:1487-1522, 2000.

32. Arridge, S. and Schotland, J., Optical tomography: Forward and inverse problems, Inverse Probl., 25:123010, 2009.

33. Tromberg, B. J., Pogue, B. W., Paulsen, K. D., Yodh, A. G., Boas, D. A., and Cerussi, A. E., Assessing the future of diffuse imaging technologies for breast cancer management, Med. Phys., 35:2443-2451, 2008.

34. Schweiger, M., Arridge, S. R., Hiraoka, M., and Delpy, D. T., The finite element model for the propagation of light in scattering media: Boundary and source conditions, Med. Phys., 22(11):1779-1792, 1995.

35. Heino, J. and Somersalo, E., Estimation of optical absorption in anisotropic background, Inverse Probl., 18:559-573, 2002.

36. Nocedal, J. and Wright, S. J., Numerical Optimization, 2nd ed., Springer, New York, 2006.

1 The subscripts, such as e|x in πe|x, are used to determine the actual probability density function. If the arguments, however, coincide with the density, we drop the subscripts. For example, we write πx(x) = π(x) and πe|x(e|x) = π(e|x), but retain the subscript in πe|x(y - A(x)|x). Furthermore, we use the terms density and distribution interchangeably.

2 If there are no additive errors, we write e = 0 and consider the other types of errors to be included in ξ.

3 With autocovariances, we may notate Γxx = Γx below.

4 In the sequel, “in” refers to inhomogeneous, “bg” to background.