International Journal for Uncertainty Quantification
DOI: 10.1615/Int.J.UncertaintyQuantification.2012003889
pages 357–370

AN ENSEMBLE KALMAN FILTER USING THE CONJUGATE GRADIENT SAMPLER

Abstract: The ensemble Kalman filter (EnKF) is a technique for dynamic state estimation. EnKF approximates the standard extended Kalman filter (EKF) by creating an ensemble of model states whose mean and empirical covariance are then used within the EKF formulas. The technique has a number of advantages for large-scale, nonlinear problems. First, large-scale covariance matrices required within EKF are replaced by low-rank and low-storage approximations, making implementation of EnKF more efficient. Moreover, for a nonlinear state space model, implementation of EKF requires the associated tangent linear and adjoint codes, while implementation of EnKF does not. However, for EnKF to be effective, the choice of the ensemble members is extremely important. In this paper, we show how to use the conjugate gradient (CG) method, and the recently introduced CG sampler, to create the ensemble members at each filtering step. This requires the use of a variational formulation of EKF. The effectiveness of the method is demonstrated on both a large-scale linear problem and a small-scale, nonlinear, chaotic problem. In our examples, the CG-EnKF performs better than the standard EnKF, especially when the ensemble size is small.

KEYWORDS: ensemble Kalman filter, data assimilation, conjugate gradient iteration, conjugate gradient sampler

1. INTRODUCTION

The Kalman filter (KF), first introduced in [1], is the extension of Bayesian minimum variance estimation to problems in which the unknown to be estimated, which we call the state, varies according to some (approximately) known model, and is indirectly and sequentially observed in time. KF assumes linear state and observation models, and its extension to nonlinear cases is known as the extended Kalman filter (EKF) [2]. The outputs from KF and EKF at a particular time step are estimates of the state and its covariance matrix.
At the next time step, as new data come in, these estimates are updated using the Kalman, or some equivalent, formulas. Both KF and EKF have been successfully used in a number of settings, e.g., autonomous and assisted navigation. However, for problems in which the dimension of the state space is prohibitively large, such as those arising in weather and ocean forecasting, storing and operating with the state covariance matrix, which is dense, is computationally prohibitive. To overcome this issue, a number of approximations have been developed that replace the state-space covariance matrices appearing within the filter by low-rank or low-storage approximations. These include approaches in which the state space is projected onto a smaller subspace [3–6]. However, since the chosen subspace is typically fixed in time, the dynamics of the system, which change in time, are often not correctly captured [7]. A related approach makes use of iterative methods both for state estimation and covariance approximation. For example, in [8–10], the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method is used for state estimation, as well as to build low-storage (full-rank) covariance approximations, in both standard and variational [Bayesian maximum a posteriori (MAP)] formulations of KF and EKF. In [11], the same approach is taken, with the conjugate gradient iteration used instead of L-BFGS, yielding low-rank covariance approximations. Similar ideas are explored in [12]; however, the covariance approximations are derived in a more complicated fashion. These approaches have the benefit that the associated covariance approximations change with each step in the filter based on local information. However, for nonlinear problems, the methods require tangent linear and adjoint codes for computing the Jacobian of the model and its adjoint at an arbitrary state. Such software is model dependent and can be quite tedious to write for complex models.
An approach that does not require such linearization software is the so-called ensemble Kalman filter (EnKF). In EnKF, a random sample of states, called an ensemble, is computed at each time step, and the state and covariance estimates are taken to be the sample mean and covariance of the ensemble. EnKF was first proposed in [13], but several variants now exist [14]. Not surprisingly, the technique has its own set of problems, e.g., sampling errors due to random perturbations of model state and observations, and from ensemble inbreeding; see, e.g., [15–17]. Moreover, ensemble filters yield low-rank approximate covariance matrices, with column space spanned by the mean-subtracted ensemble members. To make the approximations full-rank, an additional matrix must be added, a practice known in the data assimilation literature as “covariance inflation” [18, 19]. In a recent paper [20], an ensemble version of the variational approach set forth in [9] is presented. As in [9], L-BFGS is used to minimize a quadratic function (the negative log of the Bayesian posterior density function) whose minimizer and inverse Hessian are the Kalman filter state estimate and covariance, respectively. A new ensemble is generated directly from the covariance approximation without having to store dense matrices or perform matrix decompositions. The resulting forecast ensemble is then used to build an approximate forecast covariance without the use of tangent linear and adjoint codes. In contrast with standard EnKF approaches, random perturbations of the model state and observations are not used in this method, and hence some of the problems related to existing ensemble methods do not arise. In this paper, we follow the general approach of [20], but instead use the conjugate gradient (CG) iteration for quadratic minimization and the CG sampler of [21] for ensemble calculation. The resulting method has several advantages over the L-BFGS version.
Perhaps most importantly, for the examples we consider, it results in a more accurate and faster-converging filter. Also, it is more intuitive and much simpler to implement, requiring the addition of only a single inexpensive line of code within CG. Finally, a rigorous supporting theory for the accuracy of the CG ensembles was developed in the recent work [21]. We call the proposed method the conjugate gradient ensemble Kalman filter (CG-EnKF). The paper is organized as follows. In Section 2, we recall the basics of Kalman filtering and ensemble methods. We introduce the CG-EnKF algorithm and the relevant CG theory in Section 3 and demonstrate the method with numerical examples in Section 4. We end with conclusions in Section 5.

2. KALMAN FILTERING METHODS

In this paper, we consider the following coupled system of discrete, nonlinear, stochastic difference equations:
In the first equation, x_{k} denotes the n × 1 state vector of the system at time k, ℳ is the (possibly) nonlinear evolution operator, and ε^{p}_{k} is an n × 1 random vector representing the model error, assumed to characterize errors in the model and in the corresponding numerical approximations. In the second equation, y_{k} denotes the m × 1 observed data vector, K_{k} is the m × n linear observation operator, and ε^{o}_{k} is an m × 1 random vector representing the observation error. The error terms are assumed to be independent and normally distributed, with zero mean and with covariance matrices C_{ε^p_k} and C_{ε^o_k}, respectively. In this paper, we do not consider the (often cumbersome) estimation of these matrices, and assume that they are given. The task is to estimate the state x_{k} and its error covariance C_{k} at time point k given y_{k}, K_{k}, C_{ε^o_k}, the function ℳ(x), C_{ε^p_k}, and estimates x^{est}_{k−1} and C^{est}_{k−1} of the state and covariance at time point k − 1. The extended Kalman filter (EKF) is the standard method for solving such problems [2]. The formulation of EKF requires that we linearize the nonlinear function ℳ at x_{k} for each k. In particular, we define
where ∂/∂x denotes the Jacobian computation with respect to x. EKF then has the following form.

Algorithm 1 (EKF). Select initial guess x^{est}_{0} and covariance C^{est}_{0}, and set k = 1.
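The itemized steps of Algorithm 1 are elided in this copy. A minimal sketch of one EKF predict/update cycle under the linear-observation model (2) is given below; the function and variable names are ours, not the paper's.

```python
import numpy as np

def ekf_step(x_est, C_est, y, M_fun, M_jac, K, C_eps_p, C_eps_o):
    """One EKF cycle: predict with the (linearized) model, then update
    with the Kalman gain. All names here are illustrative."""
    # i. Prediction: push the estimate through the model, propagate covariance
    x_p = M_fun(x_est)
    M = M_jac(x_est)                      # Jacobian of the evolution operator
    C_p = M @ C_est @ M.T + C_eps_p
    # ii. Update: Kalman gain, then state and covariance update
    S = K @ C_p @ K.T + C_eps_o           # innovation covariance
    G = C_p @ K.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_p + G @ (y - K @ x_p)
    C_new = (np.eye(len(x_new)) - G @ K) @ C_p
    return x_new, C_new
```

Note that `M_jac` is exactly the tangent linear code whose absence motivates EnKF: for a nontrivial ℳ it must be supplied by the user.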
Note that in the linear case ℳ(x_{k−1}) = M_{k} x_{k−1}, and EKF reduces to the classical linear Kalman filter [1]. For large-scale, nonlinear problems, such as those arising in weather and ocean forecasting, the linearization of ℳ required within EKF can be problematic. For example, numerical approximations may need to be used, yielding inaccuracies, or the linearization might be too computationally burdensome or complicated. Storage of dense covariance matrices of size n × n, where n is the size of the state space, can also be problematic. An approximation of EKF that has reduced storage requirements and does not involve linearization of ℳ is the ensemble Kalman filter. In EnKF, a representative ensemble is sampled at each step and is integrated forward by the model ℳ. The resulting ensemble is then used to create a low-rank (and hence low-storage) approximation of the model covariance C^{p}_{k}.

Algorithm 2 (EnKF). Sample initial ensemble x^{est}_{0,i}, for i = 1, …, N, from N(x^{est}_{0}, C^{est}_{0}), and set k = 1.
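Algorithm 2's itemized steps are elided in this copy. A sketch of one perturbed-observation EnKF step, following the description above (forecast members with model-error draws, low-rank covariance from the mean-subtracted ensemble, update against perturbed data); the names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_step(X, y, M_fun, K, C_eps_p, C_eps_o):
    """One perturbed-observation EnKF step. X is n-by-N, one member per column."""
    n, N = X.shape
    # i.1 Forecast: push members forward and add model-error draws
    Xp = np.column_stack([M_fun(X[:, i]) for i in range(N)])
    Xp += rng.multivariate_normal(np.zeros(n), C_eps_p, size=N).T
    # i.2 Low-rank forecast covariance from the mean-subtracted ensemble
    Xc = (Xp - Xp.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)
    C_p = Xc @ Xc.T
    # ii.2 Update each member against a perturbed copy of the data
    S = K @ C_p @ K.T + C_eps_o
    G = C_p @ K.T @ np.linalg.inv(S)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(K.shape[0]), C_eps_o, size=N).T
    X_new = Xp + G @ (Y - K @ Xp)
    # ii.3 The state estimate is the ensemble mean
    return X_new, X_new.mean(axis=1)
```

The random perturbations of states and observations here are precisely the source of the sampling errors discussed in the following paragraph.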
EnKF computations can be carried out efficiently so that the covariances are kept in the low-rank “ensemble form”, without explicitly computing the covariance C^{p}_{k} or the Kalman gain matrix G_{k} [14]. However, these low-rank approximations typically require the use of a form of regularization known as “covariance inflation” [18, 19]. Another issue with EnKF is that unless the ensemble size N is sufficiently large, the estimator defined by the ensemble mean in step ii.3 can be inaccurate and the method may perform poorly. Finally, EnKF also suffers from inaccuracies brought on by the random perturbation of model states and observations; see steps i.1 and ii.2 of the algorithm.

3. THE CG ENSEMBLE KALMAN FILTER

The Kalman filter estimate x^{est}_{k} and its covariance C^{est}_{k}, described in step ii of Algorithm 1, can be written in Bayesian MAP form, as the minimizer and inverse Hessian of the quadratic function
resulting in an equivalent variational (i.e., optimization-based) formulation of the Kalman filter. Note that (4) is the negative log of the posterior density function with likelihood defined by the observation model (2) and prior defined by step i of Algorithm 1, i.e., by N(x^{p}_{k}, C^{p}_{k}). Also, using (4) requires that multiplication by C^{−1}_{ε^o_k} is efficient. An alternative approach to EnKF is to reformulate step ii in Algorithm 2 using (4), so that an iterative method is instead used to estimate both the minimizer and the inverse Hessian, ∇² ℓ(x|y_{k})^{−1}, of (4). The N new ensemble members are then sampled from N(x^{est}_{k}, C^{est}_{k}), where x^{est}_{k} and C^{est}_{k} are the estimates of the minimizer and inverse Hessian computed by the method. To efficiently implement the variational approach, one must overcome two computational bottlenecks. First, one needs to make sure that the evaluation of the quadratic expression is efficient and that no large dense matrices are stored in memory. In particular, an efficient way to multiply by (C^{p}_{k})^{−1} is needed. Second, an efficient way to produce samples from N(x^{est}_{k}, C^{est}_{k}) is required. Usually, sampling from a Gaussian is done via the Cholesky decomposition of the dense covariance matrix, which is prohibitively expensive in high-dimensional cases, where one cannot even store dense covariance matrices. In [20], low-storage covariance approximations are used to overcome the above issues, computed using the limited-memory BFGS (L-BFGS) [22] optimization algorithm. As is well known, L-BFGS yields an approximate minimizer and uses iteration history to construct an approximate inverse Hessian of (4). As shown in [20], random samples can also be produced efficiently from the L-BFGS covariance (inverse Hessian) approximations. CG iteration history can also be used to create a low-rank approximation of the inverse Hessian.
Moreover, recent results in [21] show how samples from the Gaussian with this approximate covariance can be easily generated from within CG with almost no additional computational cost, and in such a way that the only additional requirement is to store the N samples (ensemble members), each of size n × 1. Computationally, the CG implementation is slightly more efficient than the L-BFGS implementation. When L-BFGS is used, some additional storage besides the N samples is required for the L-BFGS covariance approximation, and the generation of the ensemble members requires an additional computation after the L-BFGS iteration has stopped; see [20] for details. Finally, the theory for the accuracy of the CG samples is well developed (Section 3.1.1), whereas, to our knowledge, such an analysis does not yet exist for the L-BFGS inverse-Hessian approximation. The CG ensemble Kalman filter has the following form.

Algorithm 3 (CG-EnKF). Sample initial ensemble x^{est}_{0,i}, for i = 1, …, N, from N(x^{est}_{0}, C^{est}_{0}), and set k = 1.
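Step ii.1 of Algorithm 3 hinges on efficient multiplication by (C^{p}_{k})^{−1}; as the following paragraphs explain, when C^{p}_{k} = X_{k} X^{T}_{k} + C_{ε^p_k} with a diagonal model-error covariance, the matrix inversion lemma reduces this to an N × N solve. A hedged sketch (function and variable names are ours):

```python
import numpy as np

def apply_Cp_inv(v, X, d):
    """Apply (X X^T + D)^{-1} to a vector v via the matrix inversion lemma,
    where D = diag(d) is the (diagonal) model-error covariance and X is the
    n-by-N mean-subtracted ensemble matrix. Only an N-by-N system is solved."""
    Dinv_v = v / d
    Dinv_X = X / d[:, None]
    # Small N-by-N system: I + X^T D^{-1} X
    inner = np.eye(X.shape[1]) + X.T @ Dinv_X
    return Dinv_v - Dinv_X @ np.linalg.solve(inner, X.T @ Dinv_v)
```

No n × n matrix is ever formed or inverted, which is the point of keeping the covariance in ensemble form.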
Applying CG in step ii.1 requires that efficient multiplication by (C^{p}_{k})^{−1} is possible. For this, define X_{k} to be the n × N matrix whose ith column is (x^{p}_{k,i} − x̄^{p}_{k})/√(N − 1), where x̄^{p}_{k} is the forecast ensemble mean. Then C^{p}_{k} = X_{k} X^{T}_{k} + C_{ε^p_k}, and the matrix inversion lemma can be used as in [20]:
Note that I + X^{T}_{k} C^{−1}_{ε^p_k} X_{k} is an N × N matrix. Thus, assuming that N is not too large, and that multiplication by C^{−1}_{ε^p_k} is efficient, multiplication by (5) will also be efficient. In our examples, we assume that C_{ε^p_k} is a diagonal matrix, which makes the inversion and multiplication with a vector easy. We observe that, in contrast to EnKF and many of its variants, the model error term is included in the cost function ℓ(x|y_{k}) directly, and the optimization is carried out in the full state space. Since a full-rank model error covariance is used, the technique of covariance inflation [18, 19] is not needed. Naturally, the trouble of tuning the model error covariance still remains, but this quantity has a clear statistical interpretation, unlike the rather ad hoc covariance inflation parameters. Many existing ensemble methods cannot incorporate model error directly, and this is one of the strengths of the variational approach. We will now introduce the CG sampler, used in step ii.1, for computing both the estimator and the ensemble members.

3.1 The CG Sampler

First, we rewrite ℓ(x|y_{k}) defined in (4) as
where A = K^{T}_{k} (C_{ε^o_k})^{−1} K_{k} + (C^{p}_{k})^{−1} and b = K^{T}_{k} (C_{ε^o_k})^{−1} y_{k} + (C^{p}_{k})^{−1} x^{p}_{k}. Recent work [21] shows that while finding the minimizer of the quadratic ℓ, the CG algorithm can also be used to inexpensively generate approximate samples w_{i} from N(0, A^{−1}). The new ensemble (in step ii.1 of Algorithm 3) is then defined by
where x^{est}_{k} is the approximate minimizer of ℓ(x|y_{k}) computed by CG. We call this modified CG algorithm the CG sampler, as in [21]. It is given as follows.

Algorithm 4 (CG sampler). Given A, b, and x_{0}, let r_{0} = b − Ax_{0}, p_{0} = r_{0}, d_{0} = p^{T}_{0} Ap_{0}, j = 1, and w_{i,0} = 0 for i = 1, …, N. Specify some stopping tolerance ϵ and iterate:
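The itemized steps of Algorithm 4 are elided in this copy. The following sketch implements the iteration as described: standard CG for Ax = b, plus the single extra line that steps each sample along p_{j}/√d_{j} with an independent standard normal draw. Names and the stopping rule are ours.

```python
import numpy as np

def cg_sampler(A, b, n_samples, tol=1e-8, maxiter=None, rng=None):
    """CG solve of Ax = b that also accumulates approximate samples
    w_i ~ N(0, A^{-1}) at almost no extra cost per iteration."""
    rng = rng or np.random.default_rng(0)
    n = len(b)
    maxiter = maxiter or n
    x = np.zeros(n)
    W = np.zeros((n, n_samples))          # samples w_{i,j}, one per column
    r = b - A @ x
    p = r.copy()
    for _ in range(maxiter):
        Ap = A @ p
        d = p @ Ap                        # d_j = p_j^T A p_j
        alpha = (r @ r) / d
        x = x + alpha * p
        # The one extra line: step every sample along p_j / sqrt(d_j)
        W += np.outer(p / np.sqrt(d), rng.standard_normal(n_samples))
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, W
```

After j iterations each sample is w_{i} = Σ_{l<j} z_{l} p_{l}/√d_{l} with z_{l} ∼ N(0, 1), so its covariance is P_{j} D^{−1}_{j} P^{T}_{j}, the quantity analyzed in Section 3.1.1.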
The cost of implementing CG is dominated by the matrix-vector multiply, at a cost of about 2n² flops in each CG iteration [23]. Thus, as long as the number of CG iterations j is small relative to n, CG-EnKF will be cheaper to implement than EnKF.

Remark 5. At iteration k of CG-EnKF, after the CG sampler iterations have stopped, we define x^{est}_{k} to be the most recent CG iterate x_{j}, and w_{i} to be the most recent CG sample w_{i,j} as in step vii. The new ensemble is then given by Eq. (7).

3.1.1 Analysis of the CG Sampler Approximations

In this section, we provide the relevant theory from [21] regarding how well the mean and covariance matrix of the CG samples {w_{i}} approximate the mean and covariance of N(0, A^{−1}), the target Gaussian of interest. In exact arithmetic, CG is guaranteed to find a solution to the n × n linear system Ax = b [or, equivalently, to find the minimizer of the quadratic in Eq. (6)] after a finite number of iterations, and the CG samples will be distributed as N(0, A^{−1}) when the eigenvalues of A are distinct and n CG iterations are computed. The reason for this efficiency is that the CG search directions {p_{l}} are A-conjugate: p^{T}_{l} Ap_{m} = 0 for l ≠ m (see, e.g., [24]). In finite precision, however, the CG search directions lose conjugacy at some iteration less than n. Nevertheless, CG still finds a solution to Ax = b as long as “local conjugacy” of the search directions is maintained [25]. In fact, when the eigenvalues of A are clustered into j groups, CG finds the approximate solution after j iterations in a j-dimensional Krylov space 𝒦^{j}(A, r_{0}). The loss of conjugacy of the search directions is detrimental to the CG sampler (just as it is for iterative eigensolvers) and, without corrective measures, prohibits sampling from the full Gaussian of interest, N(0, A^{−1}).
Nonetheless, the resulting samples w_{i} have a realized covariance which is the best rank-j approximation to A^{−1} (with respect to the 2-norm) in the same j-dimensional Krylov space searched by the CG linear solver. In order to make the previous discussion more explicit, we establish some notation. If we let P_{j} be the n × j matrix with {p_{l}}^{j−1}_{l=0} as columns, and P_{B} be the n × (n − j) matrix with {p_{l}}^{n−1}_{l=j} as columns, then, by conjugacy, D_{n} = [P_{j} P_{B}]^{T} A [P_{j} P_{B}] is an invertible diagonal matrix with entries [D_{n}]_{ll} = p^{T}_{l} Ap_{l}. Thus
Now, a CG sample can be written as w_{i} = w_{i,j} = P_{j} D^{−1/2}_{j} z, where z ∼ N(0, I_{j}) (since we initialize the sampler with w_{i,0} = 0). Thus, when the CG sampler terminates after j < n iterations,
Since the covariance matrix P_{j} D^{−1}_{j} P^{T}_{j} is singular, the distribution of w_{i} is called an intrinsic Gaussian in [26]. In exact arithmetic, Eqs. (8) and (9) show that at iteration j = n, the CG sampler produces a sample from N(0, A^{−1}), as long as A has n distinct eigenvalues. When the eigenvalues are not distinct, then CG terminates at iteration j < n [25]. In finite precision, the CG search directions {p_{j}} lose conjugacy at some iteration j < n. The rest of this section is devoted to answering the following question: How good an approximation is the distribution N(0, P_{j} D^{−1}_{j} P^{T}_{j}) to N(0, A^{−1}) after conjugacy in the CG search directions is lost? It is well known that at the jth iteration, CG can be used to inexpensively estimate j of the eigenvalues of A, the extreme ones and the well-separated ones [25, 27–29]. The eigenvalue estimates are known as Ritz values, and we will denote them by θ^{j}_{1} < … < θ^{j}_{j}. The corresponding eigenvectors, the Ritz vectors, can also be estimated by CG, although at a non-negligible cost. In exact arithmetic, by the time that CG converges to a solution of Ax = b with residual r_{j} = 0, the Ritz values have already converged to the j extreme and well-separated eigenvalues of A, and the Ritz vectors have converged to an invariant subspace of A spanned by the corresponding eigenspaces [25]. It can be shown [21, Theorem 3.1] that the nonzero eigenvalues of Var(w_{i} | b) = P_{j} D^{−1}_{j} P^{T}_{j} are the reciprocals of the Ritz values, 1/θ^{j}_{i} (called the harmonic Ritz values of A^{−1} on 𝒦^{j}(A, r_{0}) [30–32]). The eigenvectors of Var(w_{i} | b) are the Ritz vectors, which estimate the corresponding j eigenvectors of A. Thus, when the CG sampler converges with residual r_{j} = 0, then [21, Remark 4]
for any v ∈ 𝒦^{j}(A, r_{0}). This shows that Var(w_{i} | b) is the best rank-j approximation to A^{−1} in the eigenspaces corresponding to the extreme and well-separated eigenvalues of A. In other words, the CG sampler has successfully sampled from these eigenspaces. Moreover, from Weyl's theorem [27] and the triangle inequality, it can be shown that ‖A^{−1} − Var(w_{i} | b)‖_{2} is at least as large as the largest eigenvalue of A^{−1} not being estimated, and it can get as large as this eigenvalue plus the error in the Ritz estimates (see [21] for more detail). As with the difficulty faced by iterative eigenproblem solvers, the accuracy of the rank-j approximation Var(w_{i} | b) to A^{−1} depends on the distribution of the eigenvalues of A. Loss of conjugacy of the CG search directions occurs at the same iteration j at which at least one of the Ritz pairs converges [25], but this can happen before the CG sampler terminates (due to a small residual). As described by Eq. (10), at the iteration when loss of conjugacy occurs, Var(w_{i} | b) is the best approximation to A^{−1} in the Krylov subspace 𝒦^{j}(A, r_{0}), which contains the converged Ritz vector(s). Numerical simulations presented in [21] suggest that after loss of conjugacy, continuing to run the CG sampler until the residual is small does not have a deleterious effect on the samples. On the contrary, in the examples considered in [21], the CG sampler continued to sample from new eigenspaces, providing samples with realized covariances that better approximated A^{−1}.

4. NUMERICAL EXPERIMENTS

In this section, we perform tests and comparisons with CG-EnKF. We consider two synthetic examples: the Lorenz 95 system, which is a first-order nonlinear, chaotic ordinary differential equation system that shares some characteristics with weather models; and a two-dimensional heat equation with a forcing term that can be made large-scale.

4.1 Lorenz 95

We begin with the Lorenz 95 test case, introduced in [33] and analyzed in [34].
The model shares many characteristics with realistic atmospheric models, and it is often used as a low-order test case for weather forecasting schemes. We use a 40-dimensional version of the model given as follows:
The state variables are periodic: x_{−1} = x_{39}, x_{0} = x_{40}, and x_{41} = x_{1}. Out of the 40 model states, measurements are obtained from 24 states via the linear observation operator K, where
with i = 1, 2, 3 and j = 0, 1, …, 7. Thus, we observe the last three states in every set of five. To generate data, we add Gaussian noise to the model solution with zero mean and covariance (0.15 σ_{clim})² I, where σ_{clim} = 3.641 (the standard deviation used in climatological simulations). In the filtering methods, we use C_{ε^p_k} = (0.05 σ_{clim})² I as the model error covariance and C_{ε^o_k} = (0.15 σ_{clim})² I as the observation error covariance. As initial guesses in the filtering, we use x^{est}_{0} = 1 and C^{est}_{0} = I. For more details about the example, see [8, 9]. We run experiments with varying ensemble size N, and we note that in the linear case, as N → ∞, EnKF converges to EKF (which in the linear case is just KF). When the model ℳ is nonlinear, however, the approximations of the model covariance C^{p}_{k} computed in step i.2 of Algorithms 1 and 2 (EKF and EnKF, respectively) will be different; specifically, in EKF C^{p}_{k} is obtained from a linearization of ℳ about x^{est}_{k−1}, whereas in EnKF C^{p}_{k} is computed after the ensemble members have been pushed forward by the model. CG-EnKF is the ensemble version of the CG variational Kalman filter (CG-VKF), which was introduced in [11]. In implementation, the two algorithms are very similar. Specifically, in CG-VKF the covariance approximation P_{j} D^{−1}_{j} P^{T}_{j} defined in (8) is used directly, whereas in CG-EnKF, the ensembles are sampled from N(x^{est}_{k}, P_{j} D^{−1}_{j} P^{T}_{j}). Moreover, as for EnKF and EKF, in the linear case CG-EnKF converges to CG-VKF as N → ∞. For completeness, we present CG-VKF now.

Algorithm 6 (CG-VKF). Select initial guess x^{est}_{0} and low-rank covariance approximation B^{#}_{0} = X_{0} X^{T}_{0} of C^{est}_{0}, and set k = 1.
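The Lorenz 95 model equations and the matrix defining K are elided in this copy. The sketch below uses the standard Lorenz 95 right-hand side with forcing F = 8 (the forcing value is the usual choice and is our assumption here) and builds the 24 × 40 operator that observes the last three states of every block of five, as described above:

```python
import numpy as np

def lorenz95_rhs(x, F=8.0):
    """Standard Lorenz 95 tendency (assumed form; F = 8 is the usual choice):
    dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, with periodic indexing."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def observation_operator(n=40):
    """0/1 matrix picking the last three states of every block of five."""
    K = np.zeros((3 * n // 5, n))
    row = 0
    for j in range(n // 5):
        for i in range(3):
            K[row, 5 * j + 2 + i] = 1.0   # states 3, 4, 5 of each block
            row += 1
    return K
```

Note that x = F·1 is a fixed point of the model; the chaotic dynamics live around it.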
For comparisons, we plot the relative error ‖x^{true}_{k} − x^{est}_{k}‖ / ‖x^{true}_{k}‖,
where, at iteration k, x^{est}_{k} is the filter estimate and x^{true}_{k} is the truth used in data generation. For the ensemble methods, we test ensemble sizes N = 10 and N = 20. Since ensemble filters are stochastic methods, we show relative errors averaged over 20 repetitions. In the CG-based filters, the residual norm stopping tolerance was set to 10^{−6}, and the maximum number of iterations was set to 50. From the results in the top plot in Fig. 1, we see that CG-EnKF outperforms EnKF for small ensemble sizes. When the ensemble size gets larger, the CG-EnKF, CG-VKF, and EnKF performances approach each other, as expected. Finally, CG-VKF and EKF perform equally well, although EKF reduces the error faster in early iterations. Nonmonotonicity in the reduction of the error plots is a product of the chaotic nature of the Lorenz 95 model.

FIG. 1: In the top plot is a comparison of relative errors. For EnKF and CG-EnKF, the upper and lower curves correspond to ensemble sizes of N = 10 and 20, respectively. In the bottom plot is a “forecast skill” comparison. For EnKF and CG-EnKF, the upper, middle, and lower curves correspond to ensemble sizes N = 15, 20, and 40, respectively.

In the bottom plot in Fig. 1, we also compare the forecast skills given by the different methods, using ensemble sizes N = (15, 20, 40). For this comparison, we compute the following statistics for forecasts launched at every fourth filter step, starting from the 64th step (when all filters have converged). Take j ∈ ℐ := {4i | i = 16, 17, …, 100} and define
where ℳ_{n} denotes a forward integration of the model by n time steps. Thus, this vector gives a measure of forecast accuracy given by the respective filter estimate up to 80 time steps, or 10 days out. We average the forecast accuracy over the 85 forecasts, and define the forecast skill vector as
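The forecast-skill formulas are elided in this copy; the sketch below implements the computation as we read the description (launch a forecast from each filter estimate, integrate it forward n steps, compare with the truth in relative norm, and average over the launch set ℐ). The function name and array layout are ours, and the exact norm used in the paper is an assumption.

```python
import numpy as np

def forecast_skill(model_step, x_est, x_true, launches, horizon=80):
    """Averaged forecast skill (assumed form): for each launch index j,
    integrate the filter estimate forward n steps and compare with the
    truth at step j + n; average the relative errors over all launches.
    x_est, x_true: sequences of state vectors indexed by time step."""
    skill = np.zeros(horizon)
    for n in range(1, horizon + 1):
        errs = []
        for j in launches:
            x = x_est[j].copy()
            for _ in range(n):              # forward integration by n steps
                x = model_step(x)
            errs.append(np.linalg.norm(x - x_true[j + n])
                        / np.linalg.norm(x_true[j + n]))
        skill[n - 1] = np.mean(errs)
    return skill
```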
Again, CG-EnKF outperforms EnKF, especially when N is small. For instance, CG-EnKF with N = 20 performs as well as EnKF with N = 40. CG-VKF and EKF perform in a similar way. One might ask which of EKF, CG-EnKF, and CG-VKF is the most desirable in a given situation. In this example, the numerical results suggest that EKF performs best, followed by CG-VKF. However, both EKF and CG-VKF require the Jacobian matrix M_{k} defined in (3) and its transpose M^{T}_{k}, which cannot be efficiently computed in some large-scale nonlinear examples. In such cases, CG-EnKF is the best approach. In terms of computational cost, for sufficiently large-scale problems (such as the one considered next), the storage and inversion of n × n covariance matrices required in EKF is prohibitive: n² elements must be stored and inversion requires 𝒪(n³) operations. For CG-EnKF and CG-VKF, nj elements are stored (where j is the number of ensemble members for CG-EnKF and the number of stored CG search directions for CG-VKF), and the matrix inverse computed in (5) involves only an N × N system, a significant savings when N ≪ n. Finally, we compare CG-EnKF with the limited-memory BFGS (L-BFGS)-based ensemble Kalman filter of [20]. This method has the same form as Algorithm 3, except that in step ii, L-BFGS is used in place of CG to compute both the approximate minimizer x^{est}_{k} of ℓ(x|y_{k}) defined in (4) and the covariance C^{est}_{k} = ∇² ℓ(x|y_{k})^{−1}. Sampling from N(0, B_{k}), where B_{k} is the L-BFGS approximation of C^{est}_{k}, requires some ingenuity, and the interested reader should see [20] for details. For completeness, we present the L-BFGS ensemble filter now.

Algorithm 7 (LBFGS-EnKF). Sample initial ensemble x^{est}_{0,i}, for i = 1, …, N, from N(x^{est}_{0}, C^{est}_{0}), and set k = 1.
For a fair comparison between the CG and L-BFGS ensemble filters, the stopping criteria for L-BFGS are the same as for CG. In L-BFGS, there are two tuning parameters: the initial inverse Hessian approximation and the number of BFGS vectors that are stored in the algorithm (see [22]). We use a heuristic method for choosing the initial inverse Hessian, as in [20, 22], and to avoid an unfair comparison we store all vectors in the L-BFGS iterations. In Fig. 2, we compare the relative errors for ensemble sizes N = (10, 20), averaged over 20 repetitions. In these cases, the CG implementation performs better than the L-BFGS implementation. When the ensemble size gets larger, the methods perform equally well.

FIG. 2: Comparison of variational ensemble Kalman filters implemented by CG and L-BFGS with ensemble sizes N = (10, 20). For LBFGS-EnKF and CG-EnKF, the upper and lower curves correspond to ensemble sizes of N = 10 and 20, respectively.

4.2 Heat Equation

The purpose of this example is to demonstrate CG-EnKF behavior when the dimension is large. The example is linear, so we can compare directly to KF. However, as the dimension of the problem is increased, KF can no longer be run due to memory issues. Note that while the example does illustrate computational aspects of the methods, this system is well behaved, and we cannot conclude much about how the methods would work in a high-dimensional chaotic case such as numerical weather prediction. The model describes heat propagation in a two-dimensional grid, and it is written as a partial differential equation:
where x is the temperature at coordinates u and v over the domain Ω = {(u, v) | u, v ∈ [0, 1]}. The last term in the equation is an external heat source, whose magnitude can be controlled with the parameter α ≥ 0. We discretize the model using a uniform S × S grid. This leads to a linear forward model x_{k+1} = Mx_{k} + f, where M = I − Δt L, Δt is the time step, L is the discrete negative Laplacian, and f is an external forcing; see [8, 9] for more details. The dimension of the problem can be controlled by changing S. The observation operator K is defined as in [8, 9], with the measured temperature a weighted average of the temperatures at neighboring points at S²/64 evenly spaced locations. Data are generated by adding normally distributed random noise to the model state and the corresponding response:
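The discretized forward map M = I − Δt L described above can be sketched as follows. The uniform-grid five-point Laplacian with homogeneous Dirichlet boundaries and 1/h² scaling are our assumptions; the paper's exact discretization (see [8, 9]) may differ.

```python
import numpy as np

def heat_forward_matrix(S, dt=1e-4):
    """Explicit-Euler forward map M = I - dt*L on a uniform S-by-S grid,
    where L is the five-point discrete negative Laplacian (a sketch;
    boundary handling and scaling are assumptions)."""
    h = 1.0 / (S + 1)                     # grid spacing on [0, 1]
    n = S * S
    L = np.zeros((n, n))
    for r in range(S):
        for c in range(S):
            i = r * S + c
            L[i, i] = 4.0 / h**2
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < S and 0 <= cc < S:
                    L[i, rr * S + cc] = -1.0 / h**2
    return np.eye(n) - dt * L
```

For large S one would of course store L sparsely; the dense construction here is only for illustration, which is exactly the storage issue that rules out KF as S grows.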
In data generation, we use α = 0.75 and choose σ_{ev} and σ_{obs} so that the signal-to-noise ratios at the initial condition, defined by ‖x_{0}‖² / (S² σ²_{ev}) and ‖Kx_{0}‖² / (m² σ²_{obs}), are both 50. The initial condition for data generation is
For the filtering, we use a biased model in which the forcing term is dropped by setting α = 0. The error covariances used for the model and observations are σ²_{ev} I and σ²_{obs} I, respectively. We start all filters from the initial guess x_{0} = 0. For the ensemble filters, all members are initialized to the same value, and for KF we set the initial covariance estimate to C^{est}_{0} = 0. As our first test, we take S = 2^{j} and choose j = 5, the largest integer for which KF can still be computed on a standard desktop computer. Thus, the dimension of the first test was d = S² = 1024. Then, we compared CG-EnKF and CG-VKF in a case where the dimension is much higher (j = 7, d = S² = 16,384). The stopping tolerance for CG was once again set to 10^{−6}, and the maximum number of iterations was set to 20. Also, as above, the CG-EnKF results are averages over 20 repetitions. EnKF results are not included in these experiments because it performs poorly with such small ensemble sizes. In the top plot in Fig. 3, we compare KF, CG-VKF, and CG-EnKF with ensemble sizes N = (10, 20, 50), noting again that as N increases, CG-EnKF approaches CG-VKF. In the higher-dimensional case, as noted above, KF cannot be used anymore due to memory issues.

FIG. 3: Performance comparison of KF, CG-VKF, and CG-EnKF with ensemble sizes N = (10, 20, 50) in the case where d = 1024 (top) and d = 16,384 (bottom). For CG-EnKF, the upper, middle, and lower curves correspond to ensemble sizes N = 10, 20, and 50, respectively.

5. CONCLUSIONS

The ensemble Kalman filter is a state-space estimation technique approximating the standard extended Kalman filter. In EnKF, an ensemble of approximate states is created at each time point and each member is propagated forward by the state-space model ℳ in (1). The covariance of the resulting ensemble is then used within the EKF formulas to yield an efficient approximate filter.
In this paper, we show how the CG sampler of [21] can be applied to a variational formulation of EKF to create a new ensemble at each time point. The use of CG yields a point estimate of the state, and the computed ensemble members are optimal within a certain Krylov subspace. Implementation of the CG sampler requires the addition of only a single inexpensive line of code within CG. We call the resulting algorithm CG-EnKF, and we present an analysis of the accuracy of the CG samples. We apply CG-EnKF to two examples, one large-scale and linear, the other small-scale, nonlinear, and chaotic. In both cases, it outperforms standard EnKF, and as the ensemble size increases, the relative error curve for CG-EnKF approaches that of CG-VKF, as the theory suggests. Finally, CG-EnKF compares favorably to the LBFGS-EnKF method of [20].

REFERENCES

1. Kalman, R. E., A new approach to linear filtering and prediction problems, Trans. ASME–J. Basic Eng., Ser. D, 82:35–45, 1960.
2. Welch, G. and Bishop, G., An introduction to the Kalman filter, UNC Chapel Hill, Dept. of Computer Science Tech. Report 95-041, 1995.
3. Cane, M. A., Miller, R. N., Tang, B., Hackert, E. C., and Busalacchi, A. J., Mapping tropical Pacific sea level: Data assimilation via reduced state Kalman filter, J. Geophys. Res., 101:599–617, 1996.
4. Dee, D. P., Simplification of the Kalman filter for meteorological data assimilation, Q. J. R. Meteorol. Soc., 117:365–384, 1990.
5. Fisher, M., Development of a simplified Kalman filter, Technical Memorandum 260, European Centre for Medium-Range Weather Forecasts, 1998.
6. Gejadze, I. Y., Le Dimet, F.-X., and Shutyaev, V., On analysis error covariances in variational data assimilation, SIAM J. Sci. Comput., 30:1847–1874, 2008.
7. Fisher, M. and Andersson, E., Developments in 4D-Var and Kalman filtering, Technical Memorandum 347, European Centre for Medium-Range Weather Forecasts, 2001.
8. Auvinen, H., Bardsley, J. M., Haario, H., and Kauranne, T., Large-scale Kalman filtering using the limited memory BFGS method, Electron. Trans. Numer. Anal., 35:217–233, 2009.
9. Auvinen, H., Bardsley, J. M., Haario, H., and Kauranne, T., The variational Kalman filter and an efficient implementation using limited memory BFGS, Int. J. Numer. Methods Fluids, 64(3):314–335, 2010.
10. Veersé, F., Variable-storage quasi-Newton operators for modeling error covariances, Proc. of the Third WMO International Symp. on Assimilation of Observations in Meteorology and Oceanography, 7–11, Quebec City, Canada, 1999.
11. Bardsley, J. M., Parker, A., Solonen, A., and Howard, M., Krylov space approximate Kalman filtering, Numer. Linear Algebra Appl., DOI: 10.1002/nla.805, 2011.
12. Zupanski, M., Maximum likelihood ensemble filter: Theoretical aspects, Mon. Weather Rev., 133:1710–1726, 2005.
13. Evensen, G., Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99:10143–10162, 1994.
14. Evensen, G., Data Assimilation: The Ensemble Kalman Filter, Springer, Berlin, 2007.
15. Sacher, W. and Bartello, P., Sampling errors in ensemble Kalman filtering. Part I: Theory, Mon. Weather Rev., 136:3035–3049, 2008.
16. Meng, Z. and Zhang, F., Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Imperfect model experiments, Mon. Weather Rev., 135:1403–1423, 2007.
17. Li, H., Kalnay, E., and Takemasa, M., Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter, Q. J. R. Meteorol. Soc., 135:523–533, 2009.
18. Whitaker, J. S. and Hamill, T. M., Ensemble data assimilation without perturbed observations, Mon. Weather Rev., 130:1913–1924, 2002.
19. Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza, M., Kalnay, E., Patil, D. J., and Yorke, J. A., A local ensemble Kalman filter for atmospheric data assimilation, Tellus A, 56:415–428, 2004.
20. Solonen, A., Haario, H., Hakkarainen, J., Auvinen, H., Amour, I., and Kauranne, T., Variational ensemble Kalman filtering using limited memory BFGS, submitted, 2011.
21. Parker, A. and Fox, C., Sampling Gaussian distributions in Krylov spaces with conjugate gradients, SIAM J. Sci. Comput., 34(3):B312–B334, 2012.
22. Nocedal, J. and Wright, S., Numerical Optimization, Springer, New York, 2000.
23. Watkins, D., Fundamentals of Matrix Computations, 2nd ed., Wiley, New York, 2002.
24. Golub, G. H. and Van Loan, C. F., Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, 1996.
25. Meurant, G., The Lanczos and Conjugate Gradient Algorithms, SIAM, Philadelphia, 2006.
26. Rue, H. and Held, L., Gaussian Markov Random Fields: Theory and Applications, Chapman & Hall/CRC, London, 2005.
27. Parlett, B., The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.
28. Saad, Y., Numerical Methods for Large Eigenvalue Problems, SIAM, Philadelphia, 2011.
29. Sleijpen, G. L. C. and Van Der Sluis, A., Further results on the convergence behavior of conjugate-gradients and Ritz values, Linear Alg. Appl., 246:233–278, 1996.
30. Morgan, R. B., Computing interior eigenvalues of large matrices, Linear Alg. Appl., 154–156:289–309, 1991.
31. Paige, C., Parlett, B. N., and van der Vorst, H. A., Approximate solutions and eigenvalue bounds from Krylov subspaces, Numer. Linear Alg. Appl., 2(2):115–133, 1995.
32. Sleijpen, G. and van den Eshof, J., Accurate approximations to eigenpairs using the harmonic Rayleigh–Ritz method, Preprint 1184, Department of Mathematics, University of Utrecht, 2001.
33. Lorenz, E. N., Predictability: A problem partly solved, Proc. of the Seminar on Predictability, European Centre for Medium-Range Weather Forecasts, 1:1–18, 1996.
34. Lorenz, E. N. and Emanuel, K. A., Optimal sites for supplementary weather observations: Simulation with a small model, J. Atmos. Sci., 55:399–414, 1998.