Real-timE ModellIng of Nonlinear Data Streams

The project Real-timE ModellIng of Nonlinear Data Streams (REMIND) had at its core the aim of melding advances in nonlinear dynamics with those in Bayesian statistical estimation; this was attempted in both mathematical systems and real industrial settings [2]. In each case we maintained the realism of a data stream setting: one in which real-time reaction is required, and where data is provided at a rate that precludes complete reliance on online storage and post-processing. The industrial data streams considered under REMIND consisted of (i) second-by-second observations of variations in the grid frequency of the UK electricity grid and minute-by-minute variations in electricity demand, both provided by National Grid Transco (see [2] for details), and (ii) multivariate observations from a Hurricane Gas Turbine in various spin-up, spin-down and steady-load states, collected by Dave Mellor of Intertec during a site visit to the Siemens-ALSTOM labs in Lincoln attended by both the PI and the RA (see [1] for details). Data streams for mathematical models came from a collection of low-dimensional nonlinear deterministic models (see [4, 5, 6, 8]) and nonlinear stochastic models (see [2, 7]) with which the investigators are familiar. Early in the project, connections within the energy sector led to an additional industrial application in forecasting electricity demand; while not a data stream application, this work yielded new insights into Bayesian methods of processing multi-model forecasts [3].

The first step was to develop means of evaluating the relevance of a model class to a given problem, and improved means of building deterministic nonlinear models for data streams. A new test for long-range persistence was developed [7]; passing this test would suggest using a model class that permits slowly decaying correlation functions. Despite widespread claims of long-range persistence, we found little indication that estimates of persistence were robust, either in the grid data or in other geophysical series analyzed by others. To apply traditional nonlinear local polynomial modelling techniques to data streams, we developed a method of selective incorporation [5], which has proven very effective for modelling chaotic systems. Neither Intertec nor NG Transco is interested in prediction per se; rather, they are interested in condition monitoring and parameter estimation, respectively. Condition monitoring requires the (early) detection of changes in operating characteristics. Under REMIND we have extended and applied a technique previously suggested by the PI in unrefereed literature [8], which uses low-dimensional reconstructions and can incorporate the forecast models of [5]. Much modern condition monitoring is based on frequency-space techniques, using Fourier methods to detect changes in observed spectra; proposed nonlinear methods are often based on scaling statistics, which are difficult to estimate reliably in practice (see [7] and references therein). The method developed by Clarke and Smith [1] under REMIND can detect and quantify changes that are literally invisible in frequency space, hence the title "Detecting Transparent Noise". The technique was illustrated on data collected jointly with Intertec, a small-to-medium-sized engineering company that wants to extend its monitoring techniques in order to gain market differentiation.
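A toy illustration of how a reconstruction-space statistic can see what a spectrum cannot follows; it is not the Clarke and Smith [1] algorithm, whose details are not reproduced here. The logistic map at r = 4 has zero autocorrelation at every nonzero lag, so its power spectrum is flat like that of white noise, and i.i.d. draws from the map's invariant density share its marginal distribution as well. The sketch, assuming NumPy and a one-dimensional delay reconstruction, compares the two series using sample autocorrelations (a frequency-space proxy) and the one-step error of a nearest-neighbour forecast built from a library of past states. All function names are hypothetical.

```python
import numpy as np

def logistic_series(n, x0=0.3, r=4.0):
    """Iterate the logistic map x_{n+1} = r x_n (1 - x_n)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x

def matched_noise(n, rng):
    """I.i.d. draws from the r=4 invariant density 1/(pi*sqrt(x(1-x)))."""
    return np.sin(0.5 * np.pi * rng.uniform(size=n)) ** 2

def autocorr(x, max_lag=5):
    """Sample autocorrelations at lags 1..max_lag (a frequency-space proxy)."""
    x = x - x.mean()
    c0 = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / c0 for k in range(1, max_lag + 1)])

def nn_prediction_error(x, n_library):
    """RMS one-step error of nearest-neighbour forecasts in a 1-D delay reconstruction."""
    # Library: (state, successor) pairs from the first n_library samples.
    lib_state, lib_next = x[:n_library - 1], x[1:n_library]
    # Test: the remaining samples, never used to build the library.
    test_state, test_next = x[n_library:-1], x[n_library + 1:]
    idx = np.abs(test_state[:, None] - lib_state[None, :]).argmin(axis=1)
    return float(np.sqrt(np.mean((lib_next[idx] - test_next) ** 2)))

rng = np.random.default_rng(0)
chaos = logistic_series(3000)
noise = matched_noise(3000, rng)

print(np.round(autocorr(chaos), 2), np.round(autocorr(noise), 2))
print(nn_prediction_error(chaos, 2000), nn_prediction_error(noise, 2000))
```

Both sets of autocorrelations sit near zero, so the spectra give no way to tell the series apart, while the nearest-neighbour error is small for the deterministic series and close to sqrt(2 Var x) = 0.5 for the noise.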
Our analysis technique complements existing frequency-based methods and can be applied using existing monitoring technologies. To prove its practical benefit in fault detection we would need to test the method on data for which a known fault had occurred; such data is not yet available. The method does successfully identify changes in the Hurricane data stream as the turbine is placed in different settings. The development of both predictive monitoring [5] and data-library-based monitoring [1] allows us to adjust to Intertec's operational constraints (the computing and data storage limits of their detector). In short, our method is designed to complement existing frequency-based methods (and is likely to replace the alternative nonlinear methods noted in [1]) and to extend the usefulness of condition monitoring, with particular application to data streams.

A major thrust of the REMIND project was the development of a Bayesian parameter estimation methodology, using MCMC methods, for both deterministic and stochastic models. A number of fairly fundamental difficulties with proposed applications have been documented under REMIND [9] and by others during this time [4]. Under REMIND we developed a simplified model of the UK electricity grid [2] and used it for parameter estimation. When run in perfect model mode, with high sampling rates on the observable quantities, the MCMC approach worked well. When run using current operational sampling rates (a few hertz for grid frequency, around 1 to 15 minutes for demand), the method did not converge; more precisely, common evaluation statistics of the MCMC method did stabilise, but not about physically plausible parameter values. A good deal of time was spent investigating the application of the MCMC algorithm, to determine whether the problems lay (a) in the MCMC method, (b) in the imperfect mathematical structure of our model, or (c) in the failure of the operational data stream to provide the information required by our MCMC algorithm. In short, the answer was (c), as demonstrated in [2], where we not only address particular questions in the analysis of grid frequency but also suggest sanity tests for establishing the meaningful convergence of MCMC analyses of time series in general.
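The way a chain can stabilise about an implausible value can be explored in a much simpler setting than the grid model. The sketch below is a hypothetical illustration, not the model or the sanity tests of [2]. It runs a random-walk Metropolis sampler for the coefficient phi of an AR(1) model x_{t+1} = phi x_t + sigma eps_t, with sigma known, in perfect model mode, assuming NumPy. When every step is observed, the chain settles around the true phi = 0.9. When only every tenth step is observed but the data are still fitted as if consecutive, the chain stabilises just as tightly, but around phi^10 (about 0.35). A simple check (is the truth within four posterior standard deviations of the posterior mean?) flags the failure.

```python
import numpy as np

def simulate_ar1(n, phi, sigma, rng):
    """Simulate x_{t+1} = phi * x_t + sigma * eps_t with Gaussian eps_t, starting at 0."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.normal()
    return x

def log_likelihood(phi, x, sigma):
    """Gaussian log-likelihood (up to a constant), treating samples as consecutive steps."""
    resid = x[1:] - phi * x[:-1]
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

def metropolis(x, sigma, n_steps, step, rng, phi0=0.0):
    """Random-walk Metropolis for phi with a flat prior on (-1, 1)."""
    phi, ll = phi0, log_likelihood(phi0, x, sigma)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        prop = phi + step * rng.normal()
        if abs(prop) < 1.0:  # proposals outside the prior support are rejected
            ll_prop = log_likelihood(prop, x, sigma)
            if np.log(rng.uniform()) < ll_prop - ll:
                phi, ll = prop, ll_prop
        chain[i] = phi
    return chain

def sanity_check(chain, truth, burn_in, n_sd=4.0):
    """Perfect-model check: is the true value within n_sd posterior SDs of the posterior mean?"""
    post = chain[burn_in:]
    return bool(abs(post.mean() - truth) < n_sd * post.std())

rng = np.random.default_rng(1)
phi_true, sigma = 0.9, 1.0
x = simulate_ar1(20000, phi_true, sigma, rng)
fine = x[:2000]    # every step observed
coarse = x[::10]   # every tenth step observed (2000 samples)

chain_fine = metropolis(fine, sigma, 6000, 0.02, rng)
chain_coarse = metropolis(coarse, sigma, 6000, 0.02, rng)
for name, chain in [("fine", chain_fine), ("coarse", chain_coarse)]:
    post = chain[1000:]
    print(name, round(post.mean(), 3), round(post.std(), 4),
          sanity_check(chain, phi_true, 1000))
```

In practice the true parameters are unknown, so a check of this kind is only available in perfect model mode, where the data are generated by the model itself. A tight, stable chain is therefore not evidence of meaningful convergence on its own.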

Joint research continues with NGT, and the possibility of acquiring data at higher sampling rates in the future is a real one. The research into MCMC modelling has attracted the interest of other Bayesian modelling groups (Jim Berger at Duke) and is likely to lead to a better understanding of the application of these methods to simulation models with a dominant deterministic component. Other examples are in preparation [9]; the combination of MCMC and alternative methods of parameter estimation will form the core of M. Cuellar's PhD thesis, which will be submitted by the end of 2005.

[1] L. Clarke and L.A. Smith (2005) Detecting transparent noise. Mechanical Systems and Signal Processing, Elsevier (In Review).

[2] M. Cuellar, L. Clarke, M. Brown, and L.A. Smith (2005) The Role of Operational Constraints on MCMC Parameter Estimation: The case of the UK Electricity Grid. International Journal of Power Engineering and Energy Systems, Elsevier (In Review).

[3] L.A. Smith, M.G. Altalo & C. Ziehmann (2004) Predictive Distributions from an Ensemble of Forecasts: Electricity Demand Forecasts from Imperfect Weather Models. Physica D (accepted with minor revisions)

[4] K. Judd and L.A. Smith (2004) Indistinguishable states II: The imperfect model scenario. Physica D, 196:224-242. K. Judd (2003) Right Results for the Wrong Reasons? Phys. Rev. E, 67, 026212.

[5] F. Kwasniok and L.A. Smith (2004) Model reconstruction from data streams. Phys. Rev. Lett., 92(16).

[6] L.A. Smith, M. Cuellar, H. Du and K. Judd (2005) A Geometric Approach to Parameter Estimation in Nonlinear Systems. Phys. Rev. Lett. (In Review).

[7] L.A. Smith and A. Guerrero (2005) A Maximum Likelihood Estimator for Quantifying Long-range Persistence. Physica A, 355:619-632.

[8] L.A. Smith, K. Godfrey, P. Fox and K. Warwick (1991) A new technique for fault detection in multi-sensor probes. In IEE International Conference on Control '91, volume 2, page 1062.

REMIND papers still in preparation

[9] M. Cuellar and L.A. Smith (2005) On the Curious Behavior of MCMC Parameter Estimation in the Logistic Map (to be submitted to Phys. Lett. A).

[10] L.A. Smith and H. Du (2005) Using ISIS to Resolve the Problem of Chaotic Likelihoods (to be submitted to Physica A).