Journal of Automation and Information Sciences
SJR: 0.232 SNIP: 0.464 CiteScore™: 0.27

ISSN Print: 1064-2315
ISSN Online: 2163-9337


DOI: 10.1615/JAutomatInfScien.v51.i8.60
pages 70-80

Estimate of Time Series Similarity Based on Models

Tatyana V. Knignitskaya
Yuriy Fedkovych Chernovtsy National University, Chernovtsy

ABSTRACT

Defining a distance measure between time series is the starting point for many data mining tasks, such as clustering and classification. Clustering is a principal method of unsupervised learning: it divides data into groups based on internal characteristics of the data that are not known a priori. Dividing data into clusters requires choosing a similarity metric between objects. The paper describes the main existing algorithms for computing the distance between time series, which work well for short time series in the absence of outliers. Outliers, which are inherent in real processes, lead to improper clustering and, consequently, to wrong decisions. It is proposed to define the distance between time series as the distance between the models (ARIMA) of these time series. In the presence of a large number of outliers, the distances given by classical methods increase linearly, while the proposed model-based distance behaves as a logarithmic function. It is shown that as the number of measurements increases, the relative errors of all classical methods remain almost unchanged, whereas the relative error of the model-based distance estimate is much smaller and decreases as the number of measurements grows. The main contribution of the article is the definition of a distance between time series based on the concept of a model, and the comparison of this distance with the most commonly used classical methods. Using the Monte Carlo method, it is shown that the proposed distance is more resistant to outliers and gives more accurate results for time series with a large number of observations. In addition, the computational complexity of the algorithm for calculating model-based distances is lower than that of the existing algorithms (DTW, ERP, Euclidean distance).
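For orientation, two of the classical distances named above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names `euclidean_distance` and `dtw_distance` are hypothetical, and the DTW variant shown (absolute-difference local cost, no warping window) is an assumption, since the abstract does not say which variant was benchmarked.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(x, y):
    """Textbook dynamic time warping with |a - b| as the local cost.

    Runs in O(len(x) * len(y)) time and keeps only two rows of the
    cumulative-cost table in memory.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # prev[j] is the best cumulative cost of aligning x[:i-1] with y[:j];
    # only the empty-vs-empty alignment costs 0 before any point is read.
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: step in x only, step in y only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

Both functions compare the series point by point, so a single large outlier in either series enters the result directly through the local cost.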
Models are one of the most convenient tools for studying the similarity of processes. In addition, analysis based on this algorithm can conveniently use averaged evolutions and limiting evolutions in the diffusion approximation scheme. Because limiting evolutions are resistant to outliers, the introduced distance can also be used in clustering to build more noise-resistant clusters.
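The idea of measuring distance between models rather than between raw observations can be illustrated with a small sketch. The abstract does not give the paper's formula, so everything here is an assumption made for illustration: an AR(1) model stands in for a full ARIMA model, it is fitted by ordinary least squares, and the distance is taken as the absolute difference of the fitted coefficients. The names `fit_ar1`, `model_distance`, and `simulate_ar1` are hypothetical.

```python
import random


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise.

    The sample mean is removed first. AR(1) is an illustrative stand-in
    for the ARIMA models mentioned in the abstract.
    """
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    return num / den if den > 0 else 0.0


def model_distance(x, y):
    """Assumed model-based distance: gap between fitted AR(1) coefficients."""
    return abs(fit_ar1(x) - fit_ar1(y))


def simulate_ar1(phi, n, rng):
    """Generate n points of an AR(1) process with standard normal noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    same_a = simulate_ar1(0.7, 2000, rng)
    same_b = simulate_ar1(0.7, 2000, rng)
    other = simulate_ar1(-0.5, 2000, rng)
    # Two realizations of the same process versus a different process.
    print("same dynamics:     ", model_distance(same_a, same_b))
    print("different dynamics:", model_distance(same_a, other))
```

In this sketch, fitting costs one pass over each series, after which the comparison involves only the fitted parameters. Two independent realizations of the same process end up close in model space even though their pointwise differences are large.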
