Data Analysis and Statistical Methods

Members:

People involved: Paula Brito (coord.), Adelaide Figueiredo, Sónia Dias.
Collaborators: Carlos Soares, Helena Brás Silva, José G. Dias, Paulo Teles, Pedro Duarte Silva.
PhD Students: Paula Cheira, Hélder Alves, Cristina Lucas.

Summary:

Symbolic Data Analysis. The analysis of complex data, referred to as symbolic data, makes it possible to summarize complex or structured information (resulting, e.g., from the aggregation of huge data sets) and to interpret the results as higher-order concepts. We are interested in the multivariate analysis of symbolic data, in particular in clustering, discriminant analysis and time series analysis of interval data.

Clustering based on Graph Theory. Development of clustering methods based on graph theory. This involves the study of graph theory concepts, in particular the graph coloring problem and properties of k-partite graphs. We further explore the validity of classifications, taking into account the lack of knowledge about the data set.

Probabilistic Methods in Clustering. We study non-hierarchical clustering methods based on probabilistic measures.

Weighted Measures in Classification. Development of new measures of association, including weighted measures, to solve problems in classification. This also led to the development of weighted principal component analysis and its application in bioinformatics.

Classification of Ordinal Data. We aim to develop new methods for supervised classification of ordinal data, where the order between the classes is taken into account.

Distribution on the Hypersphere. We have developed new results in statistical inference based on a distribution defined on the hypersphere for axial data, including goodness-of-fit tests for the Watson distribution defined on the hypersphere and two-way analysis of variance for this distribution.

Analysis of Three-Way Data. We apply three-way data analysis methods, such as the STATIS method and Multiple Factor Analysis, to real data.

Selected papers:

  • Brito, P: Symbolic Data Analysis: Another look at the interaction of Data Mining and Statistics, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 4 (4), 15pp., 2014.
  • Figueiredo, AMS: Goodness-of-fit for a concentrated von Mises-Fisher distribution, Computational Statistics, Vol. 27 (1), 14pp., 2012.
  • Brito, P; Silva, APD: Modelling interval data with Normal and Skew-Normal distributions, Journal of Applied Statistics, Vol. 39 (1), 18pp., 2012.
  • Brito, P; Chavent, M: Divisive monothetic clustering for interval and histogram-valued data, ICPRAM 2012 - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, Vol. 1, 6pp., 2012.
  • Brito, P; Polaillon, G: Conceptual clustering with generalization by intervals | Classification Conceptuelle avec Généralisation par Intervalles, Revue des Nouvelles Technologies de l'Information, Vol. E.23, 6pp., 2012.
  • Noirhomme-Fraiture, M; Brito, P: Far Beyond the Classical Data Models: Symbolic Data Analysis, Statistical Analysis and Data Mining, Vol. 4 (2), 14pp., 2011.
  • Figueiredo, A: Discriminant Analysis for the von Mises-Fisher Distribution, Communications in Statistics-Simulation and Computation, Vol. 38 (9), 13pp., 2009.
  • Figueiredo, A: Multi-sample tests for axial data from Watson distributions, Asta-Advances in Statistical Analysis, Vol. 93 (4), 16pp., 2009.
  • Figueiredo, A: Two-way ANOVA for the Watson distribution defined on the hypersphere, Statistical Papers, Vol. 49 (2), 14pp., 2008.
  • Figueiredo, A: Comparison of tests of uniformity defined on the hypersphere, Statistics & Probability Letters, Vol. 77 (3), 6pp., 2007.
  • Figueiredo, A: Discordancy tests based on the likelihood ratio for the bipolar Watson distribution on the hypersphere, Communications in Statistics-Simulation and Computation, Vol. 36 (2), 9pp., 2007.
  • Figueiredo, A: Multi-sample likelihood ratio tests based on bipolar Watson distributions defined on the hypersphere, Communications in Statistics-Theory and Methods, Vol. 36 (1-4), 6pp., 2007.
  • Brito, P: Modelling and analysing interval data, 30th Annual Conference of the German-Classification-Society, 12pp., 2007.
  • Santos, LD; Martins, I; Brito, P: Measuring subjective quality of life: A survey to Porto's residents, Applied Research in Quality of Life, Vol. 2 (1), 14pp., 2007.
  • Brito, P; Noirhomme-Fraiture, M: Symbolic and spatial data analysis: Mining complex data structures, Intelligent Data Analysis, Vol. 10 (4), 4pp., 2006.
  • Campos, P; Brazdil, P; Brito, P: Organizational survival in cooperation networks: The case of automobile manufacturing, 7th Working Conference on Virtual Enterprises, Vol. 224, 8pp., 2006.
  • Brás Silva, H.; Brito, P.; Pinto da Costa, J., "A partitional clustering algorithm validated by a clustering tendency index based on graph theory", Pattern Recognition, 39(5), 2006.
  • Figueiredo, A.; Gomes, P., "Discordancy test for the bipolar Watson distribution defined on the hypersphere", Communications in Statistics, Simulation and Computation, 34(1), 2005.
  • Figueiredo, A.; Gomes, P., "Goodness-of-fit methods for the bipolar Watson distribution defined on the hypersphere", Statistics and Probability Letters, 76(2), Elsevier, 2006.
  • Duarte Silva, A. P.; Brito, P., "Linear Discriminant Analysis for Interval Data", Computational Statistics, 21(2), pp. 289-308, 2006.
  • De Carvalho, F.; Brito, P.; Bock, H.-H., "Dynamic Clustering for Interval Data Based on L_2 Distance", Computational Statistics, 21(2), pp. 231-250, 2006.
  • Figueiredo, A., "Two-way analysis of variance for data from a concentrated bipolar Watson distribution", Journal of Applied Statistics, 33(6), pp. 575-581, 2006.
  • Figueiredo, A.; Gomes, P., "Performance of the EM algorithm on the identification of a mixture of Watson distributions defined on the hypersphere", REVSTAT.