
Web Mining, Text Mining and Web Intelligence


Members:

PhDs involved: Alípio Jorge (FCUP, coord.), Pavel Brazdil (FEP, coord.), Luís Torgo (FCUP), João Cordeiro (UBI), Nuno Escudeiro (ISEP), Ricardo Campos (IPT), Conceição Rocha.
Collaborators: Carlos Soares, Brett Drury, Jorge Morais, M Valizadeh, Vitor Costa, André V. Rodrigues.
PhD students: Amir Nabizadeh, João Vinagre, Luís Trigo, Nuno Miguel Moniz, Rui Sarmento.

Summary:

  • Recommendation and Web adaptation: Development of tools and algorithms for web recommendation and adaptation. We study recommender systems for binary feedback, the exploitation of background and context information to enhance recommender algorithms, and incremental recommender algorithms based on nearest-neighbor and matrix factorization approaches. We also combine usage, content and structure information in recommender models. The techniques we use include collaborative filtering and association rules, among others. One application is currently in production at the site PalcoPrincipal.pt, where it recommends musical tracks.
     
  • Information Retrieval and Detection of Networks: Retrieving, extracting and processing relevant information from the web is a demanding task that can only be done with the support of automatic tools. We investigate machine learning and data analysis techniques that facilitate this process. One particular application area is similarity analysis among researchers' publications, the generation of clusters and their characterization by keywords. Two PhD theses on this topic are underway (L.Trigo, R.Sarmento). We also explore temporal information in web content to associate implicit temporal information with queries.
     
  • Document classification / Information Extraction: In every field, the amount of information to be shared has grown exponentially, which in itself justifies the use of automated techniques to extract information. One objective of this area is to extract meaningful and useful information from natural language text or other unstructured text. In particular, we are investigating the problem of extracting specific information about a particular domain (e.g. financial and economic data, bibliography on a certain topic, etc.).

  • Automatic Summarization of given text: This involves both extractive summarization and the simplification of individual sentences. Two PhDs on this topic have been completed (J.Cordeiro, 2011; M Valizadeh, 2015).
     
  • Sentiment analysis: This area involves automatically attributing positive or negative sentiment to a text or part of a text. One PhD was completed (Brett Drury, 2015) and two MSc theses were completed in 2015.

  • Web / Content Management Automation: This sub-area deals with the automation of tasks related to web site maintenance, from the point of view of content, structure and presentation. We have developed EdMate, a methodology and tool for assisting editors of content bases, such as web portals. This work has been done in collaboration with the company PortalExecutivo. This tool is based on the analysis of content meta-data and web logs.
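As a rough illustration of the incremental, nearest-neighbor recommendation for binary feedback mentioned under Recommendation and Web adaptation, the Python sketch below keeps item co-occurrence counts up to date as each (user, item) event arrives and scores unseen items by their summed cosine similarity to the items a user has already consumed. It is a generic, hypothetical example (the class `IncrementalItemKNN` and its methods are names chosen for illustration), not the group's published algorithms or the system running at PalcoPrincipal.pt.

```python
from collections import defaultdict
from math import sqrt


class IncrementalItemKNN:
    """Hypothetical item-based collaborative filtering for binary feedback.

    Counts are updated one event at a time, so item-item cosine
    similarities are always current without retraining from scratch.
    """

    def __init__(self):
        self.user_items = defaultdict(set)  # user -> set of items consumed
        self.item_count = defaultdict(int)  # item -> number of distinct users
        self.co_count = defaultdict(int)    # {item_i, item_j} -> users with both

    def update(self, user, item):
        """Process one positive (user, item) event from the stream."""
        items = self.user_items[user]
        if item in items:
            return  # binary feedback: a repeated event carries no new information
        for other in items:
            self.co_count[frozenset((item, other))] += 1
        items.add(item)
        self.item_count[item] += 1

    def similarity(self, i, j):
        """Cosine similarity between the binary user vectors of items i and j."""
        co = self.co_count.get(frozenset((i, j)), 0)
        if co == 0:
            return 0.0
        return co / sqrt(self.item_count[i] * self.item_count[j])

    def recommend(self, user, n=3):
        """Return up to n unseen items ranked by summed similarity to seen items."""
        seen = self.user_items.get(user, set())
        scores = {}
        for candidate in self.item_count:
            if candidate in seen:
                continue
            score = sum(self.similarity(candidate, s) for s in seen)
            if score > 0:
                scores[candidate] = score
        # Ties are broken by item name so the output is deterministic.
        return sorted(scores, key=lambda c: (-scores[c], c))[:n]


# Usage: replay a small, made-up event stream and ask for recommendations.
model = IncrementalItemKNN()
stream = [("u1", "a"), ("u1", "b"), ("u2", "a"),
          ("u2", "b"), ("u2", "c"), ("u3", "a")]
for user, item in stream:
    model.update(user, item)
print(model.recommend("u3"))  # ['b', 'c']
```

Because similarities are derived from running counts, each new event only touches the co-occurrence pairs of that user's items, which is what makes the approach incremental. The trade-off in this sketch is that `recommend` scans every known item; a production system would typically restrict scoring to precomputed neighborhoods.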

Projects:


Previous Projects

  • Site-o-Matic - Web Site Automation (Project POSI / EIA / 58367/ 2004)

  • Mail-maid: automatic mail classification platform, led by Nuno Escudeiro, with a pilot version currently running on the APPIA web server.
  • VIPAccess, Ubiquitous Web Access for Visually Impaired People (PTDC/PLP/72142/2006) (P.Brazdil)
  • Sumo, Automatic Text Summarization for Mobile technologies (Project POSC/PLP/57438/2004), May 2005 - April 2008 (P.Brazdil)


Publications:

2016

  • Ana Catarina Forte, Pavel B. Brazdil, Determining the Level of Clients’ Dissatisfaction from their Commentaries, Proc. of PROPOR 2016, Tomar, Portugal, Springer.
  • Nuno M Moniz, Francisco Louçã, Márcia Barbosa Oliveira, Renato Araújo Soeiro, Empirical Analysis of the Portuguese Governments Social Network, Social Network Analysis and Mining, 2015.      

2015

  • Leon Derczynski, Jannik Strötgen, Ricardo Campos, Omar Alonso, Special Issue on Time and Information Retrieval, Information Processing & Management, vol.51, no.6, pp.786-890, November, 2015.
  • Leon Derczynski, Jannik Strötgen, Ricardo Campos, Omar Alonso, Time and Information Retrieval: Introduction to the Special Issue, Information Processing & Management, vol.51, no.6, pp.786-790, June, 2015.
  • Nuno M Moniz, Luís Torgo, Combining Social and Official Media in News Recommender Systems, ECML-PKDD 2015 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Doctoral Consortium, Porto, Portugal, 2015.
  • João Vinagre, Alípio Jorge, João Gama, An overview on the exploitation of time in collaborative filtering, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.5, no.5, pp.195-215, September, 2015.
  • Amir H Nabizadeh, Alípio Jorge, José Paulo Leal, Long term goal oriented recommender systems, WEBIST 2015 - International Conference on Web Information Systems and Technologies, Lisbon, Portugal, May, 2015.
  • André V Rodrigues, Alípio Jorge, Inês Dutra, Accelerating Recommender Systems using GPUs, SAC 2015 - The 30th ACM/SIGAPP Symposium On Applied Computing, Salamanca, Spain, April, 2015.
  • João Vinagre, Alípio Jorge, João Gama, Collaborative filtering with recency-based negative feedback, SAC 2015 - The 30th ACM/SIGAPP Symposium On Applied Computing, pp.963-965, Salamanca, Spain, April, 2015.
  • Pawel Matuszyk, João Vinagre, Myra Spiliopoulou, Alípio Jorge, João Gama, Forgetting methods for incremental matrix factorization in recommender systems, SAC 2015 - The 30th ACM/SIGAPP Symposium On Applied Computing, pp.947-953, Salamanca, Spain, April, 2015.
  • Conceição N Rocha, Paula Brito, Alípio Jorge, Classificação de Redes Sociais: Uma Abordagem baseada em Distribuições, SPE2015 - XXII Congresso da Sociedade Portuguesa de Estatística, Olhão, Portugal, 2015.
  • Marcos Domingues, Alípio Jorge, Carlos M Soares, Solange Rezende, Web Mining for the Integration of Data Mining with Business Intelligence in Web-Based Decision Support Systems, Integration of Data Mining in Business Intelligence Systems, IGI Global, 2015.
  • Cláudia Dias, Paula Brito, Conceição Nunes Rocha, Por Dentro das Notícias: Análise de Dados Textuais, SPE2015 - XXII Congresso da Sociedade Portuguesa de Estatística, Olhão, Portugal, October, 2015.
  • L Trigo, M Víta, R Sarmento and P Brazdil: Retrieval, Visualization and Validation of Affinities between Documents, in Proc. of KITA-2015. Will be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
  • M Valizadeh, P Brazdil, Exploring actor–object relationships for query-focused multi-document summarization, Soft Computing 19 (11), 3109-3121
  • M Valizadeh, P Brazdil, Density-based graph model summarization: Attaining better performance and efficiency, Intelligent Data Analysis 19 (3), 617-629
  • P Brazdil, L Trigo, J Cordeiro, R Sarmento, M Valizadeh, Affinity mining of documents sets via network analysis, keywords and summaries, Oslo Studies in Language 7 (1), https://www.journals.uio.no/index.php/osla/article/view/1456/1353
  • L Trigo, P Brazdil, Affinity Analysis between Researchers using Text Mining and Differential Analysis of Graphs, ECML/PKDD 2014 PhD session Proceedings, 169-176

2014

  • Ricardo Campos, Gaël Dias, Alípio Jorge, Célia Nunes, GTE-Rank: Searching for Implicit Temporal Query Results, CIKM 2014 - 23rd ACM International Conference on Information and Knowledge Management, pp.2081-2083, Shanghai, China, November, 2014.
  • Ricardo Campos, Gaël Dias, Alípio Jorge, Célia Nunes, GTE-Cluster: A Temporal Search Interface for Implicit Temporal Queries, ECIR 2014 - 36th European Conference on Information Retrieval, LNCS, vol.8416, pp.775-779, Amsterdam, The Netherlands, April, 2014.
  • João Vinagre, Alípio Jorge, João Gama, Evaluation of recommender systems in streaming environments, REDD 2014 - ACM RecSys Workshop on Recommender Systems Evaluation: Dimensions and Design (in conjunction with RecSys 2014), Foster City, CA, USA, October, 2014.
  • Conceição Rocha, Alípio M. Jorge, Márcia Oliveira, Paula Brito, João Gama, Carlos Pimenta, Text mining a Portuguese book on Freemasonry: Disclosing network communities' features, INFORUM 2014 - Simpósio de Informática, Porto, Portugal, September, 2014.
  • Alípio Jorge, José Paulo Leal, Sarabjot Anand, Hugo Dias, A study of machine learning methods for detecting user interest during web sessions, IDEAS 2014 - 18th International Database Engineering and Applications Symposium, pp.149-157, Porto, Portugal, July, 2014.
  • João Vinagre, Alípio Jorge, João Gama, Fast incremental matrix factorization for recommendation with positive-only feedback, UMAP 2014 - 22nd International Conference on User Modeling, Adaptation, and Personalization, LNCS, vol.8538, pp.459-470, Aalborg, Denmark, July, 2014.
  • Marcos Domingues, Alípio Jorge, Carlos Manuel Soares, Solange Rezende, Improving Multidimensional Recommender Systems Using Dimensions as Virtual Items, SEMISH 2014 - XLI Seminário Integrado de Software e Hardware (at XXXIV Congresso da Sociedade Brasileira de Computação), Brasília, Brazil, July, 2014.
  • Catarina Félix, Carlos Soares, Alípio Jorge, João Vinagre, Monitoring Recommender Systems: a Business Intelligence Approach, ICCSA 2014 - 14th International Conference on Computational Science and Its Applications, LNCS, vol.8584, pp.277-288, Guimarães, Portugal, June, 2014.
  • Marcos Domingues, Carlos Manuel Soares, Alípio Jorge, Solange Rezende, A data warehouse to support web site automation, Journal of the Brazilian Computer Society, vol.20, no.11, pp.1-16, April, 2014.
  • Ana Carneiro, Alípio Jorge, Pedro Brito, Marcos Domingues, Measuring the Effectiveness of an E-Commerce Site Through Web and Sales Activity, Modeling, Dynamics, Optimization and Bioeconomics I, Springer Proc. in Mathematics & Statistics, vol.73, pp.149-162, 2014.
  • M. Valizadeh, Improving the Performance of Text Summarization, PhD thesis, MAPi/FCUP, 2014.
  • Vitor Costa: Update Summarization, MSc Thesis, MADSAD, 2014.
  • Nuno Moniz and Luis Torgo (2014): Improvement of News Ranking through Importance Prediction, Proceedings of the KDD'2014 workshop NewsKDD - Data Science for News Publishing. DOI: 10.13140/2.1.4035.3282
  • Nuno Moniz, Luis Torgo and Fátima Rodrigues (2014): Resampling approaches to improve news importance prediction, in Advances in Intelligent Data Analysis XIII (IDA'2014), Blockeel et al. (eds.), pp. 215-226, LNCS vol. 8819, Springer
  • Ricardo Campos, Gaël Dias, Alípio M. Jorge, and Adam Jatowt. 2014. Survey of Temporal Information Retrieval and Related Applications. ACM Comput. Surv. 47, 2, Article 15 (August 2014), 41 pages. DOI=10.1145/2619088 http://doi.acm.org/10.1145/2619088.
  • M. Valizadeh, P. Brazdil, Unsupervised Method for Re-ranking the Initial Retrieval Results of IR Based on a Query-Sensitive Similarity Measure, to appear in Journal of Research and Practice in Information Technology, 2014.
  • M. Valizadeh, Pavel Brazdil, Exploring Actor-Object Relationships for Query-focused Multi-Document Summarization, Soft Computing, 2014. DOI 10.1007/s00500-014-1471-x SpringerLink

2013

  • Marcos Domingues, Carlos Soares, Alípio Jorge: Using statistics, visualization and data mining for monitoring the quality of meta-data in web portals. Inf. Syst. E-Business Management 11(4): 569-595 (2013).
  • Marcos Domingues, Alípio Jorge, Carlos Soares, Dimensions as Virtual Items: Improving the predictive ability of top-N recommender systems, Information Processing & Management, Volume 49, Issue 3, May 2013, Pages 698-720.
  • Marcos Domingues, Fabien Gouyon, Alípio Jorge, José Paulo Leal, João Vinagre, Luís Filipe Lemos, Mohamed Sordo, Combining usage and content in an online recommendation system for music in the Long Tail, International Journal of Multimedia Information Retrieval, vol.2, no.1, pp.3-13, March, 2013.
  • João Cordeiro, Gael Dias, and Pavel Brazdil: Rule Induction for Sentence Reduction, Proc. of EPIA 2013, Springer, 2013
  • M. Valizadeh, P. Brazdil, Density-Based Graph Model for Multi-Document Summarization, EPIA2013, Açores, Portugal, 2013.

2010-2012

  • Drury, B., Torgo, L. and Almeida, J.J. (2012): Classifying News Stories with a Constrained Learning Strategy to Estimate the Direction of a Market Index, International Journal of Computer Science & Applications, vol. 9 - 1, pp. 1-22. Technomathematics Research Foundation, ISSN 0972-9038.
  • Drury, B., Dias, G. and Torgo, L. (2011): Contextual Classification Strategy for Polarity Classification of Direct Quotations from Financial News, in International Conference On Recent Advances in Natural Language Processing (RANLP 2011). Hissar, Bulgaria, September 12-14.
  • Gintare Grigonyte, João Paulo Cordeiro, Gaël Dias, Rumen Moraliyski, Pavel Brazdil: Paraphrase Alignment for Synonym Evidence Discovery, COLING 2010, Proceedings of the 23rd International Conference on Computational Linguistics, pages 403-411 - August 2010 [DBLP].

2008-2009

  • João P. Cordeiro, Gaël Dias, Pavel Brazdil: Unsupervised induction of sentence compression rules, Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009).
  • Ana Catarina Miranda, Alipio M. Jorge, Item-Based and User-Based Incremental Collaborative Filtering for Web Recommendations, Progress in Artificial Intelligence, Proceedings of the 14th Portuguese Conference on Artificial Intelligence (EPIA 2009), Springer LNCS Volume 5816, pages 673-684 - October 2009.
  • Nuno Escudeiro, Alipio M. Jorge, Efficient Coverage of Case Space with Active Learning, Progress in Artificial Intelligence, Proceedings of the 14th Portuguese Conference on Artificial Intelligence (EPIA 2009), Springer LNCS Volume 5816, pages 411-422 - October 2009.
  • Ana Catarina Miranda, Alipio M. Jorge, Incremental collaborative filtering for binary ratings, Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 1, pages 389-392 - December 2008
  • Marcos Aurélio Domingues. An independent platform for the monitoring, analysis and adaptation of web sites. Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys 2008), pages 299-302 [ISI, DBLP].

Publications in previous years


PhD Theses Completed:

  • M Valizadeh, Improving the Performance of Text Summarization, FCUP, Univ. Porto, MAPi, 2014, superv. P.Brazdil.
  • Brett Drury, A Text Mining System for Evaluating the Stock Market's Response To News, Univ. Porto, 2013, superv. L.Torgo
  • Bruno Nogueira, Análise de agrupamentos ativo e semi-supervisionado, Universidade de São Paulo, Brazil, and Universidade do Porto, 2013, supervisors Solange Rezende (USP), Alípio Jorge.
  • Ricardo Campos, Disambiguating Implicit Temporal Queries for Temporal Information Retrieval Applications, Faculdade de Ciências da Universidade do Porto, sup. Gael Dias (U. Caen, Fr), Alípio Jorge, 2013.
  • Marcos Aurélio Domingues, "An independent platform for monitoring, analyzing and adapting web sites", Universidade do Porto, 2010, supervisors Alípio Jorge, Carlos Soares.
  • Nuno Escudeiro, “Semi-automatic classification: using active learning for efficient class coverage”, Universidade do Porto, 2012, supervisors Alípio Jorge, Rui Camacho.
  • Mário Amado Alves, “Adaptive Hypertext: The shattered document approach", Universidade do Porto, supervisors Alípio Jorge, José Paulo Leal, 2013.
  • António Jorge do Nascimento Morais, “A Multi-Agent approach for Web Adaptation”, Universidade do Porto, supervisors Alípio Jorge and Eugénio Oliveira (FEUP), 2013.
  • João Cordeiro: Rule induction for Sentence Reduction, Ph.D., UBI, 2011 (superv. G.Dias, P.Brazdil).
  • Marcos Aurélio Domingues, Exploiting Multidimensional Data for Web Site Automation, PhD in Computer Science, University of Porto. (Sup. Alípio Jorge and Carlos Soares).

PhD Theses in Progress:

  • Luis Trigo, Estudo de Comunidades de Investigadores com Recurso à Text Mining em Bases de Dados Bibliográficas, Ph.D. course Human Language Technologies, FLUP (superv. P.Brazdil)
  • João Vinagre,  (superv. A.Jorge)
  • Rui Sarmento, Scaling-up Retrieval, Visualization and Validation of Similarities among Document Sets, PRODEI, FEP (superv. P.Brazdil, J.Gama)
     

MSc Theses

Organized Events

Collaborations:

Software and Datasets: