
Site-O-Matic: Web Site Automation
Financially supported by FCT (Fundação para a Ciência e Tecnologia) through grant POSC/EIA/58367/2004, co-financed by FEDER through Programa Operacional Sociedade do Conhecimento (POS_Conhecimento).
The Web currently poses a number of interesting research problems. From the User’s point of view (see Roles below), the Web is becoming too large, too dynamic and increasingly unknown. From the Editor’s point of view, the Web is a constant demand for new information and timely updates. Moreover, the Editor must not only maintain the contents of the site, but also continually choose the navigational structure that best helps achieve the aims of the site’s Owner, User, or both. From the Owner’s point of view, the need for such a constant, labour-intensive effort implies very high financial or personal costs.
In this project we aim to develop a platform and a methodology for automating several of the management activities of a Web site, such as the retrieval of new, relevant content, the monitoring and management of existing content and its structure, and the provision of recommendations and personalization of that structure. This is done taking into account the behaviour of the Users and the aims of the Owner. One effect of automation is a reduction in the Editor’s effort, and consequently in the costs for the Owner. The other effect is that the site can adapt more promptly to the behaviour of the User, improving the browsing experience and helping users achieve their own goals when these are in accordance with the goals of the Owner of the site.
Our aims will be pursued by: 1) defining a flexible web site platform that allows the acquisition of quality web data, as well as online automatic transformation and customization of the site’s structure and interface (view); 2) developing techniques for web adaptation using data mining (association rules, collaborative filtering, Bayesian approaches, Markov models, ranking methods); and 3) allowing the specification of topic-focused content retrieval. These efforts will be guided by well-defined objectives, automatically monitored by performance measures defined on usage data. Automated actions must also remain under the control of the Editor.
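To make the Markov-model item above more concrete, the following is a minimal sketch of a first-order Markov model that suggests likely next pages from navigation sessions. It is not the project's implementation: the function names, the session format (a list of page identifiers per visit) and the example data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_first_order_markov(sessions):
    """Count page-to-page transitions observed in navigation sessions.

    `sessions` is assumed to be a list of visits, each visit being an
    ordered list of page identifiers (a hypothetical format).
    """
    transitions = defaultdict(Counter)
    for session in sessions:
        # Pair every page with the page visited immediately after it.
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def recommend_next(transitions, current_page, k=2):
    """Return up to k likely next pages with estimated probabilities."""
    counts = transitions.get(current_page)
    if not counts:
        return []  # page never seen as a transition source
    total = sum(counts.values())
    return [(page, n / total) for page, n in counts.most_common(k)]


# Hypothetical usage data, for illustration only.
sessions = [
    ["home", "news", "article1"],
    ["home", "news", "article2"],
    ["home", "members"],
    ["news", "article1"],
]
model = fit_first_order_markov(sessions)
print(recommend_next(model, "home"))
```

In this sketch the recommendation depends only on the current page. The variable-length and higher-order models listed in the publications below condition on longer navigation histories.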
As a test bed for our approach, we will integrate some of these ideas into existing web sites, both commercial (www.PortalExecutivo.com, a paid subscription portal with information digests and other services for executives) and non-commercial (www.appia.pt, the site of a scientific society), and will set up an entirely new web site, intended for a broad audience, for the widespread dissemination of scientific knowledge.
Partners
Former partners
- PortalExecutivo.com (PE has since closed)
Publications
Theses
Nuno Escudeiro, Automatic Web resource compilation using data mining, MSc dissertation in Data Analysis and Decision Support Systems, Faculdade de Economia, Universidade do Porto, 2005
2006
2007
MSc Thesis Projects
2007
Papers in Journals or with ISI reference
2005
Carlos Soares, Alípio Jorge, and Marcos Aurélio Domingues, Monitoring the Quality of Meta-Data in Web Portals Using Statistics, Visualization and Data Mining, Proceedings of EPIA 2005, Lecture Notes in Computer Science, Vol. 3808, pp 371 - 382 Dec. 2005
José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining. in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 05), LNAI 3721, pp 34-45. Porto, Portugal, O
2006
Nuno Escudeiro, Alípio Jorge, "Semi-automatic creation and maintenance of Web Resources with webTopic", in "Semantics, Web and Mining", Edited by Maarten van Someren and Giovanni Semeraro, Lecture Notes in Computer Science, Vol. 4289, 82-102, Springer, 20
2007
J. Borges, M. Levene, "Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions", accepted for publication in IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 4, April 2007
Paulo J. Azevedo, Alípio Jorge: Comparing Rule Measures for Predictive Association Rules, Machine Learning: ECML 2007, 18th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 4701, pp. 510-517, Warsaw, Poland, September 2007.
Paulo J. Azevedo, Alípio Jorge: Iterative Reordering of Rules for Building Ensembles Without Relearning, Discovery Science, 10th International Conference, DS 2007, Lecture Notes in Computer Science, Vol. 4755, pp. 56-67, Sendai, Japan, October 2007
Carmen Rebelo, Pedro Quelhas Brito, Carlos Soares, Alípio Jorge, Quantitative Evaluation of Clusterings for Marketing Applications: a Web Portal Case Study, in EPIA 2007, Encontro Português de Inteligência Artificial, Lecture Notes in Computer Science, December 2007.
J. Borges and M. Levene. "Testing the predictive power of variable history web usage." Journal of Soft Computing - A Fusion of Foundations, Methodologies and Applications, special issue on Web intelligence and change discovery, Volume 11, Number 8, June 2007, pages 717-727
2008
J. Borges and M. Levene. "Variable Length Markov Chains for Web Usage Mining", to appear in Encyclopedia of Data Warehousing and Mining - 2nd Edition
J. Borges and M. Levene. "Mining Users' Web Navigation Patterns and Predicting Their Next Step", to appear in Security Informatics and Terrorism: Patrolling the Web, NATO Science for Peace and Security Series: Information and Communication Security, Volume 15, Edited by: C.S. Gal, P.B. Kantor and B. Shapira, May 2008
Articles in Conferences
Nuno Escudeiro, Alípio Jorge, A methodology for satisfying persistent information needs on the Web using data mining, 6ª Conferência da Associação Portuguesa de Sistemas de Informação, CAPSI 05, Bragança, Portugal, October 2005
2006
Rebelo, Carmen, Pedro Quelhas Brito, Carlos Soares, and Alípio Jorge, Factor Analysis to Support the Visualization and Interpretation of Clusters of Portal Users, in the 2006 IEEE / WIC / ACM International Conference on Web Intelligence, 18-22 December 2
Carla Carvalho, Alipio M. Jorge, Carlos Soares, Personalization of e-newsletters based on web log analysis and clustering, ACM/IEEE Web Intelligence Conference 2006
2007
Silva, P. and Leal, J. P. (2007). Adaptabilidade Web no Som. In Ramalho, J. C., Correia Lopes, J., and Carriço, L., editors, XATA 2007 - XML: Aplicações e Tecnologias Associadas, pages 211–222, Lisboa. Faculdade de Ciências da Universidade de Lisboa.
Nuno Escudeiro, Alípio Jorge. Focused Semi-Automatic Content Retrieval for Persistent Information Needs, SDIA 07, EPIA 07 Simpósio Doutoral em Inteligência Artificial. Guimarães, Portugal. December 2007
Papers in Workshops
Mário Amado Alves, Alípio Jorge, Minibrain: a generic model of spreading activation in computers, and example specialisations, in ECML/PKDD 2005 workshop "Subsymbolic paradigms for learning in structured domains", Porto, 2005.
José Borges and Mark Levene. A Clustering-Based Approach for Modelling User Navigation With Increased Accuracy, in Proceedings of the Second International Workshop on Knowledge Discovery from Data Streams (IWKDDS), 77-86. Porto, Portugal, October 2005.
Campos, R. & Dias, G. (2005). Automatic Hierarchical Clustering of Web Pages. In Proceedings of the ELECTRA Workshop associated to 28th Annual International ACM SIGIR Conference, Salvador, Brazil, August 19. pp. 83-85. In association with ACM editions.
C. Carvalho, A. Jorge and C. Soares (2005). Personalização de e-Newsletters com Base nos Acessos dos Utilizadores. In Proceedings of the Data Mining for Business Workshop. Porto, Portugal
Nuno Escudeiro, Alípio Jorge, webTOPIC: Automatic Web Resource Compilation for Persistent Information Needs, ECML/PKDD 2005 workshop European Web Mining Forum
C. Rebelo, P.Q. Brito (2005). Quantidade de Informação por Detrás de um Clique. In Proceedings of the Data Mining for Business Workshop. Porto, Portugal
Domingues, M. A.; Jorge, A. M.; Soares, C. Using Association Rules for Monitoring Meta-Data Quality in Web Portals. In: SBBD/SBES'06 Segundo Workshop em Algoritmos e Aplicações de Mineração de Dados (WAAMD-2006), p. 105-108, Florianópolis, Santa Catarina, 2007
Marcos Domingues, Alípio Jorge, Carlos Soares, José Paulo Leal, Pedro Machado, A Data Warehouse for Web Intelligence, BI 07, EPIA 07 Workshop on Business Intelligence, IEEE, 2007
2008
Marcos Domingues, Jose Paulo Leal, Alipio Jorge, Carlos Soares, Pedro Machado, A Platform to Support Web Site Adaptation & Monitoring of its Effects: A Case Study, The 6th Workshop on Intelligent Techniques for Web Personalization & Recommender Systems, held in conjunction with the 23rd National Conference on Artificial Intelligence (AAAI 2008), July 13-17, 2008, Chicago, Illinois, USA
Reports
Domingues, M. A.; Jorge, A. M.; Soares, C. Dicionário de Dados do Data Warehouse. 8 pages. Porto, Portugal, 2007.
Definition of roles
- User: the person who browses the web or enters a web site.
- Editor: the person in charge of creating, updating and deleting contents on a specific web site. The Editor may or may not be the content producer (author).
- Owner: the person or organization that owns the site and manages the Editors’ activity. The site exists to achieve some objectives of the Owner.
Moodle workspace (restricted)