
Distributed ILP for Data Mining


Members:

PhD Researchers: Rui Camacho
Collaborators: Nuno Fonseca (EMBL-EBI, Cambridge, UK), Luísa Pereira (IPATIMUP), Jorge Vieira (IBMC)
PhD students: Pedro Abreu, Carlos Adriano Gonçalves (U.Vigo)

Research Lines:

Parallelism to improve ILP systems:

We develop techniques to improve the performance of ILP systems. We use distributed and parallel execution of ILP systems to increase execution speed, improve the quality of the solutions found, and enable the processing of large amounts of data in complex domains. These improvements make ILP systems better suited to (Relational) Data Mining applications, where both the amount of data and the number of relations are very large. Under this line of research we are also integrating stochastic methods into the parallel execution of ILP systems to benefit from the advantages of both approaches.

General purpose Data Mining platform:

We have produced two implementations under this line of research.
The first is a distributed computing platform for Data Mining. The platform, implemented in Java, is independent of the data analysis tool (it can use an ILP system, a decision tree system, the R-project tool, etc.) and of the operating system (it runs on Linux and Windows). It runs on conventional desktop machines (PCs) and uses only idle computational resources. Its main purpose is to make the analysis of very large amounts of data affordable and feasible within any organisation. The platform also has a special feature for our ILP system (IndLog) that allows IndLog to generate, at run time, new tasks to be scheduled by the platform scheduler.
The second implementation (the APIS system) uses a technique whereby the hypothesis space is partitioned into a set of subspaces, each of which is searched independently.
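
The partitioning idea behind APIS can be illustrated with a minimal Python sketch. It is not the APIS implementation: the toy examples, the literal set, the bound on rule length, the coverage score (positives covered minus negatives covered) and the choice of splitting the space by the first literal of a rule body are all assumptions made for illustration. Because the subspaces are disjoint and together make up the whole bounded space, the best hypothesis overall is simply the best of the per-subspace winners.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy data (not from APIS): each example is the set of
# literals that hold for it.
POSITIVES = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
NEGATIVES = [{"b", "c"}, {"c"}, {"a", "d"}]
LITERALS = ["a", "b", "c", "d"]
MAX_LEN = 2  # assumed bound on the number of literals in a rule body


def covers(body, example):
    """A rule body (a conjunction of literals) covers an example
    when every literal in the body holds for that example."""
    return set(body) <= example


def score(body):
    """Assumed evaluation: positives covered minus negatives covered."""
    pos = sum(covers(body, e) for e in POSITIVES)
    neg = sum(covers(body, e) for e in NEGATIVES)
    return pos - neg


def rank(body):
    """Higher score first; on ties, prefer shorter bodies."""
    return (score(body), -len(body))


def subspace(i):
    """Bodies whose first literal (in LITERALS order) is LITERALS[i].
    The subspaces are disjoint and their union is the whole bounded space."""
    for k in range(MAX_LEN):
        for tail in combinations(LITERALS[i + 1:], k):
            yield (LITERALS[i],) + tail


def search_subspace(i):
    """Exhaustively search one subspace; return its best (score, body)."""
    best = max(subspace(i), key=rank)
    return score(best), best


def parallel_search(workers=4):
    """Search every subspace independently, then keep the overall winner.
    Threads keep the sketch portable; CPU-bound searches would use
    separate processes or machines to obtain a real speedup."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_bests = list(pool.map(search_subspace, range(len(LITERALS))))
    return max(local_bests, key=lambda sb: (sb[0], -len(sb[1])))


if __name__ == "__main__":
    print(parallel_search())  # prints (2, ('a',))
```

Since each subspace is searched to completion here, combining the local winners gives the same answer as a sequential search of the whole bounded space; a real ILP system would instead explore each subspace with its own search strategy and pruning.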

Applications for parallel execution of ILP systems:

We are applying the distributed version of ILP to the following complex problems: the protein folding problem; analysis of DNA sequences; a Structure-Activity Relationship problem involving drugs to control hypertension; protein unfolding simulation data; and intrusion detection.

Publications:

Book chapters:

  • "Induction as a search procedure", Stasinos Konstantopoulos, Rui Camacho, Vítor Costa, Nuno Fonseca, in the book "Artificial Intelligence for Advanced Problem Solving Techniques", edited by Dimitris Vrakas and Ioannis Vlahavas, Aristotle University of Thessaloniki, Greece, Chapter 7, pp. 166-216, 2008

Journals and International Conferences:

  • Célia Talma Gonçalves, Rui Camacho, Eugénio Oliveira, "Ranking MEDLINE Documents", Journal of the Brazilian Computer Society, volume 20, number 13, 2014
  • Carmona, S, Loureiro, MJ, Santos, J, Oliveira, A, Camacho, R, Santos, AI, "Lung ventilation/perfusion scintigraphy in pulmonary capillary hemangiomatosis: A pattern to consider", Revista Española de Medicina Nuclear e Imagen Molecular, Vol. 32 (2), pp 4, 2013
  • Loureiro, T, Camacho, R, Vieira, J, Fonseca, NA, "Boosting the Detection of Transposable Elements Using Machine Learning", 7th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2013), 22-24 May 2013, Salamanca, Spain, Vol. 222, pp 7, 2013
  • Angelopoulos, N, Santos Costa, V, Azevedo, J, Wielemaker, J, Camacho, R, Wessels, L, "Integrative functional statistics in logic programming", Practical Aspects of Declarative Languages - 15th International Symposium, PADL 2013, Rome, Italy, January 21-22, 2013, LNCS Vol. 7752, pp 16, 2013
  • Nabuco, M, Paiva, ACR, Camacho, R, Faria, JP, "Inferring UI patterns with Inductive Logic Programming", 8th Iberian Conference on Information Systems and Technologies (CISTI), pp 4, 2013
  • Tiago Loureiro, Rui Camacho, Jorge Vieira and Nuno A. Fonseca, "Improving the performance of Transposable Elements detection tools", Journal of Integrative Bioinformatics, 10(3):231-242, 2013
  • Rui Camacho, Rita Ferreira, Natacha Rosa, Vânia Guimarães, Nuno A. Fonseca, Vítor Santos Costa, Miguel de Sousa, Alexandre Magalhães, "Predicting the secondary structure of proteins using Machine Learning algorithms", International Journal of Data Mining and BioInformatics, Vol. 6, N. 6, pp 571-584, 2012
  • Célia Talma Gonçalves, Rui Camacho, Eugénio Oliveira, "BioTextRetriever: a tool to retrieve relevant papers", International Journal of Knowledge Discovery in Bioinformatics (IJKDB), Editor: Jason T. L. Wang, Vol. 2, N. 3, pp 21-36, July-September 2011, IGI Publishing
  • Rui Camacho, Max Pereira, Vítor Santos Costa, Nuno A. Fonseca, Carlos Adriano, Carlos J. V. Simões, Rui M. M. Brito, "A Relational Learning approach to Structure-Activity Relationships in Drug Design Toxicity studies", Journal of Integrative Bioinformatics, 8(3), pp 182-201, September 2011
  • Miguel M de Sousa, Cristian R Munteanu, Alejandro Pazos, Nuno A Fonseca, Rui Camacho and Alexandre Lopes Magalhães, "Amino Acid Pair- and Triplet-wise Groupings in the Interior of Alpha-Helical Segments in Proteins", Journal of Theoretical Biology, 271(1):136-144, February 2011
  • Luísa Pereira, Rui Camacho, Nuno A. Fonseca, et al., "PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal STR genotype profile", International Journal of Legal Medicine, 125:629-636, 2011
  • Nuno A. Fonseca, Ashwin Srinivasan, Fernando Silva, and Rui Camacho, "Parallel ILP for Distributed-Memory Architectures", Machine Learning journal, Vol. 74, Number 3, pp. 257-279, March 2009
  • N. Fonseca, V. S. Costa, R. Rocha, R. Camacho, F. Silva, "Improving the Efficiency of ILP Systems", Software: Practice and Experience, Vol. 39, Issue 2, pp. 189-219, Feb. 2009
  • N. Fonseca, Rui Camacho, R. Rocha, V. S. Costa, "Compile the hypothesis space: do it once, use it often", Fundamenta Informaticae, Special Issue on Multi-Relational Data Mining, 89(1):45-67, 2008
  • Nuno Fonseca, Rui Camacho and Alexandre Magalhães, "A study on amino acid pairing at the N- and C-termini of helical segments in proteins", PROTEINS: Structure, Function, and Bioinformatics, Volume 70, Issue 1, January 2008, pp 188-196
  • Ashwin Srinivasan, David Page, Rui Camacho and Ross King, "Quantitative Pharmacophore Models with Inductive Logic Programming", Machine Learning journal, Vol. 64, N. 1/2/3, pp 65-90, 2006
  • Ruy Ramos, Rui Camacho and Pedro Souto, "A commodity platform for Distributed Data Mining - the HARVARD System", 6th Industrial Conference on Data Mining (ICDM 2006), Germany, July 14-15, 2006
  • Nuno Fonseca, Fernando Silva, Rui Camacho, "April - An Inductive Logic Programming System", 10th European Conference on Logics in Artificial Intelligence (JELIA'06), Springer-Verlag, LNCS 4160, pp 481-484, 2006
  • Fonseca, Nuno, Fernando Silva, Vitor Costa, Rui Camacho, "Strategies to parallelise ILP systems", 15th International Conference on Inductive Logic Programming (ILP 2005), Bonn, Germany, August 2005, Springer-Verlag, LNAI 3625, pp 136-153 (best paper ILP 2005)
  • Camacho, Rui, Nuno A. Fonseca, Alexandre Magalhães, "Applying Inductive Logic Programming in a Study of Protein alpha-helices Structures", BKDB2005 - Bioinformatics: Knowledge Discovery in Biology, Francisco M. Couto, Mario J. Silva and Pedro Fernandes (eds.), 17 June 2005, Lisboa, Portugal, pp 63-67
  • Fonseca, Nuno, Fernando Silva, Vitor Costa, Rui Camacho, "A pipelined data-parallel algorithm for ILP", Cluster 2005, Boston, USA, September 2005
  • Rui Camacho, "IndLog - Induction in Logic", JELIA 2004 - 9th European Conference on Logics in Artificial Intelligence, editors Jose Alferes and Joao Leite, Springer-Verlag, LNAI 3229, pp 718-721, 27-30 September 2004, Lisboa, Portugal
  • Rui Camacho, "From sequential to Parallel Inductive Logic Programming", 6th International Meeting on High Performance Computing for Computational Science (VECPAR 2004), vol. 3, pp 973-978, Valencia, Spain, June 2004

Completed doctoral theses:

  • Célia Talma Gonçalves, A Tool for Text Mining in Molecular Biology Domains, 2013, Supervisors: Eugénio Oliveira and Rui Camacho
  • Ruy Cesar Ramos, Extracção Automática de Conhecimento Utilizando Computação Distribuída em Sistemas de Indução de Programas em Lógica (Automatic Knowledge Extraction Using Distributed Computing in Inductive Logic Programming Systems), 2012, Supervisor: Rui Camacho
  • Nuno Fonseca, Parallelism in Inductive Logic Programming systems, 2006, Supervisors: Fernando Silva and Rui Camacho

Post-doctoral position:

  • Applying ILP to gene sequence analysis
    Researcher: Nuno Fonseca, Supervisors: Jorge Vieira, Rui Camacho and Fernando Silva

Projects:

  • SIBILA (Towards Smart Interacting Blocks that Improve Learned Advice, NORTE-07-0124-FEDER-000059), started in 2013.
  • Grid Computing project CYTED GRID (involving 15 research groups from 12 countries), 2006-2009
  • ILP-Web-Service: An Inductive Logic Programming based Web service, (FCT Project PTDC/EIA/70841/2006), 2008-2010

Editorial activity:

  • Rui Camacho: Guest editor (with Ashwin Srinivasan and Ross King) of the Machine Learning journal, Vol. 64, N. 1/2/3, 2006 (special issue on ILP).