January 26, 15h, Allan Tucker
In particular I will discuss the collection of longitudinal data and how this creates challenges for diagnosis and the modelling of disease progression. I will then discuss how cross-sectional studies offer additional useful information that can be used to model disease diversity within a population but lack valuable temporal information. Finally, I will discuss the importance of inferring models that generalise well to new independent data and how this can sometimes lead to new challenges, where the same variables can represent subtly different phenomena. Some examples in ecology and genomics will be described.
***************************************
18:00 Hours, INESC Auditorium A
****************************************
Presenter: Paula Branco
Title: Utility-based Predictive Analytics with the UBL Package
Abstract:
Many real-world applications encompass domain-specific information which, if disregarded, may strongly penalize the performance of predictive models. In contexts such as finance, medicine and ecology, among many others, specific domain information concerning the preference bias of the users must be taken into account to enhance the models' predictive performance. In this seminar we will address the problem of utility-based learning. We will show the main challenges of this type of problem and a broad taxonomy of the existing solutions. We will introduce the R package UBL, which implements several approaches for tackling utility-based problems in both classification and regression tasks. Finally, we will provide some simple examples of solutions implemented in the UBL package, showing how they can be used.
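The preference bias described above can be made concrete with a small sketch: instead of maximising accuracy, pick the decision threshold that maximises average utility under a user-supplied payoff matrix. The matrix, scores and labels below are illustrative toys of ours, not the UBL package's API (UBL itself is an R package).

```python
# Sketch of utility-based thresholding for a binary classifier.
# The payoff matrix maps (true label, predicted label) -> utility.

def expected_utility(y_true, y_pred, utility):
    """Average payoff of the predictions under the utility matrix."""
    return sum(utility[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)

def best_threshold(scores, y_true, utility):
    """Scan a coarse grid of thresholds and keep the most useful one."""
    grid = [i / 20 for i in range(1, 20)]
    def preds(th):
        return [1 if s >= th else 0 for s in scores]
    return max(grid, key=lambda th: expected_utility(y_true, preds(th), utility))

# A fraud-like preference bias: missing a positive costs far more
# than raising a false alarm.
utility = {(1, 1): 10.0, (1, 0): -50.0, (0, 1): -1.0, (0, 0): 0.0}
scores = [0.9, 0.2, 0.4, 0.05, 0.35, 0.8]   # classifier scores for "positive"
y_true = [1, 0, 1, 0, 0, 1]
th = best_threshold(scores, y_true, utility)
```

With this cost structure the chosen threshold drops well below 0.5, trading extra false alarms for fewer missed positives, which is exactly the kind of behavior a plain accuracy-maximising model would not exhibit.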
Short Bio:
Paula Branco has a degree in Mathematics and a Master's degree in Computer Science, both from the University of Porto. Currently she is a PhD student in the MAP-i Doctoral Programme, working on "Utility-based Predictive Analytics" under the supervision of Prof. Luís Torgo and Prof. Rita Ribeiro.
Her main research interests are in Machine Learning and Data Mining and, in particular, in imbalanced distributions, outlier detection, forecasting of rare extreme values, and performance assessment.
Presenter: Arkady Zaslavsky
Title: Big Data meets Internet of Things
Abstract:
The Internet of Things (IoT) is one of the major disruptive technologies and sits at the top of Gartner's hype curve for 2014/2015. IoT will connect billions of "things", where things include computers, smartphones, sensors and objects from everyday life. According to the predictions of many experts, IoT will be the main source of big data. This talk focuses on the challenges of the IoT and the disruptively big data it generates. The talk will also showcase a CSIRO IoT technology which brings together sensing and cloud computing and is an efficient open platform for handling IoT data streams of high volume, velocity, value and variety. A case study built on the basis of the OpenIoT platform will also be presented.
Bio:
Dr Arkady Zaslavsky is a Senior Principal Research Scientist at Data61 @ CSIRO. He is leading the scientific area of IoT at Data61 and leads a number of projects and initiatives. Before coming to CSIRO in July 2011, he held the position of Chaired Professor in Pervasive and Mobile Computing at Luleå University of Technology, Sweden, where he was involved in a number of European research projects, collaborative projects with Ericsson Research, PhD supervision and postgraduate education. He currently holds the titles of Research Professor at LTU (Sweden), Adjunct Professor at UNSW (Sydney), Adjunct Professor at La Trobe University (Melbourne), and Visiting Professor at ITMO University, St. Petersburg. Between 1992 and 2008 Arkady was a full-time academic staff member at Monash University, Australia, where he held various academic and administrative positions, including Director of the Monash Research Centre for Distributed Systems and Software Engineering and Director of the Monash CoolCampus initiative that brought together pervasive computing researchers with university users. He was a Principal Investigator at the CRC DSTC Distributed Systems Technology Centre, leading and contributing to the DSTC project “M3: Enterprise Architecture for Mobile computations”. He led and was involved in a number of research projects funded by the ARC (ARC Discovery and Linkage) and industry, at both national and international level, totalling more than AU$12,000,000 in support. He chaired and organised many international workshops and conferences, including Mobile Data Management, Pervasive Services, Mobile and Ubiquitous Multimedia, and others. Arkady has made internationally recognised contributions in the areas of disconnected transaction management and replication in mobile computing environments, context-awareness, and mobile agents.
He has made significant internationally recognised contributions in the areas of data stream mining on mobile devices, adaptive mobile computing systems, ad-hoc mobile networks, efficiency and reliability of mobile computing systems, mobile agents and mobile file systems. Arkady received an MSc in Applied Mathematics, majoring in Computer Science, from Tbilisi State University (Georgia, USSR) in 1976 and a PhD in Computer Science from the Moscow Institute for Control Sciences (IPU-IAT), USSR Academy of Sciences, in 1987. Before coming to Australia in 1991, Arkady worked in various research positions at industrial R&D labs as well as at the Institute for Computational Mathematics of the Georgian Academy of Sciences, where he led a systems software research laboratory. Arkady Zaslavsky has published more than 400 research publications throughout his professional career and supervised to completion more than 35 PhD students. Dr Zaslavsky is a Senior Member of the ACM and a Senior Member of the IEEE Computer and Communications Societies.
To overcome these problems, we will present a new model named the Compact Prediction Tree (CPT+). CPT+ is built by losslessly compressing the training sequences, ensuring that all relevant information is available for each prediction. Furthermore, CPT+ relies on an indexing mechanism to allow fast sequence searching and matching, and is a more complex prediction algorithm that integrates several optimizations.
Experimental results on seven real-life datasets from various domains show that CPT+ has the best overall accuracy when compared to six state-of-the-art sequence prediction models from the literature: All-K-order Markov, CPT, DG, Lz78, PPM and TDAG.
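The basic task CPT+ addresses, predicting the next symbol of a sequence from a set of training sequences, can be sketched with a far simpler suffix-voting baseline. The class below is our own illustration of the task setting; it implements none of CPT+'s compression or indexing.

```python
# Toy next-symbol predictor: vote on the continuation of the last
# `window` items of the prefix, over all training sequences.
from collections import Counter

class SuffixPredictor:
    def __init__(self, train_sequences):
        self.train = train_sequences

    def predict(self, prefix, window=2):
        """Return the most frequent symbol following the suffix, or None."""
        key = tuple(prefix[-window:])
        votes = Counter()
        for seq in self.train:
            for i in range(len(seq) - window):
                if tuple(seq[i:i + window]) == key:
                    votes[seq[i + window]] += 1
        return votes.most_common(1)[0][0] if votes else None

model = SuffixPredictor([list("abcabd"), list("abcabc"), list("zabca")])
next_symbol = model.predict(list("ab"))   # "ab" is most often followed by "c"
```

Models such as All-K-order Markov and CPT+ refine exactly this matching-and-voting step, with very different trade-offs in memory and lookup time.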
Bio: Philippe Fournier-Viger (Ph.D.) is an assistant professor at the University of Moncton, Canada. He received a Ph.D. in Cognitive Computer Science from the University of Quebec in Montreal (2010). He has published more than 75 research papers in refereed international conferences, books and journals. His research interests are data mining, algorithm design, pattern mining, sequence mining, sequence prediction, text mining and e-learning. He is the founder of the popular SPMF open-source data mining library, specialized in pattern mining, which has been cited in more than 170 papers since 2010.
****************************************
11th Seminar, 31st March, Tuesday,
11:00 Hours, INESC Auditorium A
*****************************************
Presenter: Ana Costa e Silva
Title:
Measure, Model, Deploy: analytics maturity at the hand of the end-user with TIBCO's suite
Abstract:
Lots of value can be created when an organisation has streamlined access to its data, allowing it to identify issues and forces early on. The next step of analytics maturity consists of understanding the past via predictive modelling and operational optimisation. But the full value of analytics is only set free when organisations deploy those models in real time and start managing the present as it happens. Real-time long-distance monitoring of equipment, of whole factories, or of health readings from hospital patients then becomes possible. Or continuous transaction monitoring for fraud detection. Or tracking of customers' activities on our website for maths-supported recommendations that are helpful for once.
In this talk we will show how TIBCO's products can be woven to give the business end-user an easy-to-use interface to accomplish all those goals, on small or big data.
Bio:
For the last 15 years, Ana has been passionate about searching for and finding gems in data; Maths and Stats have always been the connecting thread throughout her career. After initial studies in management and a Master's degree in data analysis (MADSAD) at FEP-Porto, plus 7 years working in the Statistics department of the Portuguese Central Bank, Ana completed a PhD in computer science at Edinburgh University and then spent a further 4 years researching the inner workings of the global stock market for Edinburgh Partners. Ana now works within TIBCO Spotfire's Industry Analytics Group. The group diligently surfaces sector-specific use cases that can deliver to TIBCO's clients the full value of their data, via analytics and visualisation. Representing the Group in EMEA and Latin America, Ana has helped a number of organisations in their baby or sage steps into the world of Analytics and Big Data.
*****************************************
Presenter: Inês Dutra (CRACS)
Title: Effective Classification of non-definitive ARS biopsies using First-Order Rules
Abstract:
Expert knowledge expressed in the form of first-order rules is used in order to improve the performance of machine-learned Naïve Bayes models on a subgroup of non-definitive biopsies. Our results show that well-tailored rules specific to a subgroup of non-definitive biopsies combined with Naïve Bayes models can improve routine practice by saving women from going to excision surgery while keeping 100% sensitivity.
****************************************
Presenter: Theofrastos Mantadelis
Title: MetaProbLog for Probabilistic Logic Programming and Learning
Abstract:
MetaProbLog is a framework for the ProbLog probabilistic logic programming language. ProbLog extends Prolog programs by annotating facts with probabilities; in that way it defines a probability distribution over Prolog programs. ProbLog follows the distribution semantics presented by Sato. MetaProbLog extends the semantics of ProbLog by defining a "ProbLog engine" which permits the definition of probabilistic meta-calls.
MetaProbLog uses state-of-the-art knowledge compilation methods, tabling, and several optimizations in order to provide efficient probabilistic inference. Beyond supporting the semantics and features of ProbLog, MetaProbLog introduces semantics for probabilistic meta-calls and has several unique features, such as datasets and memory management.
In this talk we will present some key differences among the three existing ProbLog systems and present several motivating applications, such as probabilistic graph mining, parameter learning, and probabilistic structure learning.
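The distribution semantics mentioned above can be illustrated by brute-force enumeration: each probabilistic fact is independently true or false, which defines a distribution over "possible worlds", and a query's probability is the total mass of the worlds in which it holds. The two-edge path example below is our own toy, written in Python for self-containment rather than in ProbLog syntax.

```python
# Enumerate all truth assignments to the probabilistic facts and sum
# the probability of those worlds in which the query holds.
from itertools import product

facts = {"edge(a,b)": 0.8, "edge(b,c)": 0.6}

def query_prob(holds):
    """Probability that `holds(world)` is true under the fact distribution."""
    names = list(facts)
    total = 0.0
    for bits in product([True, False], repeat=len(names)):
        world = dict(zip(names, bits))
        p = 1.0
        for name, bit in world.items():
            p *= facts[name] if bit else 1 - facts[name]
        if holds(world):
            total += p
    return total

# path(a,c) holds only in worlds where both edges are present.
p = query_prob(lambda w: w["edge(a,b)"] and w["edge(b,c)"])
```

Real ProbLog systems avoid this exponential enumeration by compiling queries into compact representations (the knowledge compilation methods mentioned in the abstract), but the semantics computed is the same.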
Bio:
Theofrastos Mantadelis is a postdoctoral researcher at the Computer Science department of the University of Porto. He received his PhD from KU Leuven, focusing on the efficiency of ProbLog, where he was the key implementer of two of the ProbLog systems. His research interests lie mostly in logic programming and probabilistic logic programming.
Presenters: Luis Trigo, Rui Sarmento and Pavel Brazdil
Title: Affinity Miner applied to Researchers' Publications via Network Analysis and Keywords
Abstract:
A case study and demo oriented towards five INESC TEC centers (LIAAD, CRACS, CESE, CTM, CEGI), accompanied by a brief description of the methods implemented.
Finding people with similar skills within a domain may provide important support for managing research centers. Academic production is easily accessible in academic and bibliographic databases, and it can be used to uncover affinities among researchers that are not yet evidenced by co-authorship. This is achieved with the help of text mining techniques, on the basis of the terms used in the respective documents. The affinities can be represented in the form of a network where nodes represent researchers' articles and links represent similarity. Each node can be characterized by various centrality measures. A community detection algorithm permits the identification of groups with similar articles. Each node is further characterized by a set of automatically discovered keywords.
This presentation provides more details about the methods adopted and/or developed, some of which were implemented in our prototype. The methods presented are general and applicable to many diverse domains; these can include documents describing R&D projects, legal documents, court cases or medical procedures. We believe this work could thus be useful to a relatively wide audience. We acknowledge the help of F. Silva and collaborators, who maintain the Authenticus bibliographic database.
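The affinity idea above reduces to a simple computation: represent each researcher by a bag of terms and link the pairs whose similarity exceeds a threshold. The term lists and the 0.5 threshold below are illustrative stand-ins, not the prototype's actual data or parameters.

```python
# Build an affinity network by linking researchers whose term profiles
# have cosine similarity above a threshold.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of terms."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

docs = {
    "r1": "data stream mining drift detection".split(),
    "r2": "stream mining concept drift".split(),
    "r3": "logic programming semantics".split(),
}
# Edges of the affinity network: similar pairs, each pair listed once.
edges = [(u, v) for u in docs for v in docs
         if u < v and cosine(docs[u], docs[v]) > 0.5]
```

Centrality measures and community detection would then be computed on this edge list, as described in the abstract.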
**********************************************
3rd Seminar, 28th October, 2014.
********************************************
1st Seminar, 24th September, 2014.
******************************************
Presenter: Alexandre Carvalho (PhD Researcher at the Laboratory for Artificial Intelligence and Decision Support - INESC Technology and Science (LIAAD - INESC TEC))
Title: First-Principle Simulation for Data-Driven Prediction
Abstract: Data simulation with benchmark problems is an important task for machine learning and data mining purposes. Generating a realistic controlled scenario allows one to design and test specific tasks, such as drift detection or fault detection. Using common benchmark datasets enables simple comparisons among algorithms and their performance. With first-principle simulated data, the behavior of real scenarios is approximated to the best of our scientific models. In this work we adapt and update a well-known benchmark process control problem, the Tennessee Eastman plant-wide industrial problem. The Tennessee Eastman (TE) plant-wide industrial process control problem was proposed as a test of alternative control and optimization strategies for continuous chemical processes. With a slow drift and several step process disturbances combined with random variation, the TE problem is suitable for a wide range of controlled scenario tests. We present the results of the multi-target prediction problem and compare the results of PLS, M5 and MTSMOTI.
********************************************************************************
Presenter: Fábio Pinto (a PhD student working with Carlos Soares (CESE - INESC TEC) and João Mendes-Moreira (LIAAD - INESC TEC))
Title: Pruning Bagging Ensembles with Metalearning
Abstract: Ensemble learning algorithms often benefit from pruning strategies that reduce the number of individual models and improve performance. In this work, we propose a metalearning method for pruning bagging ensembles. Our proposal differs from other pruning strategies in that it allows pruning the ensemble before actually generating the individual models. The method consists in generating a set of characteristics from the bootstrap samples and relating them to the impact of the predictive models in multiple tested combinations. We executed experiments with bagged ensembles of 20 and 100 decision trees on 53 UCI classification datasets. Results show that our method is competitive with a state-of-the-art pruning technique and with bagging, while using only 25% of the models.
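The key point of the method, characterising bootstrap samples before any model is trained, can be sketched in a few lines. The meta-features (uniqueness ratio, label entropy) and the rank-and-keep-25% rule below are our own illustrative stand-ins for the learned meta-model described in the abstract.

```python
# Rank bootstrap samples by cheap sample-level descriptors and keep
# only a fraction of them, pruning the rest before training any model.
import math
import random
from collections import Counter

def bootstrap(data, rng):
    """One bootstrap sample: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]

def meta_features(sample):
    """Uniqueness ratio and label entropy of a sample of (x, label) pairs."""
    uniq = len(set(sample)) / len(sample)
    counts = Counter(label for _, label in sample)
    total = sum(counts.values())
    ent = -sum(c / total * math.log2(c / total) for c in counts.values())
    return uniq, ent

rng = random.Random(0)
data = [(i, i % 2) for i in range(40)]            # (feature, label) pairs
samples = [bootstrap(data, rng) for _ in range(20)]
# Keep the 25% of samples with the highest label entropy (most balanced).
ranked = sorted(samples, key=lambda s: meta_features(s)[1], reverse=True)
kept = ranked[: len(ranked) // 4]
```

In the actual method the ranking is not a fixed rule but is learned from the relation between such characteristics and the models' measured impact; the sketch only shows where the saving comes from, since 75% of the models are never built.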
Past Seminars (2013-2014)
*************************************
18th Seminar, 23rd July 2014.
*************************************
Presenter: Jose C. Principe
Title: A Cognitive Architecture for Object Recognition in Video
Abstract: This talk describes our efforts to abstract from the animal visual system the computational principles to explain images in video. We develop a hierarchical, distributed architecture of dynamical systems that self-organizes to explain the input imagery using an empirical Bayes criterion with sparseness constraints and dual state estimation. The interpretation of the images is mediated through causes that flow top down and change the priors for the bottom up processing. We will present preliminary results in several data sets.
Short Bio:
Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs) modeling. He is BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. His primary area of interest is processing of time-varying signals with adaptive neural models. The CNEL Lab has been studying signal and pattern recognition principles based on information-theoretic criteria (entropy and mutual information). Dr. Principe is an IEEE Fellow. He is the past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, Past President of the International Neural Network Society, and Past Editor-in-Chief of the IEEE Transactions on Biomedical Engineering. He is a member of the Advisory Board of the University of Florida Brain Institute. Dr. Principe has more than 600 publications. He has directed 81 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled “Neural and Adaptive Systems”, published by John Wiley and Sons, and more recently co-authored several books: “Brain Machine Interface Engineering” (Morgan and Claypool), “Information Theoretic Learning” (Springer), and “Kernel Adaptive Filtering” (Wiley).
*************************************
17th seminar, 1st July 2014
*************************************
Presenter: Peter Clark
Senior Research Manager at the Allen Institute for Artificial Intelligence
Title: From Information Retrieval towards Knowledgeable Machines
Abstract:
At some point in the future, we will have knowledgeable machines - machines that contain internal models of the world and can answer questions, explain those answers, and dialog about them. A substantial amount of that knowledge will likely come from machine reading, whereby internal representations are synthesized from textual information. We are exploring this at the new Allen Institute for Artificial Intelligence (AI2) with a medium-term focus on having the computer pass fourth-grade science tests, with much of that knowledge acquired semi-automatically from text. In this presentation I will outline our picture of such a system and summarize some of our early research, in particular explorations in direct "reading" of rules from texts (e.g., study guides), and how we are seeking to go beyond the limits of information retrieval for question-answering.
BIO: Peter Clark is the Senior Research Manager for AI2. His work focuses upon natural language processing, machine reasoning, and large knowledge bases, and the interplay between these three areas. He has received several awards, including an AAAI Best Paper award (1997), a Boeing Associate Technical Fellowship (2004), and AAAI Senior Member status (2014). He received his Ph.D. in Computer Science in 1991, and has researched these topics for 30 years, with more than 80 refereed publications and over 5000 citations.
*********************************************************************
Presenter: Cesar Guevara
Title: Development of Efficient Algorithms for the Detection of Intruders and Data Leaks in Computer Systems
Abstract:
Detection and control of intruders, data leakage or unauthorized access has always been important when dealing with information systems where security, integrity and privacy are key issues. Although computer devices are more sophisticated and efficient, there is still a need to establish safety procedures to avoid illegitimate accesses. The purpose of this work is to show how different intelligent techniques can be used to create new algorithms that identify users accessing critical information and check whether or not access is allowed. Advanced and intelligent analysis and data mining techniques, such as decision trees and artificial neural networks, have been applied to obtain patterns of users' behavior, yielding dynamic user profiles. The main contribution of this work is to show effective solutions for the detection of intruders and data leakage in computer information systems.
*************************************
16th Seminar, 25th June 2014.
*************************************
Presenter: Conceição Rocha
Title: Data Assimilation: contributions to modeling, prediction and control in anesthesia
Abstract:
During surgical interventions a muscle relaxant drug is frequently administered with the objective of inducing muscle paralysis. This work aims at contributing to personalized anesthetic drug administration during surgery. In fact, personalization is one of the aims of P4 (Predictive, Preventive, Personalized and Participatory) medicine, which is the modern trend in health care. Furthermore, the clinical environment and patient safety issues lead to a huge variety of situations that must be taken into account, requiring intensive simulation studies. Hence, population models are crucial for research and development in this field. In this work, we develop two models - a stochastic population model for the muscle paralysis level induced by atracurium, and an online robust model to predict the maintenance dose of atracurium necessary for the desired effect. We also address the problem of joint estimation of the state and parameters of a deterministic continuous-time system with discrete-time observations, in which the parameter vector is constant but its value is not known, being a random variable with a known distribution.
*********************************************************************
Presenter: Sónia Dias
Title: Linear regression with empirical distributions
Professor at the Polytechnic Institute of Viana do Castelo
Abstract:
In the classical data framework, one numerical value or one category is associated with each individual (microdata). However, the interest of many studies lies in groups of records gathered according to characteristics of the individuals or classes of individuals, leading to macrodata. The classical solution for these situations is to associate with each individual or class of individuals a central measure, e.g., the mean or the mode of the corresponding records; however, with this option the variability across the records is lost. For such situations, Symbolic Data Analysis proposes that a distribution or an interval of the individual records' values be associated with each unit, thereby considering new variable types, named symbolic variables. One such type of symbolic variable is the histogram-valued variable, where each entity under analysis corresponds to an empirical distribution that can be represented by a histogram or a quantile function. If, for all observations, each unit takes values on only one interval with weight equal to one, the histogram-valued variable reduces to the particular case of an interval-valued variable. In either case, a Uniform distribution is assumed within the considered intervals. Accordingly, it is necessary to adapt concepts and methods of classical statistics to the new kinds of variables. The functional linear relations between histogram-valued or between interval-valued variables cannot be a simple adaptation of the classical regression model. In this presentation, new linear regression models for histogram data and interval data are presented. These new Distribution and Symmetric Distributions Regression Models allow predicting distributions/intervals, represented by their quantile functions, from the distributions/intervals of the explanatory variables. To determine the parameters of the models it is necessary to solve quadratic optimization problems subject to non-negativity constraints on the unknowns.
To define the minimization problems and to compute the error measure between the predicted and observed distributions, the Mallows distance is used. As in classical analysis, it is possible to deduce a goodness-of-fit measure from the models whose values range between 0 and 1. Examples on real data as well as simulated experiments illustrate the behavior of the proposed models and the goodness-of-fit measure. These studies indicate a good performance of the proposed methods and of the respective coefficients of determination.
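The Mallows distance used above as the error measure can be sketched directly from its definition as an L2 distance between quantile functions. The empirical quantile construction and the discretisation grid below are our own choices for a self-contained toy.

```python
# Mallows (L2 Wasserstein) distance between two empirical distributions,
# approximated on a uniform grid of probability levels.

def quantile_fn(values):
    """Empirical quantile function of a sample."""
    xs = sorted(values)
    def q(p):
        idx = min(int(p * len(xs)), len(xs) - 1)
        return xs[idx]
    return q

def mallows(values_a, values_b, grid=100):
    """sqrt of the mean squared difference of the two quantile functions."""
    qa, qb = quantile_fn(values_a), quantile_fn(values_b)
    ps = [(i + 0.5) / grid for i in range(grid)]
    return (sum((qa(p) - qb(p)) ** 2 for p in ps) / grid) ** 0.5

d_same = mallows([1, 2, 3, 4], [1, 2, 3, 4])    # identical samples
d_shift = mallows([1, 2, 3, 4], [3, 4, 5, 6])   # same shape, shifted by 2
```

Because the distance compares quantile functions pointwise, a pure location shift of the distribution shows up exactly as the size of the shift, which makes it a natural fit for regression on distributions represented by quantile functions.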
Homepage: http://www.estg.ipvc.pt/~sdias
*************************************
15th Seminar, 16th June 2014.
*************************************
Presenter: Pascal Poncelet
Title: Towards a Unifying Approach for Extracting Trajectories
Abstract:
Recent improvements in positioning technology have led to a much wider availability of massive moving object data. A crucial task is to find the moving objects that travel together; usually, these are called spatio-temporal patterns. Due to the emergence of many different kinds of spatio-temporal patterns in recent years, different approaches have been proposed to extract them. However, each approach only focuses on mining a specific kind of pattern. In addition to being a painstaking task due to the large number of algorithms used to mine and manage patterns, it is also time-consuming. Additionally, we have to execute these algorithms again whenever new data are added to the existing database. In this talk I will present a unifying approach, named GeT Move, which uses a frequent closed itemset-based spatio-temporal pattern-mining algorithm to mine and manage different spatio-temporal patterns. GeT Move is implemented in two versions: GeT Move and Incremental GeT Move. Furthermore, I will address how trajectories can be used in other kinds of domains.
Homepage: http://www.lirmm.fr/~poncelet/indexEN.html
*************************************
The 14th seminar, 11th June 2014.
*************************************
Presenter: Leandro Nunes de Castro
Title: Natural Computing: Concepts and Applications
Abstract: Computation can be viewed in three different contexts within Natural Computing: the solving of complex problems; the synthesis of natural phenomena; and the search for new raw materials with which to compute. In all cases, an adequate understanding of the natural phenomenon is the basis for new ideas and for understanding how the computation is performed. This understanding is usually obtained through models, for example of planetary dynamics, immunology, chemical reaction networks, bacteria, or species diversity, among many others. These models have become so important for the understanding of nature that a new branch of natural computing was proposed to incorporate the computational modelling of natural phenomena. This presentation gives a general introduction to the area, highlighting the main research directions of the Natural Computing Laboratory (LCoN) at Mackenzie University, SP, Brazil. Case studies in social media data analysis, logistics and other areas will be presented.
CV: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?metodo=apresentar&id=K4769993T4
**************************************
Presenter: Vinícius M. A. de Souza
Title: How can Artificial Intelligence contribute to the fight against disease-vector insects and agricultural pests?
Abstract: Throughout human history, insects have had a strong relationship with people's well-being, in both positive and negative ways. Insects are vectors of diseases that kill millions of people every year and, at the same time, are responsible for the pollination of a large part of the world's food production. For these reasons, many researchers have developed an arsenal of insect control methods with the goal of reducing the presence of harmful species with minimal impact on beneficial species. This seminar will discuss how the field of Artificial Intelligence can contribute to the fight against disease-vector insects and agricultural pests. More specifically, it will present the goals and challenges of a project for a low-cost laser sensor capable of counting and classifying insect species using Machine Learning algorithms and Digital Signal Processing methods.
CV: http://lattes.cnpq.br/6394929576717854
*************************************
The 13th seminar, 3rd June, 2014
*************************************
Presenter: Aljaž Osojnik
PhD student working with João Gama
Title: Learning models for structured output prediction from data streams
Abstract: Nowadays, data is generated at ever-increasing rates and uses more and more complex data structures. We present the problem of online structured output prediction; namely, we describe the online data stream mining approach and the structured output prediction problem. We describe several issues that arise in online structured output prediction, i.e., evaluation, change detection and resource complexity. We focus on the structured output prediction tasks of multi-label classification, multi-target regression and hierarchical multi-label classification. We provide an overview of the current research in the areas of batch and online methods for these tasks, as well as some of the evaluation metrics used in these cases. We conclude with a discussion of directions for further work on improving existing multi-target regression methods and how those can be applied to the tasks of multi-target classification and multi-label classification, as well as adapting current batch hierarchical multi-label classification methods to the online setting.
**************************************
Presenter: Carlos Ferreira
PhD student working with João Gama
Title: Exploring Temporal Patterns from Multi-relational Databases
Abstract: Multi-relational databases are widely used to represent and store data. Often, a multi-relational database is composed of tables recording static data, which do not change over time, and tables recording dynamic data, which is accumulated over time. Finding temporal patterns in such temporal databases is an important challenge in domains as diverse as video processing, computational biology and elderly monitoring. The main goal of this work is to study methods and techniques to explore the temporal information available in such multi-relational databases, mainly to find rich patterns and learn highly expressive classification theories. In particular, we explore temporal information using either propositional or first-order logic sequence miners. Moreover, we employ propositionalization and predicate invention techniques to learn either propositional or FOL theories.
*************************************
The 12th seminar, 16th May, 2014
*************************************
Presenter: Hadi Fanaee Tork
PhD student working with João Gama
Title: Event labeling combining ensemble detectors and background knowledge
Abstract: Event labeling is the process of marking events in unlabeled data. Traditionally, this is done by involving one or more human experts through an expensive and time-consuming task. In this presentation we propose a new event labeling model relying on an ensemble of detectors and background knowledge. The target data are the usage log of a real bike sharing system. We first label events in the data and then evaluate the performance of the ensemble and individual detectors on the labeled data set using ROC analysis and static evaluation metrics in the absence and presence of background knowledge. The results show that when there is no access to human experts, the proposed approach can be an effective alternative for labeling events.
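The core mechanism, several detectors voting on which points are events, can be sketched with simple z-score detectors at different sensitivities. These threshold detectors and the toy usage series are our own illustrations, not the detectors or data of the paper.

```python
# Label events by majority vote of several outlier detectors.

def zscore_detector(series, k):
    """Flag points more than k standard deviations from the mean."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5
    return [abs(x - mean) > k * std for x in series]

def ensemble_label(series, ks=(1.5, 2.0, 2.5)):
    """An index is labeled as an event when most detectors agree."""
    votes = [zscore_detector(series, k) for k in ks]
    return [sum(v[i] for v in votes) > len(ks) / 2
            for i in range(len(series))]

usage = [10, 11, 9, 10, 80, 10, 12, 9]   # a usage-like series with one spike
labels = ensemble_label(usage)
```

Background knowledge (e.g., known holidays or weather events in the bike-sharing log) would then be used to confirm or veto the ensemble's candidate labels.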
Paper, Data set
**************************************
Presenter: Mohammadreza Valizadeh
PhD student working with Pavel Brazdil
Title: Improving the Performance of Text Information Retrieval (IR) Systems
Abstract: This thesis focuses on two major issues: re-ranking and summarization. We have proposed a new method for re-ranking based on a query-sensitive similarity measure. After re-ranking, the retrieved documents can be summarized. We have proposed several methods for summarizing multiple documents: one unsupervised (unsupervised graph-based summarization) and two supervised (a user-based method and an ensemble method combined with actor-object relationships).
****************************************
11th Seminar, April 30, 2014
*****************************************
Presenter: Vânia Almeida
Senior researcher at LIAAD - INESC TEC;
Title: Collaborative Wind Power Forecast
Abstract: Wind power is considered one of the most rapidly growing sources of electricity generation all over the world. This talk presents a new collaborative forecasting framework for wind power that uses information from distributed neighbouring wind farms, so that the prediction at each wind farm is based on data from different locations. The experiments are based on real wind power measurements from 16 wind farms. The scope is short-term wind power forecasting (six hours ahead) using Auto-Regressive Integrated Moving Average (ARIMA) models. The problem was addressed in two main steps: 1) search for motifs using the Symbolic Aggregate approXimation (SAX) representation, and 2) construction of the correlation network, the desired output being a decrease of the root mean square error (RMSE), taking as reference the models using only data from each farm.
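The SAX step mentioned above can be sketched as follows (a minimal illustration; the series, segment count and alphabet size are example choices, though the breakpoints follow the standard SAX table for an alphabet of size 4):

```python
import math

# Sketch of SAX: z-normalise a series, reduce it with piecewise aggregate
# approximation (PAA), then map each segment mean to a symbol using the
# standard Gaussian breakpoints for an alphabet of size 4.

BREAKPOINTS = [-0.67, 0.0, 0.67]  # alphabet size 4: a, b, c, d
ALPHABET = "abcd"

def sax(series, n_segments):
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    z = [(x - mean) / std for x in series]
    seg_len = len(z) // n_segments
    word = ""
    for i in range(n_segments):
        seg = z[i * seg_len:(i + 1) * seg_len]
        paa = sum(seg) / len(seg)
        idx = sum(paa > b for b in BREAKPOINTS)
        word += ALPHABET[idx]
    return word

# Toy wind power series with alternating low/high regimes
power = [1, 2, 3, 10, 11, 12, 3, 2, 1, 10, 12, 11]
print(sax(power, 4))  # -> "adad"
```

Repeated SAX words across neighbouring farms are then candidate motifs for building the correlation network.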
**************************************
Presenter: João Cordeiro
Researcher at LIAAD - INESC TEC and a Professor at the Department of Informatics of the University of Beira Interior.
Title: Learning Sentence Reduction Rules from the Wilderness
Abstract: Sentence Compression has recently received great attention from the research community of Automatic Text Summarization (ATS). Sentence Reduction consists of eliminating sentence components, such as words, part-of-speech tag sequences or chunks, without severely degrading the information contained in the sentence or its grammatical correctness. In this presentation I will start by making a quick and broad overview of the field of ATS, followed by a more detailed explanation of our work in the subfield of Sentence Compression. In particular, I will present an unsupervised, scalable methodology for learning sentence reduction rules. First, paraphrases are discovered within a collection of automatically crawled Web news stories and then textually aligned in order to extract interchangeable text fragment candidates, in particular reduction cases. As only positive examples exist, Inductive Logic Programming (ILP) provides an interesting learning paradigm for the extraction of sentence reduction rules. Consequently, reduction cases are transformed into first-order logic clauses to supply a massive set of suitable learning instances, and an ILP learning environment is defined within the context of the Aleph framework. Experiments evidence good results in terms of irrelevancy elimination, syntactical correctness and reduction rate in a real-world environment, as opposed to other methodologies proposed so far.
*************************************
10th Seminar, 16th April 2014, LIAAD Main Auditorium
************************************
Presenter: Dalila B.M.M. Fontes
Title: Scheduling Projects with alternative tasks subject to technical failure
Abstract: Nowadays, organizations are often faced with the development of complex and innovative projects. This type of project often involves performing tasks which are subject to failure. Thus, in many such projects several possible alternative actions are considered and performed simultaneously. Each alternative is characterized by cost, duration, and probability of technical success. The cost of each alternative is paid at the beginning of the alternative and the project payoff is obtained whenever an alternative has been completed successfully. For this problem one wishes to find the optimal schedule, i.e. the starting time of each alternative, such that the expected net present value is maximized. This problem was recently proposed by Ranjbar and Davari (2013), where a branch-and-bound approach is reported. Here we propose to solve the problem using dynamic programming.
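For intuition only (a simplified sequential variant, not the dynamic program of the talk, where alternatives may run simultaneously), the expected net present value of trying risky alternatives one after another can be computed as:

```python
# Illustrative sketch: expected net present value of trying risky
# alternatives in sequence. Each alternative has a cost paid at its
# start, a duration, and a probability of technical success; the project
# payoff is received on the first success.

def expected_npv(alternatives, payoff, rate=0.1):
    """alternatives: list of (cost, duration, success_prob) tried in order."""
    t = 0.0
    p_reach = 1.0  # probability this alternative is still needed
    npv = 0.0
    for cost, duration, p in alternatives:
        npv -= p_reach * cost * (1 + rate) ** -t     # pay cost at start
        t += duration
        npv += p_reach * p * payoff * (1 + rate) ** -t  # payoff on success
        p_reach *= (1 - p)
    return npv

alts = [(10.0, 1, 0.5), (8.0, 2, 0.7)]
print(round(expected_npv(alts, payoff=100.0), 2))
```

The dynamic program would instead choose starting times for all alternatives jointly so as to maximize this expectation.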
**************************************
Presenter: Alberto Adrego Pinto
Title: Price competition in the Hotelling model with uncertainty on costs
Abstract: This work develops a theoretical framework to study price competition in a Hotelling-type network game, extending the Hotelling model of price competition with linear transportation costs from a line to a network. Under explicit conditions on the production costs and road lengths we show the existence of a pure Nash price equilibrium. Furthermore, we introduce incomplete information in the production costs of the firms and we find the Bayesian-Nash price equilibrium.
*************************************
9th Seminar, 26 March 2014, LIAAD meeting room
*************************************
Presenter: José Fernando Gonçalves
Title: A biased random-key genetic algorithm for the Minimization of Open Stacks Problem
Abstract: This presentation describes a biased random-key genetic algorithm (BRKGA) for the Minimization of Open Stacks Problem (MOSP). The MOSP arises in a production system scenario, and consists of determining a sequence of cutting patterns that minimizes the maximum number of open stacks during the cutting process. The proposed approach combines a BRKGA and a local search procedure for generating the sequence of cutting patterns. A novel fitness function for evaluating the quality of the solutions is also developed. Computational tests are presented using available instances taken from the literature. The high quality of the solutions obtained validates the proposed approach.
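The random-key decoding and the open-stacks fitness evaluation can be sketched as follows (a hypothetical illustration, not the authors' implementation):

```python
# Sketch of the BRKGA decoder idea: a chromosome of random keys in [0, 1)
# is decoded into a pattern sequence by sorting, and fitness is the
# maximum number of stacks simultaneously open while cutting that sequence.

def decode(keys):
    """Sort pattern indices by their random keys to get a sequence."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def max_open_stacks(sequence, patterns):
    """patterns[p] is the set of piece types produced by pattern p.
    A piece's stack is open from the first to the last pattern using it."""
    first, last = {}, {}
    for pos, p in enumerate(sequence):
        for piece in patterns[p]:
            first.setdefault(piece, pos)
            last[piece] = pos
    peak = 0
    for pos in range(len(sequence)):
        open_now = sum(1 for piece in first
                       if first[piece] <= pos <= last[piece])
        peak = max(peak, open_now)
    return peak

patterns = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"A", "D"}]
seq = decode([0.42, 0.17, 0.86, 0.05])  # -> pattern order [3, 1, 0, 2]
print(max_open_stacks(seq, patterns))
```

The genetic algorithm then evolves the key vectors, with crossover biased toward elite solutions, leaving all problem-specific logic inside the decoder.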
Supported by Fundação para a Ciência e Tecnologia (FCT) through project PTDC/EGE-GES/117692/2010.
Keywords: Minimization of Open Stacks Problem, Cutting Pattern, Biased Random-Key Genetic Algorithm, Random Keys.
**************************************
Presenter: Paula Brito
Title: Multivariate Analysis of Distributional Data
Abstract: In statistics and multivariate data analysis, the units under analysis are usually single elements described by numerical and/or categorical variables, each element taking one value for each variable. However, the data under analysis may not be single observations, but groups of units gathered on the basis of common properties, or observed repeatedly over time, or concepts described as such – therefore the observed values present variability. In such situations, data are usually reduced to central statistics, leading to the loss of important information. Symbolic Data Analysis provides a framework where the observed variability is considered in the data representation. To describe groups of individuals or concepts, new methods are developed and new variable types are introduced, which may now assume other forms of realization (e.g., sets, intervals, or distributions for each entity) that take into account the intrinsic data variability. In this talk, we consider the case where individual observations are summarized by distributions, and recall some methods that have been developed to analyse such data. In particular, we shall focus on clustering methodologies.
Keywords: complex data, distribution data, histogram-valued variables, symbolic data
**************************************
8th Seminar, February 26, 2014 INESC Porto Main Auditorium
*************************************
Presenter: Ricardo Bessa, USE
Senior researcher at INESC TEC in its Power Systems Unit
Title: Spatial-Temporal Solar Power Forecasting for Smart Grids
Abstract: Solar power penetration in distribution grids has been growing fast over the last years, particularly at the low voltage (LV) level, which introduces new challenges in operating distribution grids. Across the world, Distribution System Operators (DSO) are developing the Smart Grid concept, and one key tool for this new paradigm is solar power forecasting. This talk presents a new spatial-temporal forecasting framework, based on the vector auto-regression framework, which combines observations of solar generation collected by smart meters and distribution transformer controllers. The scope is six-hour-ahead deterministic and probabilistic forecasts at the residential solar photovoltaic and MV/LV substation levels. This framework has been tested in the Smart Grid pilot of Évora, Portugal, using data from 44 micro-generation units and 10 MV/LV substations. A benchmark comparison was made with the autoregressive forecasting framework (AR, a univariate model).
***********************************
Presenter: Fabien Gouyon
Senior researcher at INESC TEC, UTM, leading the Sound and Music Computing research group
Title: Evaluating the evaluation, the case of music classification
Abstract: In this talk, I take a critical viewpoint on the validity of current approaches to evaluation in Music Information Retrieval research, and in particular music classification. Experiments using three state-of-the-art approaches to building music classification systems crossed with three different datasets show that performance measured by the standard approach to evaluation is not valid for concluding whether a music classification system is objectively good, or better than another. I am particularly interested in opening discussion to evaluation issues in machine learning.
***********************************
7th Seminar, 14th February, 2014
***********************************
Presenter: Diego Marron,
A student working with Albert Bifet at Yahoo! Research
Title: GPU Random Forests and Decision Trees for Evolving Big Data Streams
Abstract: Web companies have an increasing need for more and more computation power to effectively analyze big data streams in real time and extract useful information. Most of these data are short-lived and evolve with time. Big data stream analysis is usually done in clusters that, due to the increasing demand for computation power, are growing in size. This situation brings the opportunity to explore new ways to achieve better performance with fewer resources. One option is to use GPUs to process evolving big data streams. GPUs are throughput-oriented, massively parallel architectures providing very attractive performance boosts.
In this thesis we present an implementation of a Random Forest ensemble using random Very Fast Decision Trees on the GPU. The results are compared to two well-known machine learning frameworks, VFML and MOA, achieving speedups on the GPU of at least 300x with similar accuracy. In our tests we used only one GPU for evaluation, which is also cheaper to use and maintain than a cluster. Moreover, we minimized communication between CPU and GPU to only two transfers per batch: one from the CPU to the GPU to send the data to process, and a second in the opposite direction to get the final result.
***********************************
6th Seminar, 29th January, 2014
***********************************
Presenter: Nuno Escudeiro
Title: Active learning: when to stop querying?
Abstract:
The main goal in Active Learning (AL) is to select an accurate hypothesis from the version space at low cost, i.e., while requiring as few queries as possible. Asking the oracle to label more instances than necessary has a negative impact on the performance (and cost) of the learning process. From this point of view, knowing when to stop might be as relevant as having a good query selection strategy. The AL process should be stopped when the utility of new queries degrades below a given threshold and model quality stops improving. Specifying this utility, and the critical threshold, is task-dependent. Some simple stopping criteria, such as exhausting the unlabeled set or predefining a desired size for the training set according to the available budget, are obvious but neither take efficiency concerns into account nor ensure that the resulting learner is accurate enough for the task at hand. When the goal is to reduce the cost of the learning process, as in our case, it is important to analyze whether the most informative instance is still valuable enough; the utility of the queries should compensate for their cost. Therefore, querying the oracle should stop once the cost of querying overcomes the utility of the unlabeled instances still remaining in the working set. In this talk we discuss three base stopping criteria -- classification gradient, steady entropy mean and steady entropy distribution -- plus two hybrid criteria that aggregate the former in a specific way so as to improve over their foundational criteria.
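One of these criteria, the steady entropy mean, can be sketched roughly as follows (window size and tolerance are illustrative choices, not values from the talk):

```python
import math

# Hedged sketch of a steady-entropy-mean stopping rule: track the mean
# prediction entropy over the unlabeled pool after each round of
# querying, and stop when it stabilises.

def entropy(probs):
    """Shannon entropy of one instance's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_stop(entropy_history, window=3, tol=1e-3):
    """Stop when the mean pool entropy changed less than tol over the
    last `window` active-learning rounds."""
    if len(entropy_history) < window + 1:
        return False
    recent = entropy_history[-(window + 1):]
    return max(recent) - min(recent) < tol

# Mean pool entropy (from entropy()) recorded after each querying round
history = [0.69, 0.52, 0.300, 0.2995, 0.2993, 0.2991]
print(should_stop(history))
```

Once the mean entropy plateaus, further queries are unlikely to change the learned model enough to repay their labelling cost.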
***********************************
Presenter: Márcia Oliveira
Title: Visualizing Evolving Social Networks using Node-level and Community-level Trajectories
Abstract: Visualization of static social networks is a mature research field in information visualization. Conventional approaches rely on node-link diagrams which provide a representation of the network topology by representing nodes as points and links between them as lines. However, the increasing availability of longitudinal network data has spurred interest in visualization techniques that go beyond the static node-link representation of a network. In temporal settings, the focus is on the network dynamics at different levels of analysis (e.g. nodes, communities, whole network). Yet, the development of visualizations that are able to provide actionable insights into the different types of changes occurring on the network, and their impact on both the neighbourhood and the overall network structure, is a challenging task. This work attempts to tackle this challenge by proposing a methodology for tracking the evolution of dynamic social networks, at both the node level and the community level, based on the concept of temporal trajectory. We resort to third-order tensors to represent evolving social networks and we further decompose them using a Tucker3 model. The two most representative components of this model define the 2D space where the trajectories of social entities are projected. To illustrate the proposed methodology we conduct a case study using a set of temporal self-reported friendship networks.
***********************************
5th Seminar, 10th January, 2014
***********************************
Presenter: Albert Bifet
Title: Mining Big Data in Real Time
Abstract:
Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data. In this talk, we will focus on advanced techniques for mining Big Data in real time using evolving data stream techniques. We will present the MOA software framework, with classification, regression, and frequent pattern methods, and the new SAMOA distributed streaming software.
************************************
Presenter: Brett Drury
Title: Creating Bayesian Networks from Text
Abstract: Bayesian networks can represent knowledge and make inferences in complex domains, but their construction is not easy. On the other hand, much of human knowledge is in texts (newspapers, articles, etc.), and with the advent of the Internet, access to these texts has become easy. Consequently, strategies to automatically create Bayesian networks for complex domains from information in texts have become an area of current and relevant research. This presentation will discuss methods for constructing Bayesian networks from information in texts.
***********************************
4th Seminar, 17th of December, 2013.
***********************************
Presenter: Pavel Brazdil, Rui Leite and Carlos Soares
Title: Metalearning & Algorithm Selection
Abstract: First we present the motivation for this work. As the number of possible algorithms increases, the user is faced with the problem of algorithm selection. This problem arises in many different domains, from classification, regression and other subareas of machine learning and data mining to optimization and satisfiability. We describe how meta-learning can be used to aid the user in selecting the appropriate algorithm for a given problem. The seminar will cover both standard methods based on static meta-level characteristics and more recent approaches that exploit experimentation in order to present the user with a viable suggestion. In this context we present a rather “mysterious” formula that permits estimating the success of alternative solutions, which has led to very good experimental results.
In the second part of this talk we will elucidate how this work can be generalized to help the user conceive successful workflows of operations in data mining. Finally, we will explain how the techniques can be reused in other domains, including e.g. optimization problems and satisfiability, and also who is currently working on which problem.
************************************
Presenter: Raquel Sebastião,
PhD student working with João Gama
Title: Learning from Data Streams: Synopsis and Change Detection
Abstract: The emergence of real temporal applications in non-stationary scenarios has drastically altered our ability to generate and gather information. Nowadays, potentially unbounded and massive amounts of information are generated at a high rate, known as data streams. Therefore, it is unreasonable to assume that processing algorithms have sufficient memory capacity to store the complete history of the stream. Indeed, stream learning algorithms must process data promptly and discard it immediately. Along with this, as data flow continuously for long periods of time, the process generating the data is not strictly stationary and evolves over time.
This presentation embraces concerns raised when learning from data streams. Namely, concerns raised by the intrinsic characteristics of data streams and by the learning process itself. The former is addressed through the construction of synopses structures of data and change detection methods. The latter is related to the appropriate evaluation of stream learning algorithms.
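A classic change detection method for streams, the Page-Hinkley test, illustrates the kind of technique involved (a minimal sketch with example parameters; not necessarily the specific methods proposed in the talk):

```python
# Illustrative sketch of the Page-Hinkley change detection test, which
# monitors the cumulative deviation of a stream from its running mean
# and raises an alarm when it drifts too far from its historical minimum.

class PageHinkley:
    def __init__(self, delta=0.005, threshold=1.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # minimum of the cumulative deviation

    def update(self, x):
        """Feed one observation; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold

ph = PageHinkley()
stream = [0.1, 0.12, 0.09, 0.11, 0.1, 0.9, 0.95, 0.92, 0.93]
alarms = [t for t, x in enumerate(stream) if ph.update(x)]
print(alarms)  # change flagged shortly after the shift at t = 5
```

Only a handful of counters are kept, respecting the bounded-memory constraint that stream processing imposes.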
***********************************
3rd Seminar, 27th November, 2013.
***********************************
Presenter: Luís Matias,
PhD student working with João Mendes Moreira and João Gama
Title: On Predicting the Taxi-Passenger Demand: A Real-Time Approach
Abstract: Informed driving is increasingly becoming a key feature for increasing the sustainability of taxi companies. The sensors installed in each vehicle provide new opportunities to automatically discover knowledge, which in turn delivers information for real-time decision-making. Intelligent transportation systems for taxi dispatching and for finding time-saving routes are already exploring these sensing data. This paper introduces a novel methodology to predict the spatial distribution of taxi-passenger demand for a short-term time horizon using streaming data. Firstly, the information is aggregated into a histogram time series. Then, three time series forecasting techniques are combined to originate a prediction. Such techniques are able to learn in real time due to their incremental characteristics, so they easily react to bursty or unexpected events. Experimental tests were conducted using the online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrate that the proposed framework can provide effective insight into the spatio-temporal distribution of taxi-passenger demand for a 30-minute horizon.
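As a toy illustration of combining incremental forecasters (not the actual techniques used in the paper), predictions can be weighted by the inverse of each model's recent error so the ensemble adapts online:

```python
# Hypothetical sketch: combine several forecasters with weights inversely
# proportional to their recent error, so the combination shifts toward
# whichever model has been most accurate lately.

def combine(forecasts, recent_errors, eps=1e-9):
    """Weight each model's forecast by the inverse of its recent error."""
    weights = [1.0 / (e + eps) for e in recent_errors]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total

# Three models predict taxi demand in one zone for the next 30 minutes
forecasts = [12.0, 18.0, 15.0]
recent_errors = [2.0, 6.0, 3.0]  # e.g. mean absolute error on past periods
print(round(combine(forecasts, recent_errors), 2))
```

Because the weights are recomputed from a sliding record of errors, a model that degrades after a bursty event loses influence immediately.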
************************************
Presenter: Pedro Abreu,
Collaborator of João Mendes Moreira
Title: A Recommender System Applied to a Soccer Environment
Abstract: Collaborative filtering techniques have been used almost exclusively in Internet environments over the years, helping users find items they are expected to like, something that is equivalent to finding the same kind of books in a bookstore. Normally, these techniques use the past purchases of the users in order to provide recommendations. With this concept in mind, this research used a collaborative technique to automatically improve the performance of a robotic soccer team. Many studies have attempted to address this problem over the last years. However, these studies have always presented drawbacks in terms of the improvement of the soccer team. Using a collaborative filtering technique based on nearest neighbors and the FC Portugal team as the test subject, matches were simulated between three different teams (performing much better, better and worse from the perspective of FC Portugal). The strategy of FC Portugal was to combine set plays and team formation. The performance of the FC Portugal team improved between 32% and 377%, and these results are quite promising. In the future, this kind of approach will be expanded to other robotic soccer situations, such as the 3D simulation league.
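A minimal sketch of user-based collaborative filtering with cosine-similarity nearest neighbors, the general family of technique referred to above (the mapping to teams and set plays is the paper's own and is not reproduced here):

```python
import math

# Toy user-based collaborative filtering: predict an unknown rating as
# the similarity-weighted average of the neighbors' ratings of that item.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict(target, neighbors, item):
    """Predict target's rating of `item` from its nearest neighbors."""
    sims = [(cosine(target["ratings"], n["ratings"]), n) for n in neighbors]
    num = sum(s * n["ratings"][item] for s, n in sims)
    den = sum(abs(s) for s, _ in sims)
    return num / den

target = {"ratings": [5, 3, 0]}  # rating for item 2 is unknown
neighbors = [{"ratings": [4, 2, 4]}, {"ratings": [5, 4, 2]}]
print(round(predict(target, neighbors, 2), 2))
```

In the soccer setting, "users" would correspond to game situations or opponents and "items" to strategic choices, with past match outcomes playing the role of ratings.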
***********************************
2nd Seminar, 30th October, 2013.
***********************************
Presenters: Andre Dias and Pedro Campos
Title: Agent-based modeling in Economics and Management with NetLogo
(joint work with Pavel Brazdil, André Dias and Pedro Amaro).
Abstract: Models in Economics usually assume market equilibrium and constant individual preferences. When these assumptions do not hold, the analysis of mixed levels (individual cognitive level and social level) may constitute the answer to model building. Agent-based modeling and simulation are definitely important techniques aimed at understanding social phenomena. In this talk we focus on two recent typical applications of agent-based modeling in Economics and Management: (i) the process of creating new ventures; and (ii) systemic risk in banking networks. The models have been implemented using NetLogo, an agent-based simulation tool. Model (i) approaches the process of creating new ventures, which unfolds in two main phases: the identification of the business opportunity by the entrepreneur, considering the various factors that influence the entrepreneurial attitude, and the development of the business opportunity. In model (ii), a network of banking relationships in the inter-banking market is created. The goal consists of verifying the existence of tipping points in systemic risk in the banking network, as we have observed in recent financial crises.
************************************
Presenter: Carlos Sáez,
PhD student working with Pedro Rodrigues
Title: Metrics and methods for biomedical data quality assessment
Abstract: Biomedical data require a sufficient level of quality for their reuse. Optimally, researchers would expect stable, problem-free datasets. However, because the data were not originally collected for reuse, they generally do not meet these expectations. Additionally, biomedical data are generally multi-dimensional, contain variables of multiple types and modalities, and are generated over time and from multiple sources, characteristics which may complicate the process of assessing their quality.
In this presentation we will first introduce the framework under development for biomedical data quality assessment, considering the aforementioned characteristics. It is based on a set of metrics and methods grounded in data quality dimensions. Then, we will describe the development of a metric to measure the spatial stability among data sources, based on a simplicial projection of probability distribution distances. Finally, we will show the current proposals for temporal data quality assessment methods.
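As one plausible building block (an assumption for illustration; not necessarily the distance used in the framework), a probability-distribution distance between data sources, such as the Jensen-Shannon divergence, could be computed as:

```python
import math

# Illustrative sketch: compare the distribution of a variable across two
# data sources with the Jensen-Shannon divergence (0 = identical
# distributions, 1 = maximally different, in bits).

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Distribution of a categorical variable at two hypothetical sources
source_a = [0.7, 0.2, 0.1]
source_b = [0.1, 0.2, 0.7]
print(round(js_divergence(source_a, source_b), 3))
```

Pairwise distances of this kind between all sources could then be projected (e.g. onto a simplex) to visualise which sources are stable and which are outliers.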
******************************************
1st Seminar, 16th October, 2013
******************************************
Presenter: Alípio Jorge
Title: Classifying Heart Sounds using Multiresolution Time Series Motifs
Abstract: The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequencies as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare the use of a decision tree based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinical trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification.
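The attribute-construction step can be illustrated as follows (a toy sketch: slide a window over a SAX-discretised signal and count each subword as a motif candidate; the actual multiresolution method is more elaborate):

```python
from collections import Counter

# Illustrative sketch: motif-frequency features from a symbolic signal.
# Each subword of a fixed length becomes a candidate motif, and its
# count becomes an attribute for the classifier.

def motif_counts(sax_string, motif_length):
    """Count every subword of the given length in the symbolic signal."""
    windows = (sax_string[i:i + motif_length]
               for i in range(len(sax_string) - motif_length + 1))
    return Counter(windows)

# Symbolic (SAX-discretised) version of one heart-sound recording
signal = "abbaabbaabba"
counts = motif_counts(signal, 4)
print(counts.most_common(2))
```

Running motif discovery at several window lengths (resolutions) yields one such feature set per resolution, which the decision tree or random forest then consumes.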
************************************************
Presenter: Rui Camacho
Title: From Logic Programming to Inductive Logic Programming: a new approach to parallelise ILP systems
Abstract: Inductive Logic Programming (ILP) is a flavour of Multi-relational Data Mining. The use of a powerful representation formalism (First Order Logic) makes ILP suitable for handling data with structure and for constructing highly complex and comprehensible models. However, ILP systems usually exhibit very long run times. In this talk we present some previous attempts to speed up ILP systems and present a new approach based on parallel execution. The approach builds on a long-established and well-known technique from AND-parallel Logic Programming. Apart from the speedup achieved by parallel execution, a new type of pruning was defined: coverage-equivalence pruning. This new type of pruning avoids constructing a substantial number of "useless" clauses. An implementation was made by adapting the Aleph system. The new approach has been empirically evaluated and the results are promising.