In recent years, with the explosive growth of real networks and structured data sets, a new class of graphs has come to light. Such graphs are typically huge and very sparse, with some prevailing characteristics. Their structure is hard to describe in general and, moreover, structure is only a starting point. When we think about complex networks, we should take into account connectedness at the level of both structure and behavior. This means that, in addition to tools for analyzing network structure, we also need a framework for reasoning about behavior and interaction in network contexts, where a single event may trigger subtle chains of cause and effect. Although it is commonly accepted that structure influences behavior, to our knowledge little work has been done on how dynamics influence network structure. Moreover, since complete observation may not be possible and tinkering with real systems may lead to unexpected disruptions, suitable simulation models and tools are essential.

Objectives. The NetDyn project (Understanding real large networks, from structure to dynamics) followed this line of research, with the aim of developing new models and tools for studying the relationship between the structure of large networks and the dynamics of the processes running on them, with applications in microbiology, epidemics and social networks.

Contributions. One of the main contributions was a new open source tool, PHYLOViZ (www.phyloviz.net), which allows the integrated analysis of data from sequence-based typing methods, including SNP data generated by whole genome sequencing approaches, together with associated epidemiological data. The tool implements scalable graph mining algorithms for studying possible evolutionary relationships between isolates, and allows ancillary data to be integrated dynamically. It has been widely used by the microbiology community in both research and industry.
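To make the idea of inferring relationships between isolates more concrete, the following minimal sketch builds a minimum spanning tree over typing profiles, using the number of differing loci (Hamming distance) as edge weight. This is only an illustration under stated assumptions: the isolate names and allelic profiles are invented, and the plain Prim-style procedure shown here is a generic technique, not necessarily the algorithm implemented in PHYLOViZ.

```python
# Hypothetical allelic profiles: isolate name -> allele number at each of 7 loci.
PROFILES = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 2, 2),
    "D": (3, 3, 1, 1, 1, 2, 2),
}


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim-style MST over the complete graph of isolates.

    Returns a list of (isolate_in_tree, new_isolate, distance) edges.
    Ties are broken by isolate name, so the result is deterministic.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest link between an isolate in the tree and one outside it.
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    for u, v, dist in minimum_spanning_tree(PROFILES):
        print(f"{u} - {v}: {dist} differing loci")
```

The quadratic scan per step keeps the sketch short; a tool aimed at large data sets would need a more scalable formulation.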

The project also contributed an initial study on how pathogen genetic diversity is affected by host contact networks, providing insights into the intricate relationship between network structure and population evolution. These first results paved the way for an ongoing PhD project on large-scale stochastic simulations of evolving agents on top of complex networks, which aims to further understand the phenomena underlying such relationships.
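As a rough illustration of what a stochastic simulation of evolving agents on a contact network can look like, the sketch below spreads a mutating pathogen over a ring-shaped contact network and measures strain diversity at the end. Everything in it is an assumption made for illustration: the ring lattice, the parameter names beta (transmission), gamma (recovery) and mu (mutation), and the use of Simpson's index as the diversity measure. It is not the model used in the project.

```python
import random


def ring_lattice(n, k):
    """Contact network: each host is linked to its k nearest neighbours on each side (assumes n > 2k)."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def simpson_diversity(strains):
    """1 - sum(p_i^2) over the strains carried by infected hosts; 0.0 if nobody is infected."""
    carried = [s for s in strains.values() if s is not None]
    if not carried:
        return 0.0
    counts = {}
    for s in carried:
        counts[s] = counts.get(s, 0) + 1
    total = len(carried)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def simulate(contacts, steps, beta, gamma, mu, seed=0):
    """Discrete-time simulation; returns a dict host -> strain id (None if uninfected).

    Per step, each infected host infects each uninfected contact with
    probability beta, then recovers with probability gamma; if it does not
    recover, its pathogen mutates into a brand-new strain with probability mu.
    """
    rng = random.Random(seed)
    strains = {host: None for host in contacts}
    strains[next(iter(contacts))] = 0  # the first host starts infected with strain 0
    next_strain = 1
    for _ in range(steps):
        updated = dict(strains)
        for host, strain in strains.items():
            if strain is None:
                continue
            for other in contacts[host]:
                still_free = strains[other] is None and updated[other] is None
                if still_free and rng.random() < beta:
                    updated[other] = strain
            if rng.random() < gamma:
                updated[host] = None
            elif rng.random() < mu:
                updated[host] = next_strain
                next_strain += 1
        strains = updated
    return strains


if __name__ == "__main__":
    final = simulate(ring_lattice(200, 2), steps=100, beta=0.2, gamma=0.1, mu=0.01, seed=42)
    print("infected hosts:", sum(s is not None for s in final.values()))
    print("Simpson diversity:", round(simpson_diversity(final), 3))
```

Swapping `ring_lattice` for a different contact structure while keeping the other parameters fixed is one simple way to probe how network structure affects the resulting diversity.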

Tools for collecting data from real social networks were also developed, and several large datasets were analyzed and made available to other research projects ongoing at INESC-ID. The analysis of these datasets focused on the dynamic nature of real networks and its influence on well-known social network properties, such as the small-world phenomenon.

Team. The research team reflected the highly multidisciplinary character of the project. It brought together three participating institutions: two national research institutes, INESC-ID and IMM, and one well-known foreign research unit, Yahoo! Research in Barcelona. The team from INESC-ID gathered researchers with experience in the development of efficient algorithms for data mining and simulation, and in systems modeling. The team from IMM gathered researchers with extensive knowledge of epidemiological data and models, while the team from Yahoo! Research brought invaluable expertise in large data handling, graph mining and algorithms.