Thesis

Prerequisites

My current research is about large-scale distributed systems, cloud computing, large-scale graph analysis, data mining and P2P systems. Students of the Laurea Triennale interested in doing their thesis with me should have already completed the courses on Operating Systems, Computer Networks, Algorithms and Data Structures, and Programming 1. Students of the Laurea Magistrale should have completed courses on Distributed Systems and/or Big Data, depending on the content of the thesis.

How to ask for a thesis

The process of pairing a student with her/his supervisor is really a random one, in Trento as in other Italian universities. Students ask for a thesis; professors either propose some ideas or refuse, claiming that they are overcommitted / they have too many students / they have no projects at the moment / etc.

In my case, sometimes I am really obliged to say no; there are periods in which I receive four or five requests per week, and clearly I cannot be a good supervisor for all of them. To understand whether you are the right person for a thesis, I ask you to send me an email with the following information:

  • When you want to start
  • When you want to finish (ideally)
  • How many exams you need to pass in order to complete your degree
  • The list of exams as output by Esse3, with the marks that you have obtained
  • The grade point average (voto medio pesato)
  • If you have additional experience besides your university courses, add a CV
  • Your personal interests in the field of computer science

I prefer to supervise theses that are either completely external (internship + thesis completed in a company) or completely internal (UniTN internship + thesis completed at DISI). This corresponds to 15 ECTS credits at the Bachelor level (approximately 2.5-3 months) and to 30 ECTS credits at the Master level (approximately 5-6 months).

Current ideas (January 2016)

  • Title: Large Scale Community Detection on Graph Parallel Frameworks.
    Description: Community detection is the task of finding sets of nodes in a given network that are strongly connected with each other and loosely connected to the rest of the network. As the number of users of many real-world networks, such as social networks, keeps growing, their size has been increasing at an exponential rate. The sheer size of these networks poses major algorithmic and computational challenges in the area of social network analysis. The most widely used approach to overcome such challenges is to capitalize on the power of distributed algorithms and computations. Recently, we have witnessed the emergence of distributed data processing frameworks, most notably Hadoop and Spark, and distributed graph processing frameworks, such as Pregel and Giraph. The goal of this thesis is to map existing state-of-the-art community detection algorithms to the programming model of a couple of chosen graph-parallel frameworks, implement the algorithms on top of those frameworks, and carry out a comparative study of their performance.
  • Title: Personalized PageRank (internal internship in cooperation with SpazioDati, or internship at SpazioDati)
    Description: Given a node of a graph, the Personalized PageRank technique computes a score for all the other nodes based on their relationship with the source node. This technique is often used in recommender systems that suggest new nodes (which could represent tweets, users, products) starting from a given node. Unfortunately, computing Personalized PageRank exactly from every node is too expensive on large graphs, so one has to settle for approximations. The student will investigate, develop and compare algorithms to identify the nodes with the highest Personalized PageRank from each source node.
  • Title: Twitter Exploration (internship at SpazioDati)
    Description: SpazioDati has been collecting all geolocated tweets in Italy for almost two years and would like to carry out an exploratory analysis of these data, in order to increase the number of Twitter accounts associated with people and companies within its graph of the Italian business network. The student will work with large amounts of data using distributed analysis systems (Spark) and will exploit semantic text analysis tools to extract information from the text of the tweets and from the users' descriptions.
  • Title: Influenza Prediction
    Description: Reproduce, update and localize for Italy works such as the following, whose goal is to predict the trend of influenza using the access counters of specific Wikipedia pages.

    • McIver, David J., and John S. Brownstein. 2014. “Wikipedia Usage
      Estimates Prevalence of Influenza-Like Illness in the United States in
      Near Real-Time.” Edited by Marcel Salathé. PLoS Computational Biology 10
      (4): e1003581. doi:10.1371/journal.pcbi.1003581
    • Hickmann, Kyle S., Geoffrey Fairchild, Reid Priedhorsky, Nicholas
      Generous, James M. Hyman, Alina Deshpande, and Sara Y. Del Valle. 2014.
      “Forecasting the 2013–2014 Influenza Season Using Wikipedia.” arXiv
      Preprint arXiv:1410.7716.
  • Title: From MOOC to MAIT
    Description: A MOOC is a Massive Open Online Course; see the Coursera courses for examples. In this article, the authors suggest a further evolution, namely the development of MAITs: massive adaptive interactive text(book)s. The idea is to create interactive educational texts (in the style of this text, to give an idea) in which it is possible to study interactively, integrating the text with the execution of code (and more). The goal of this thesis is to explore interactive techniques in the field of algorithms, in order to lay the foundations for a future text on algorithms and data structures. Among the technologies that may be studied, consider also the following:

  • Title: Cooperation with the StrepHit project (FBK)
    Description: StrepHit is an Artificial Intelligence that reads free text from Web sources, understands it and feeds Wikidata, Wikimedia's knowledge base. Its development is currently funded by the Wikimedia Foundation, U.S.A. Your assignment is to take the tool that displays its data to the next level, making it really usable for the Wiki user community.
    Further information here
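
As a rough illustration of the Personalized PageRank idea in the list above, the following Python sketch computes approximate scores from a single source node by power iteration with restart. It is only a minimal example under stated assumptions, not the approach the thesis is expected to take: the function names (personalized_pagerank, top_k), the restart probability alpha = 0.15, the number of iterations and the adjacency-list dictionary format are all hypothetical choices made for this example.

```python
def personalized_pagerank(graph, source, alpha=0.15, iterations=50):
    """Approximate Personalized PageRank by power iteration with restart.

    graph: dict mapping every node to the list of its out-neighbours
           (assumed format; every neighbour must also be a key).
    source: the node that the random walk restarts from.
    alpha: restart probability (hypothetical default).
    """
    # Start with all probability mass on the source node.
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0

    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            # With probability alpha the walk jumps back to the source.
            nxt[source] += alpha * mass
            neighbours = graph[node]
            if neighbours:
                # Otherwise it moves to a uniformly chosen out-neighbour.
                share = (1.0 - alpha) * mass / len(neighbours)
                for nb in neighbours:
                    nxt[nb] += share
            else:
                # Dangling node: return the remaining mass to the source.
                nxt[source] += (1.0 - alpha) * mass
        scores = nxt
    return scores


def top_k(graph, source, k=3, **kwargs):
    """Return the k nodes (excluding the source) with the highest score."""
    scores = personalized_pagerank(graph, source, **kwargs)
    candidates = [node for node in scores if node != source]
    candidates.sort(key=lambda node: scores[node], reverse=True)
    return candidates[:k]


# Tiny made-up graph, for illustration only.
example_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a"],
}
recommended = top_k(example_graph, "a", k=2)
```

Each iteration touches every node and edge once, so repeating this from every source node quickly becomes too expensive on large graphs, which is why the thesis focuses on approximation algorithms.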

Some of my past students here in Trento...

  • Roberto Zandonati worked on his thesis about the slicing problem in peer-to-peer systems. We later cooperated in writing a paper based on his work. The paper has been accepted here:
    Alberto Montresor and Roberto Zandonati. Absolute slicing in peer-to-peer systems. In Proc. of the 5th International Workshop on Hot Topics in Peer-to-Peer Systems (HotP2P'08), Miami, FL, USA, April 2008.
  • Alessio Guerrieri worked on his thesis on DTNs in cooperation with the Create-Net research center (here in Povo). A paper based on his work has been accepted here:
    Alessio Guerrieri, Alberto Montresor, Iacopo Carreras, Francesco De Pellegrini, and Daniele Miorandi. Distributed estimation of global parameters in delay-tolerant networks. In Proceedings of the 3rd IEEE WoWMoM Workshop on Autonomic and Opportunistic Communications (AOC'09), Kos, Greece, June 2009.
    Later, an extended version of this paper was published in a journal: Alessio Guerrieri, Iacopo Carreras, Francesco De Pellegrini, Daniele Miorandi, and Alberto Montresor. Distributed estimation of global parameters in delay-tolerant networks. Computer Communications, 2010.
    BTW, Alessio also secured a scholarship of 12,000 euros to participate in a double degree with Georgia Tech. He spent the academic year 2009/2010 in Atlanta, Georgia, USA. He later published other papers during his Ph.D. studies under my supervision.
  • Andrea Dalla Valle worked on a thesis on partition detection in peer-to-peer systems. We have not worked on a paper yet (my fault!); but again, in the meantime Andrea was the second student to get the Georgia Tech scholarship for 2009/2010.
  • Vinay Sachidananda, one of our students of the "Invest your talent in Italy" program, worked on an external thesis with ArsLogica; I served as internal tutor. Later, part of his work was published here: Andrey Somov, Vinay Sachidananda, and Roberto Passerone. A Self-Powered Module with Localization and Tracking System for Paintball. In Proceedings of IWSOS 2008, Vienna, Austria, December 12th 2008. Springer Verlag: LNCS 5343, 182-193. As you can guess from the author list, my role was marginal.
  • Gabriele Seppi worked on a thesis about "Popularity-based Caching in Underlying Networks With Client Mobility", working together with DoCoMo (Germany). Gabriele took part in the double degree with Georgia Tech. The work was done entirely by Gabriele and the DoCoMo team.
  • Stella Margonar developed the Java software that is available on my Algoritmi e Strutture Dati course web page, for the visualization of algorithms and exercises. Her work was later bought by the publisher that printed my book on the topic.
  • Simone Miorelli developed a flock simulator. The idea is that flocks are examples of self-organized distributed systems: each flock member follows very simple rules, while a complex, global behavior emerges. His work was sponsored by MUSE, the (then) upcoming museum of natural science, and inspired some of the exhibits.
  • Paolo Pandini is an example that everybody should consider eventually: he is a high-school teacher who decided to enroll in our computer science degree after his retirement. He is becoming younger every year he spends with us! He helped us in designing teaching modules in computer science for some elementary schools in Valsugana, based on the "Computer Science Unplugged" book.
  • Federico Scrinzi secured a Google Summer of Code scholarship, working on Euscan (Ebuild Upstream Scanner), a powerful application for detecting outdated ebuilds in the Gentoo package manager by looking for new upstream versions of the packages. My role was really minimal: he completed all the work by himself. He is now a Googler in Dublin.