Network representation learning

Between 2016 and 2019, thanks to the efforts of two PhD students under my supervision, Zekarias Kefato and Nasrullah Sheikh, I began exploring problems completely outside the distributed systems field, namely machine learning, with a particular focus on network representation learning. Perhaps it is my academic age, but when I say “I began exploring,” what I really mean is that Zekarias and Nasrullah did 95% of the work, while my role was mainly to ensure that the papers were easy to read.

Network Representation Learning (NRL) is a method for learning a low-dimensional embedding of a graph in such a way that its geometric properties are preserved. The learned embeddings can then be applied to various downstream machine learning tasks, such as classification and link prediction.
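To make the idea concrete, here is a minimal sketch, assuming the simplest possible setting: a small undirected graph embedded by a truncated SVD of its adjacency matrix. This is a textbook matrix-factorization baseline, not the method of any paper listed below; the names `factorize` and `link_score` and the toy graph are hypothetical and chosen only for illustration.

```python
import numpy as np

def factorize(adj, dim):
    """Return (src, dst) embeddings, each of shape (n, dim), from a truncated SVD of adj."""
    u, s, vh = np.linalg.svd(adj)
    scale = np.sqrt(s[:dim])
    src = u[:, :dim] * scale     # one row per node, used in the "source" role
    dst = vh[:dim, :].T * scale  # one row per node, used in the "target" role
    return src, dst

# Toy undirected graph (an assumption for illustration):
# two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = np.zeros((n, n))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

src, dst = factorize(adj, dim=2)

def link_score(u, v):
    """Link-prediction score: entry (u, v) of the rank-2 approximation of adj."""
    return float(src[u] @ dst[v])
```

Each node is now a 2-dimensional vector, and `link_score(u, v)` can be used to rank candidate links. With `dim` equal to the number of nodes, `src @ dst.T` reproduces the adjacency matrix exactly; smaller values of `dim` trade reconstruction accuracy for compactness.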

NRL can leverage different sources of information in a graph, such as network structure, attributes, and cascades. These sources may be used independently or in combination, depending on their availability. Early research relied primarily on structural information because it is always available by default. More recent work suggests that incorporating additional information can lead to better representations. The main challenge lies in how to effectively integrate different sources of information into the learning process.

To this end, we worked in two directions:

Network representation learning on attributed graphs and heterogeneous graphs. GAT2VEC [Computing19] learns node representations from a structural context and an attribute context, obtained from the structural and attribute information respectively. HETNET2VEC [SNAMS18b] targets heterogeneous networks: it learns node representations that preserve the various semantic relationships among nodes.
Using cascade information for network representation learning [MLG17][LOD18] and virality prediction [SNAMS18a]. In social networks, the underlying network may not be available due to provider restrictions, but we can still observe diffusion events, which are signals of the underlying network. Using cascades, we can recover the underlying network through network representation learning.
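The intuition behind the second direction can be sketched with a deliberately naive heuristic: nodes that repeatedly appear next to each other in cascades are more likely to be connected in the hidden network. The code below is not the approach of [MLG17] or [LOD18], which learn representations instead of counting; the function name `cooccurrence_scores`, the `window` parameter, and the toy cascades are assumptions made for illustration.

```python
from collections import Counter

def cooccurrence_scores(cascades, window=1):
    """Count how often two nodes appear within `window` steps of each other in a cascade.

    Each cascade is a list of node ids ordered by activation time.
    Keys are sorted pairs, so (u, v) and (v, u) share one counter.
    """
    scores = Counter()
    for cascade in cascades:
        for i, u in enumerate(cascade):
            for v in cascade[i + 1 : i + 1 + window]:
                if u != v:
                    scores[tuple(sorted((u, v)))] += 1
    return scores

# Toy cascades (hypothetical data): each list is one observed diffusion event.
cascades = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "d"],
]

scores = cooccurrence_scores(cascades, window=1)

# Candidate edges of the hidden network, strongest evidence first.
ranked = [pair for pair, _ in scores.most_common()]
```

Pairs with a high count are the most plausible edges of the unobserved network. A learned representation replaces these raw counts with vectors, so that pairs never observed together can still be scored.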

Additional results have been obtained in influencer detection [MAISON17] and network inference/link prediction [MOD17].

[MAISON17] Zekarias T. Kefato and Alberto Montresor. Personalized influencer detection: Topic and exposure-conformity aware. In Proc. of the International Workshop on Mining Actionable Insights from Social Networks, MAISoN’17. ACM, February 2017. [PDF], [Bibtex].

[MLG17] Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor. Deepinfer: Diffusion network inference through representation learning. In Proc. of the 13th International Workshop on Mining and Learning With Graphs, MLG’17. ACM, August 2017. [PDF], [Bibtex].

[MOD17] Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor. Mineral: Multi-modal network representation learning. In Proc. of the 3rd International Conference on Machine Learning, Optimization and Big Data, MOD’17. ACM, September 2017. [PDF], [Bibtex].

[LOD18] Zekarias T. Kefato, Nasrullah Sheikh, and Alberto Montresor. REFINE: Representation learning from diffusion events. In Proc. of the 4th Conference on Machine Learning, Optimization and Data science, LOD’18. Springer, September 2018. [PDF], [Bibtex].

[SNAMS18a] Zekarias T. Kefato, Nasrullah Sheikh, Leila Bahri, Amira Soliman, Alberto Montresor, and Sarunas Girdzijauskas. CAS2VEC: network-agnostic cascade prediction in online social networks. In Proc. of the 5th International Conference on Social Networks Analysis, Management and Security (SNAMS 2018), pages 72–79. IEEE, October 2018. [PDF], [Bibtex].

[SNAMS18b] Nasrullah Sheikh, Zekarias T. Kefato, and Alberto Montresor. Semi-supervised heterogeneous information network embedding for node classification using 1D-CNN. In Proc. of the 5th International Conference on Social Networks Analysis, Management and Security (SNAMS 2018), pages 177–181. IEEE, October 2018. [PDF], [Bibtex].

[Computing19] Nasrullah Sheikh, Zekarias Kefato, and Alberto Montresor. GAT2VEC: Representation learning for attributed graphs. Computing, 101(3):187–209, 2019. [PDF], [Bibtex].

[CN19] Nasrullah Sheikh, Zekarias T. Kefato, and Alberto Montresor. A simple approach to attributed graph embedding via enhanced autoencoder. In Proceedings of the Eighth Int. Conference on Complex Networks and Their Applications (COMPLEX NETWORKS 2019), volume 881 of Studies in Computational Intelligence, pages 797–809. Springer, December 2019. [PDF], [Bibtex].

[WWW21] Zekarias T. Kefato, Sarunas Girdzijauskas, Nasrullah Sheikh, and Alberto Montresor. Dynamic embeddings for interaction prediction. In Jure Leskovec, Marko Grobelnik, Marc Najork, Jie Tang, and Leila Zia, editors, The Web Conference 2021, WWW’21, pages 1609–1618. ACM / IW3C2, April 2021. [PDF], [Bibtex].