My Blog List

Sunday, February 26, 2017

Talk summary 1: Anomaly detection in large graphs

Summary: Prof. Faloutsos gave an very interesting big data talk "anomaly detection in large graphs" at iSchool of UPitt. In this talk, Prof. Faloutsos firstly introduced the significance of graph mining and anomaly detection, and then discussed some interesting researches he and his colleagues have conducted, mainly in two major directions: (1) patterns mining and anomaly detection in graphs, as well as (2) time-evolving graphs and tensors.

Motivation: Graph mining is a subtopic of data mining, except that it particularly focuses on graph data structure. Similar to data mining, graph mining is to analyze graph structures and extract meaningful patterns from those big graphs. Graph mining has attracted much attention of researchers because of the existence of massive graphs in real world (social networks, food web and protein-protein interaction networks etc.), as well as valuable applications in broad applications (e.g. fraud detection, recommendation systems and news propagation etc.). Anomaly detection in large graphs is a problem arising in this area: once normal patterns have been obtained from graphs, can we detect anomalies easily? The answer is yes!

Patterns & Fraud detection: Graphs are not randomly generated without any principals. Actually many interesting patterns have already been discovered by scientists, such as the famous power-low degree distribution, distribution of connected component size, triangle laws, k-core patterns and so on. Then how can we detect suspicious groups? Prof. Faloutsos gave an example of fraud detection in bipartite who-follows-whom graph. In social media (such as Twitter), users want to promote their own reputations, so that some fraudsters (dishonest followees) would buy followers (dishonest followers) to follow them. Besides, to camouflage such fraud behaviors, suspicious users would also deliberately create connections with a small fraction of honest users to make themselves look normal. For this case, they proposed a SVD-based approach [1]. By projecting who-follow-whom matrix into latent spectral subspace, they discovered different patterns in this space. (1) Randomly generated following behaviors is corresponding to the origin of subspace (normal situation), (2) suspicious dense "blocks" in adjacency matrix is transformed as "rays" in spectral subspaces, (3) "blocks" with camouflage is associated with tilting "rays" in spectral subspaces, and (4) overlapping "staircase" in matrix corresponds to "pearl" in latent subspace. Figure 1 illustrate the above mentioned patterns. By studying the projected patterns in latent subspace, the suspicious behaviors are easily detected.

Figure 1. Suspicious behavior in adjacency matrix and spectral latent subspace.
Several relevant works are also introduced [2-4], more details can be seen in References.

Time-evolving graphs & tensors: The first section of works only concern with static graphs. However, in reality, many graphs are evolving over time. Therefore, n-way tensor is introduced to describe temporal changes of graphs. Similar to SVD, tensors can also be decompose and projected into spectral latent subspaces, where normal patterns and strange patterns can be clearly distinguished. More relevant work can be seen in [5-6].

[1] Jiang, M., Cui, P., Beutel, A., Faloutsos, C., & Yang, S. (2014, May). Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 126-138). Springer International Publishing.
[2] Jiang, M., Beutel, A., Cui, P., Hooi, B., Yang, S., & Faloutsos, C. (2015, November). A general suspiciousness metric for dense blocks in multimodal data. In Data Mining (ICDM), 2015 IEEE International Conference on (pp. 781-786). IEEE.
[3] Koutra, D., Ke, T. Y., Kang, U., Chau, D. H. P., Pao, H. K. K., & Faloutsos, C. (2011, September). Unifying guilt-by-association approaches: Theorems and fast algorithms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 245-260). Springer Berlin Heidelberg.
[4] Eswaran, D., G√ľnnemann, S., Faloutsos, C., Makhija, D., & Kumar, M. (2017). ZooBP: Belief Propagation for Heterogeneous Networks. Proceedings of the VLDB Endowment, 10(5).
[5] Araujo, M., Papadimitriou, S., G√ľnnemann, S., Faloutsos, C., Basu, P., Swami, A., ... & Koutra, D. (2014, May). Com2: fast automatic discovery of temporal (‘comet’) communities. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 271-283). Springer International Publishing.
[6] Ferraz Costa, A., Yamaguchi, Y., Juci Machado Traina, A., Traina Jr, C., & Faloutsos, C. (2015, August). Rsc: Mining and modeling temporal activity in social media. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 269-278). ACM.

More information:
Prof. Faloutsos website:
Talk slides:

No comments:

Post a Comment