My Blog List

Saturday, November 19, 2016

Anomaly Detection in Large Social Graphs  


This talk is Neil Shah’s rehearsal for his thesis proposal a few days later. During his doctoral research, Shah is interested in problems in the field of anomaly detection, which is closely related to my current project. Anomaly detection refers to finding abnormal items, events and phenomena that do not confirm to an expected pattern. In particular, Shah’s work focuses on finding fraudsters and anomalies in online services.

As Shah points out, during the last years, online social services have become increasing popular and ubiquitous in our everyday lives. In e-commerce networks like Amazon, buyers buy and rate the products of businesses, in social networks like Facebook and Twitter, users follow each other, and like, response, comment to other people’s posts. Because data-driven algorithms are commonly used to recommend relevant contents to users, evaluate authenticity of users. Many anonymous fraudsters take advantages of online services to fake reviews or buy followers. Shah and his colleagues, by leveraging the massive datasets encoding users’ behaviors generated every day, discern suspicious users and anomalous behaviors in online social graphs. Generally speaking, his work can be divided into three categories – static graph, dynamic graphs and rich graphs.

Firstly, he starts with plain graphs which are usually be used to describe who-follow-whom or who-rate-which static networks. Shah talks about the weakness of several state-of-the-art spectral-based fraud detection methods – they can detect blatantly fraudulent users, while small-scale and stealthy attacks may be unnoticed. To resolve such problems, they develop a method called fBox in ICDM 2014.

Secondly, they broaden the scope to dynamic graphs in which case structure information of graphs spanning a period of time can be known. As a matter of fact, real-world graphs exhibit constant-existing as well as temporal evolving patterns. Three examples are given in their paper “TimeCrunch: Interpretable Dynamic Graph Summarization”: (1) botnet attackers forming a bipartite core with their victims over the duration of an attack, (2) family members bonding in a clique-like fashion over a difficult period of time, or (3) research collaborations forming and fading away over the years. In this work, they propose an effective and parameter-free and efficient method called TimeCrunch to find coherent, temporal patterns in dynamic graphs.

Recently, Shah’s work turns into rich graphs. Rich graphs are those that, in addition to merely structures, also consist of other informative attributes such as time, rating, or review texts. In one of their work, they detect anomalous users by studying the statistical patterns in edge attributes. Due to the richness of various attributes, Shah’s future work will be conducted in the third aspect, to tackles more challenges and propose novel methods in rich graphs. To summarize, Shah’s thesis span across many disciplines – machine learning, graph theory and social science, and maintain valuable applications in practice. For more information, please refer to his publications as follows (more in his homepage).

Static graphs:
Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective
Neil Shah, Alex Beutel, Brian Gallagher, Christos Faloutsos
IEEE International Conference on Data Mining (ICDM) 2014.

Dynamic graphs:
TimeCrunch: Interpretable Dynamic Graph Summarization
Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, Christos Faloutsos
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2015.

Rich graphs:
EdgeCentric: Anomaly Detection in Edge-Attributed Networks
Neil Shah, Alex Beutel, Bryan Hooi, Leman Akoglu, Stephan Gunnemann, Disha Makhija, Mohit Kumar, Christos Faloutsos
IEEE International Conference on Data Mining (ICDM) Workshop on Data Mining for Cyber Security 2016.

Details about talk:
Speaker: Neil Shah
Date: Nov 18, 2016

No comments:

Post a Comment