Motivation: Link prediction or recommendation is very popular in today's online services, such as recommending potential friends in Facebook, suggesting qualified employers in LinkedIn and recommending research colleagues in ResearchGate. Link recommendation is actually a personalized ranking problem - producing a list of ranked people to target user. Signed networks, e.g. Epinions or Slashdot, allows people to indicate whether they trust (positive) or distrust (negative) each other, so that the plausible recommendation result would be a list with positive relationships on the top and negative links at the bottom. Traditional evaluation measures - AUC - is no longer suitable for the case where there exist three types of relationships: positive, negative and unknown. A recently proposed generalized AUC (GAUC), although cares about both the head and tail of rank list, does not require that the head links are all positive or the tail links are all negative. Therefore, this paper starts from GAUC, deriving two lower bounds for GAUC which can be computed in linear time, and further develop two probabilistic models to infer personalized list of latent links.
Approach: In AUC, both negative and latent links are considered as negative. AUC compares each pair of positive and negative links, and calculates the fraction of pairs within which the positive one has higher rank than the negative one. Therefore, the perfect rank list (positive on top, negative at bottom) has AUC = 1, where the worst rank list (negative on top, positive at bottom) has AUC = 0. There are two major disadvantages in AUC: (i) it is designed for binary case, while not suitable for cases of three classes; (ii) the computation is slow as it requires to compare each pair of links. This paper presents two lower bounds - bound I and bound II. Bound I compares each positive link with that link from latent and negative classes holding the maximum ranking score, and meantime compare each negative link with the link from positive and latent classes holding the minimum ranking score. Bound II is a stricter bound. It compares the minimum ranking score in positive class with the maximum score from negative and latent classes, meanwhile compares the maximum ranking score in negative class with the minimum one from positive and latent classes. Figure 1 shows the values of AUC, GAUC, GAUC-bound-I and GAUC-bound-II for 4 ranking list.
With the goal of optimizing the above two bounds, this paper further proposes two probabilistic models for link recommendation - efficient latent link recommendation-I (ELLR-I) and efficient latent link recommendation-II (ELLR-II). Given a partially observed matrix X, they want to learn two low-rank matrices U and V such that the ranking lists can be optimized in the sense of lower bounds for GAUC. They employ Bayesian model aiming to maximize the following posterior distribution:
where > f denotes the orderings obtained by X.
Experiments: They compare their algorithms ELLR-I and ELLR-II with the state-of-the-art baselines, including network topology-based method common neighbor (CN), point-wise approach matrix factorization (MF), pairwise methods maximum margin matrix factorization (MMMF), Bayesian personalized ranking based matrix factorization (BPR+MF), list-wise algorithms List-MF, as well as GAUC-optimization-based method OPT-GAUC. Four signed graphs are used, including Wikipedia, Slashdot, Epinions and MovieLens10M. Results reveal that ELLR-I and ELLR-II outperforms those top-performed competing methods in signed networks.
My Blog List
Tuesday, February 28, 2017
Sunday, February 26, 2017
Talk summary 1: Anomaly detection in large graphs
Summary: Prof. Faloutsos gave an very interesting big data talk "anomaly detection in large graphs" at iSchool of UPitt. In this talk, Prof. Faloutsos firstly introduced the significance of graph mining and anomaly detection, and then discussed some interesting researches he and his colleagues have conducted, mainly in two major directions: (1) patterns mining and anomaly detection in graphs, as well as (2) time-evolving graphs and tensors.
Motivation: Graph mining is a subtopic of data mining, except that it particularly focuses on graph data structure. Similar to data mining, graph mining is to analyze graph structures and extract meaningful patterns from those big graphs. Graph mining has attracted much attention of researchers because of the existence of massive graphs in real world (social networks, food web and protein-protein interaction networks etc.), as well as valuable applications in broad applications (e.g. fraud detection, recommendation systems and news propagation etc.). Anomaly detection in large graphs is a problem arising in this area: once normal patterns have been obtained from graphs, can we detect anomalies easily? The answer is yes!
Patterns & Fraud detection: Graphs are not randomly generated without any principals. Actually many interesting patterns have already been discovered by scientists, such as the famous power-low degree distribution, distribution of connected component size, triangle laws, k-core patterns and so on. Then how can we detect suspicious groups? Prof. Faloutsos gave an example of fraud detection in bipartite who-follows-whom graph. In social media (such as Twitter), users want to promote their own reputations, so that some fraudsters (dishonest followees) would buy followers (dishonest followers) to follow them. Besides, to camouflage such fraud behaviors, suspicious users would also deliberately create connections with a small fraction of honest users to make themselves look normal. For this case, they proposed a SVD-based approach [1]. By projecting who-follow-whom matrix into latent spectral subspace, they discovered different patterns in this space. (1) Randomly generated following behaviors is corresponding to the origin of subspace (normal situation), (2) suspicious dense "blocks" in adjacency matrix is transformed as "rays" in spectral subspaces, (3) "blocks" with camouflage is associated with tilting "rays" in spectral subspaces, and (4) overlapping "staircase" in matrix corresponds to "pearl" in latent subspace. Figure 1 illustrate the above mentioned patterns. By studying the projected patterns in latent subspace, the suspicious behaviors are easily detected.
Figure 1. Suspicious behavior in adjacency matrix and spectral latent subspace. |
Time-evolving graphs & tensors: The first section of works only concern with static graphs. However, in reality, many graphs are evolving over time. Therefore, n-way tensor is introduced to describe temporal changes of graphs. Similar to SVD, tensors can also be decompose and projected into spectral latent subspaces, where normal patterns and strange patterns can be clearly distinguished. More relevant work can be seen in [5-6].
References:
[1] Jiang, M., Cui, P., Beutel, A., Faloutsos, C., & Yang, S. (2014, May). Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 126-138). Springer International Publishing.
[2] Jiang, M., Beutel, A., Cui, P., Hooi, B., Yang, S., & Faloutsos, C. (2015, November). A general suspiciousness metric for dense blocks in multimodal data. In Data Mining (ICDM), 2015 IEEE International Conference on (pp. 781-786). IEEE.
[3] Koutra, D., Ke, T. Y., Kang, U., Chau, D. H. P., Pao, H. K. K., & Faloutsos, C. (2011, September). Unifying guilt-by-association approaches: Theorems and fast algorithms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 245-260). Springer Berlin Heidelberg.
[4] Eswaran, D., Günnemann, S., Faloutsos, C., Makhija, D., & Kumar, M. (2017). ZooBP: Belief Propagation for Heterogeneous Networks. Proceedings of the VLDB Endowment, 10(5).
[5] Araujo, M., Papadimitriou, S., Günnemann, S., Faloutsos, C., Basu, P., Swami, A., ... & Koutra, D. (2014, May). Com2: fast automatic discovery of temporal (‘comet’) communities. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 271-283). Springer International Publishing.
[6] Ferraz Costa, A., Yamaguchi, Y., Juci Machado Traina, A., Traina Jr, C., & Faloutsos, C. (2015, August). Rsc: Mining and modeling temporal activity in social media. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 269-278). ACM.
More information:
Prof. Faloutsos website: http://www.cs.cmu.edu/~christos/
Talk slides: http://www.cs.cmu.edu/~christos/TALKS/17-02-UPitt/
Wednesday, February 22, 2017
Week 8: Collaborative knowledge base embedding for recommendation systems
Reading Summary
Motivation: In recommendation systems, conventional collaborative filtering
approach usually has limited performance because of data sparsity of user-item
matrix as well as cold-start problem. To address this issue, researchers are
trying to methods that incorporate auxiliary heterogeneous datasets in order to
boost recommendation performance. Those large-scale auxiliary datasets form a
huge resource repository, which is referred as knowledge base. This paper
leverages three types of knowledge – structural knowledge, textual knowledge
and visual knowledge – to develop an integrated framework called “Collaborative
Knowledge Base Embedding (CKE)” to achieve higher quality of recommendation.
Figure 1. Framework Overview. |
Approaches: The framework of CKE can be divided into three components – data preparation, knowledge base embedding and collaborative joint learning. Figure 1 offers a clear and general overview. (I) In first stage, they gather a dataset of user-item interactions. Here they focus on the implicit feedback scenario, where user-item interactions have ambiguous meanings. For example, no rating does not necessarily indicate no interests, it can also mean no awareness. Besides, they also collect a rich heterogeneous knowledge base containing three types of information – structural, textual and visual. Structural knowledge can be regarded as a heterogeneous network consisting of various entities (e.g. genre, actor, director) and various relationships (e.g. acting, directing). Textual data always contains topic relevant information, e.g. a storyline for a movie or judgements for a book. Visual knowledge can be referred to a book’s front page or a movie’s poster image. (2) In second stage, they use embedding approaches to learn item’s latent features from the above mentioned knowledge base. To be specific, for network structure, they adopt TransR [1], a heterogeneous network embedding method, to extract item’s structural representations. For textual information, they apply stacked denoising auto-encoders (SDAE) [2], a deep learning based embedding technique, to obtain item’s textual representations. For visual knowledge such as movie posters, they use stacked convolutional auto-encoders (SCAE) [3], another state-of-the-art deep learning based embedding approach, to learn visual representations for items. Note that deep learning based embedding techniques do not require any feature engineering, and the representations are directly extracted from raw data. (3) In final stage, to integrate collaborative filtering with item’s representations learned in second stage, they propose CKE approach. CKE involves the calculation of pairwise preference probability between any two items j and j' as p(j>j';i|theta) where theta represents model parameters. The final recommendation result is a ranked list of items.
Figure 2. Recall@K and MAP@K results for CKE and baselines. |
Experiments: They conduct experiments on two datasets – MovieLens-1M and IntentBooks. MovieLens-1M is a dataset consisting of 1M users’ ratings for movies, and IntentBooks is gathered by Microsoft’s Bing search engine and Microsoft’s Satori knowledge base. As shown in Figure 2, the complete comparisons with a series of baselines validate the effectiveness of the proposed CKE framework, and also sheds lights on future’s usage of heterogeneous data sources in recommendation systems.
References
[1] Lin,Y., Liu,Z., Sun,M.,
Liu,Y., and Zhu,X. Learning entity and relation embeddings for knowledge graph
completion. In Proceedings of AAAI (2015).
[2] Vincent,P., Larochelle,H.,Lajoie,I.,Bengio,Y.,and
Manzagol, P.-A. Stacked denoising autoencoders: Learning useful representations
in a deep network with a local denoising criterion. The Journal of Machine
Learning Research 11 (2010), 3371–3408.
[3] Masci,J.,Meier,U.,Cires
̧an,D.,and Schmidhuber,J.Stacked convolutional
auto-encoders for hierarchical feature extraction. In Artificial Neural
Networks and Machine Learning–ICANN 2011. Springer, 2011, 52–59.
Subscribe to:
Posts (Atom)