Motivation: In recommendation systems, conventional collaborative filtering often performs poorly because the user-item matrix is sparse and because of the cold-start problem. To address this, researchers have been developing methods that incorporate auxiliary heterogeneous datasets to boost recommendation performance. These large-scale auxiliary datasets form a huge resource repository, referred to as a knowledge base. This paper leverages three types of knowledge (structural, textual and visual) to develop an integrated framework called “Collaborative Knowledge Base Embedding (CKE)” that aims at higher-quality recommendations.
|Figure 1. Framework Overview.|
Approaches: The CKE framework can be divided into three components: data preparation, knowledge base embedding and collaborative joint learning. Figure 1 gives a general overview.
(1) In the first stage, they gather a dataset of user-item interactions. They focus on the implicit feedback scenario, where user-item interactions are ambiguous. For example, a missing rating does not necessarily indicate a lack of interest; it can also mean the user was unaware of the item. They also collect a rich heterogeneous knowledge base containing three types of information: structural, textual and visual. Structural knowledge can be regarded as a heterogeneous network consisting of various entities (e.g. genre, actor, director) and various relationships (e.g. acting, directing). Textual knowledge typically carries topic-relevant information, e.g. a storyline for a movie or judgements of a book. Visual knowledge refers to images such as a book’s front cover or a movie’s poster.
(2) In the second stage, they use embedding approaches to learn each item’s latent features from the knowledge base described above. Specifically, for the network structure, they adopt TransR (Lin et al., 2015), a heterogeneous network embedding method, to extract the item’s structural representation. For textual information, they apply stacked denoising auto-encoders (SDAE) (Vincent et al., 2010), a deep learning based embedding technique, to obtain the item’s textual representation. For visual knowledge such as movie posters, they use stacked convolutional auto-encoders (SCAE) (Masci et al., 2011), another deep learning based embedding approach, to learn visual representations for items. Note that these deep learning based embedding techniques require no hand-crafted feature engineering; the representations are extracted directly from raw data.
(3) In the final stage, to integrate collaborative filtering with the item representations learned in the second stage, they propose the CKE approach.
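Before turning to the joint model, the structural embedding in stage (2) can be made concrete. The following is a minimal sketch of the TransR scoring function from Lin et al., which projects the head and tail entities into a relation-specific space and measures how well "head + relation" lands on "tail". The function names, the toy vectors and the movie/director example are assumptions made for illustration; they are not taken from the CKE paper, and training (negative sampling, margin loss, norm constraints) is omitted.

```python
def project(vec, matrix):
    """Map a k-dim entity vector into the d-dim relation space.

    Computes the row-vector product vec * matrix, where matrix is k x d.
    """
    return [sum(vec[a] * matrix[a][b] for a in range(len(vec)))
            for b in range(len(matrix[0]))]


def transr_score(head, relation, tail, proj):
    """Squared L2 distance ||h M_r + r - t M_r||^2.

    A lower score means the triple (head, relation, tail) is more plausible.
    """
    h_r = project(head, proj)
    t_r = project(tail, proj)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


# Toy values (made up): 3-dim entity space, 2-dim relation space.
movie = [1.0, 0.0, 2.0]        # hypothetical item entity
director = [0.0, 1.0, 1.0]     # hypothetical person entity
directed_by = [-1.0, 0.0]      # hypothetical relation vector
M_r = [[1.0, 0.0],
       [0.0, 1.0],
       [0.5, 0.5]]             # relation-specific projection matrix

score = transr_score(movie, directed_by, director, M_r)
print(score)
```

In this sketch the item's structural representation would simply be its learned entity vector (here `movie`).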
CKE computes, for a user i, the pairwise preference probability between any two items j and j', written p(j>j';i|theta), where theta denotes the model parameters. The final recommendation result for each user is a ranked list of items.
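The pairwise preference above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the item's latent vector is the sum of a collaborative offset and its structural, textual and visual embeddings, and that the preference probability is a sigmoid of the difference between the user's scores for the two items. All function names and toy numbers are hypothetical.

```python
import math


def item_vector(offset, structural, textual, visual):
    """Assumed item latent vector: offset plus the three knowledge embeddings."""
    return [o + s + t + v
            for o, s, t, v in zip(offset, structural, textual, visual)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def preference_probability(user, item_j, item_j_prime):
    """Assumed form of p(j > j'; i | theta): sigmoid of the score difference."""
    diff = dot(user, item_j) - dot(user, item_j_prime)
    return 1.0 / (1.0 + math.exp(-diff))


def rank_items(user, items):
    """Return item names sorted by descending user-item score."""
    return sorted(items, key=lambda name: dot(user, items[name]), reverse=True)


# Toy 2-dim example (all numbers made up).
user = [1.0, 0.5]
items = {
    "a": item_vector([0.1, 0.0], [0.4, 0.2], [0.3, 0.1], [0.2, 0.1]),
    "b": item_vector([0.0, 0.0], [0.1, 0.1], [0.1, 0.0], [0.0, 0.1]),
}

p_ab = preference_probability(user, items["a"], items["b"])
print(round(p_ab, 3), rank_items(user, items))
```

Because the probability depends only on the score difference, sorting items by their individual scores yields a ranking consistent with the pairwise preferences.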
|Figure 2. Recall@K and MAP@K results for CKE and baselines.|
Experiments: They conduct experiments on two datasets: MovieLens-1M and IntentBooks. MovieLens-1M consists of about one million user ratings of movies, and IntentBooks is gathered from Microsoft’s Bing search engine and Microsoft’s Satori knowledge base. As shown in Figure 2, comparisons against a series of baselines validate the effectiveness of the proposed CKE framework and also shed light on the future use of heterogeneous data sources in recommendation systems.
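For readers unfamiliar with the metrics in Figure 2, the sketch below shows one common way to compute Recall@K and MAP@K from a ranked list. The exact definitions used in the paper are not stated in this summary, so the normalization of average precision (dividing by min(number of relevant items, K)) is an assumption, and all names and data are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Average of precision values at each rank where a relevant item appears.

    Normalized by min(len(relevant), k); other conventions exist.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def map_at_k(rankings, relevants, k):
    """Mean of average precision over users (parallel lists, one entry per user)."""
    scores = [average_precision_at_k(r, rel, k)
              for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (made up): one user's ranked list and held-out relevant items.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

print(recall_at_k(ranked, relevant, 3))
print(average_precision_at_k(ranked, relevant, 3))
```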
Lin, Y., Liu, Z., Sun, M., Liu, Y., and Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of AAAI (2015).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research 11 (2010), 3371–3408.
Masci, J., Meier, U., Cireşan, D., and Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, 2011, 52–59.