
Tuesday, April 25, 2017

Reading 15: User-Driven System-Mediated Collaborative Information Retrieval

Motivation: Collaborative information retrieval (CIR) involves more than one information seeker in a search process. Unlike individual information retrieval (IIR), CIR needs to take users' skills and preferences into account, and to support their communication and cooperation. Existing approaches generally fall into two categories: user-based CIR systems, which let users decide role assignments and support their communication during the search; and system-based CIR systems, in which the system imposes roles on users and optimizes retrieval accordingly. Both have merits and limitations. This paper combines the two, proposing a user-driven system-mediated CIR system.

Approach: In user-driven system-mediated CIR, users' search behaviors are monitored and analyzed: the system detects significant behavioral differences between each pair of users, and then suggests roles to the users so as to leverage the best of their skills and preferences. Take the role pair "Gatherer versus Surveyor" as an example. In a collaborative search session, a gatherer is more likely to seek highly relevant documents, so he or she issues queries with much overlap and spends more time reading webpage contents; a surveyor, in contrast, tends to explore, trying many different queries and spending less time on each webpage, so that accordingly the query success is low. By analyzing such features, i.e., query overlap, dwell time and query success, we can gain insight into the users' role difference.
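To make this concrete, below is a minimal sketch of how such behavioral features might be extracted and compared for a pair of collaborating users; the log format, feature definitions and the two-sample t-test are illustrative assumptions, not the paper's actual procedure.

```python
# Sketch: extracting role-indicative features from two users' search logs and
# testing for a significant behavioral difference. The log format is assumed.
from scipy import stats

def query_overlap(queries):
    """Average Jaccard overlap between consecutive queries (sets of terms)."""
    scores = [len(a & b) / len(a | b) for a, b in zip(queries, queries[1:])]
    return sum(scores) / len(scores) if scores else 0.0

def session_features(log):
    """log: list of dicts with 'terms' (set), 'dwell' (seconds), 'success' (0/1)."""
    return {
        "query_overlap": query_overlap([e["terms"] for e in log]),
        "dwell_times": [e["dwell"] for e in log],
        "query_success": sum(e["success"] for e in log) / len(log),
    }

def roles_differ(log_a, log_b, alpha=0.05):
    """Flag a role difference when dwell times differ significantly."""
    fa, fb = session_features(log_a), session_features(log_b)
    _, p = stats.ttest_ind(fa["dwell_times"], fb["dwell_times"])
    return p < alpha, fa, fb
```

A gatherer/surveyor pair would then show up as a significant dwell-time gap, together with diverging query overlap and success rates.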

Experiments: They ask participants (students at Rutgers University) to collaboratively write a report on an exploratory topic. User study #1 uses the topic "Gulf oil spill" and user study #2 focuses on "Global warming". During the search session, a chat system and search tools support communication, bookmarking webpages and saving snippets. Three types of features are considered (shown in Table 1).
Table 1. Features used to describe a searching session.
By analyzing users' search behaviors, they found that significant behavioral differences became obvious shortly after a session started; Figure 1 supports this observation with highly significant p-values. Moreover, users did not change their roles within a session.
Figure 1. Significant difference in users' search behaviors.
Finally, they compare the effect of role mining on retrieval performance in a CIR task. Table 2 shows the comparison with four baselines and reports the average improvement RB-CIR obtains. We observe that RB-CIR outperforms all methods except PM-CIR; the authors argue that since the difference between RB-CIR and PM-CIR is not significant, it is still reasonable to say that RB-CIR performs best.
Table 2. Comparison of RB-CIR with four baselines.
One limitation of the proposed method is the lack of prior knowledge about users' skills, since roles are mined only from the current search behaviors. In future work, they would therefore consider users' prior search behaviors and preferences to complement their current actions.



Thursday, April 20, 2017

Reading 14: Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS)

Motivation: In recommendation and review systems, users may not only provide overall ratings but also write more informative reviews. In reviews, users express sentiments, discuss the aspects they like or dislike, and explain their ratings. Exploiting such implicit signals can boost recommendation performance. To this end, this paper models ratings and sentiments in reviews at the aspect level, and proposes a probabilistic model based on topic modeling and collaborative filtering that achieves superior performance.
Fig. 1 Rating and review model at the aspect level. It consists of two parts: modeling ratings and modeling reviews.
Approaches: Given a user and a movie, the task is to predict (i) the observed rating as well as (ii) the review. (i) For ratings, they assume that any observed overall rating is generated from individual aspect-specific ratings, e.g., ratings for the actors, the plot and the visual effects. Different aspects may carry different importance weights: a larger weight implies that the user is interested in that aspect and that the movie also highlights it. Simply speaking, a high overall rating arises from a good match between the user's expectations and the movie's quality in the aspects the user cares about most (a rough formalization is sketched after the list below). (ii) When writing a review, the user may discuss movie-specific content and also express aspect-specific judgements (sentiments). To capture this variety, they assume that the review language model contains five components:
1. A background language component;
2. A background sentiment language component;
3. A movie-specific language component;
4. An aspect-specific language component;
5. An aspect-specific sentiment language component.
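As that rough formalization (the notation here is mine, not the paper's exact generative model), the overall rating can be viewed as an importance-weighted combination of aspect-specific ratings:

```latex
% Sketch of the aspect-level rating assumption (my notation, not JMARS's exact model):
%   r_{um}        overall rating of user u for movie m
%   \theta_{uma}  importance weight of aspect a for the pair (u, m), with \sum_a \theta_{uma} = 1
%   r_{uma}       aspect-specific rating, reflecting the match between u's interest
%                 and m's quality on aspect a
r_{um} \approx \sum_{a} \theta_{uma}\, r_{uma}
```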

Fig. 1 illustrates the entire framework of their probabilistic model: the user's interests and the movie's relevance are used jointly to model the final rating and the review contents. The model is called "JMARS".

Experiments: The experiments are conducted on a dataset crawled from IMDb, a well-known movie review website, covering 50k movies along with their reviews. They use 80% of the data for training, 10% for validation and 10% for testing. Fig. 2 compares JMARS in terms of perplexity: with the factor size set to 5 or 10 (i.e., 5 or 10 aspects per movie), JMARS consistently outperforms the HFT approach. Fig. 3 shows the MSE comparison, which also confirms JMARS's good performance.

Fig. 2 Comparison of models using perplexity.
Fig. 3 Comparison of models in terms of MSE.

Tuesday, April 11, 2017

Week 13: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Title: Visual-Textual Joint Relevance Learning for Tag-based Social Image Search, IEEE Transactions on Image Processing, 2013.

Motivation: With the development of social media, a new type of search has become increasingly popular - social search - and this paper focuses in particular on image search in social media. Given a tag query, for example "apple", a good search engine should return a set of highly relevant yet diverse images, showing the fruit, the cellphone and the MacBook. Tag-based social image search usually leverages user-generated tags to compute an image's relevance score; however, such tags contain too much noise, making it difficult to form an optimal ranking strategy. Therefore, this paper seeks to utilize tags and visual information simultaneously for image relevance learning.
Fig 1. Framework of the proposed visual-textual joint relevance learning method.
Method: The basic framework is shown in Fig 1. (I) Given a set of images, each is represented by two kinds of features - visual and textual. (II) Based on these two representations, a hypergraph is constructed. It should be highlighted that a hypergraph differs from a general graph: an edge, called a "hyperedge", does not represent a pairwise interaction but a relationship among a whole set of images. Specifically, images sharing the same tag are "linked" by a textual-based hyperedge, and images sharing the same visual "words" are connected by a visual-based hyperedge. Fig 2 shows textual-based hyperedges (left) and visual-based hyperedges (right). (III) A joint image relevance learning process is performed on a set of pseudo-relevant samples, i.e., images treated as relevant simply because they carry the query tag. They then propose an objective function that learns a relevance vector f, each element of which indicates an image's relevance score (a generic sketch of this type of objective is given below). (IV) Based on the learned relevance scores, the algorithm returns the top-K images to users.

Fig 2. Examples of hyperedge construction. The left figure shows textual-based hyperedges and the right one shows visual-based hyperedges.
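For intuition, hypergraph relevance learning of this kind typically minimizes a regularized objective of the following generic form (a sketch only; the paper additionally learns the hyperedge weights, and its exact formulation may differ):

```latex
% Generic transductive hypergraph learning objective (a sketch, not the paper's exact equation):
%   f          relevance vector to be learned
%   y          initial labels from the pseudo-relevant samples
%   H, W       incidence matrix and hyperedge weights
%   D_v, D_e   vertex- and hyperedge-degree matrices
\mathbf{f}^{*} = \arg\min_{\mathbf{f}}\;
  \mathbf{f}^{\top}\Big(\mathbf{I}
  - \mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\top}\mathbf{D}_v^{-1/2}\Big)\mathbf{f}
  \;+\; \mu\,\lVert \mathbf{f} - \mathbf{y}\rVert^{2}
```

The first (Laplacian) term encourages images sharing hyperedges to receive similar relevance scores, while the second term keeps f close to the pseudo-relevance labels y.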
Experiment: They perform experiments on a Flickr image dataset (104,000 images, 83,999 tags), and compare the proposed textual-visual joint hypergraph learning approach (HG-WE-joint) with five state-of-the-art baselines: graph-based semi-supervised learning, sequential social image relevance learning, tag ranking, tag relevance combination, and hypergraph-based relevance learning with equal weights (HG). They also examine the proposed approach with a single source of information, HG-WE-visual or HG-WE-textual. Results show that HG-WE-joint outperforms all baselines and is robust to its parameters; however, it requires the highest computational cost to achieve the best retrieval performance.

Wednesday, April 5, 2017

Talk summary 2: Modeling Sequential Decision Making in Team Sports using Imitation Learning

Dr. Peter Carr is a research scientist at Disney Research, Pittsburgh. In this talk, he introduced their recent work on modeling sequential decision making in sports using imitation learning techniques. I summarize his talk around the following three questions.

What is the objective of their work?

In team sports analysis, researchers want to compare the performance of a specific team or player with that of a typical team in a professional league, i.e., the league-average performance. Quantitatively studying players' movement patterns requires player tracking data. Fortunately, with recent advances in data collection, it has become possible to gather spatiotemporal sports data by tracking players' movements across a large number of games. In the talk, Dr. Carr mentioned that they used approximately 100 games of player tracking data from a professional soccer league for modeling sequential decision making.

With such a dataset, they are interested in modeling defensive situations - what players do when the opposition has control of the ball. They explore what a defensive player should have done, based on the league-average behavior revealed by the data, in comparison with what the player actually did. This kind of work helps us better understand the overall defensive strategies of a league as well as how a certain team plays differently. In this framework, the "should-have-done" motions are learned from player tracking data through a "data-driven ghosting" method. The next section summarizes the high-level intuition of the method.

What is the method?

Data-driven ghosting is implemented with imitation learning. Imitation learning, also called "learning from demonstration", is a process in which a computer automatically learns strategies by observing expert behavior. It resembles how humans learn - a person with no knowledge of a sport can understand what to do after observing a sufficient number of games.

Figure 1. Deep multi-agent imitation learning framework. Single-player learning (top) and multi-agent learning (bottom).
Their task is to predict each player's action at every time step given the state features, which is an online sequential decision-making problem; moreover, they need to predict actions for multiple players at the same time. They therefore propose a deep multi-agent imitation learning framework (Figure 1) with two major components: single-player modeling and joint training of multiple players. In stage 1, the algorithm learns a model per player to predict the league-average action; in stage 2, these pre-trained models are used for joint training of multiple agents. In both stages, training and prediction are interleaved, so that a model can learn from its own prediction mistakes and return to the "right" track (see Figure 1).
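Below is a toy sketch of this two-stage structure, with linear least-squares models standing in for the deep networks; the data shapes and the way teammate predictions are fed back in stage 2 are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of two-stage multi-agent imitation learning.
# Linear models stand in for the deep networks; shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T, P, D = 500, 4, 6                      # time steps, players, state-feature dim
states = rng.normal(size=(T, P, D))      # per-player state features
actions = rng.normal(size=(T, P, 2))     # expert (x, y) movements per player

# Stage 1: fit one model per player to imitate the league-average action.
stage1 = [np.linalg.lstsq(states[:, p], actions[:, p], rcond=None)[0]
          for p in range(P)]

# Stage 2: joint refinement - each player's input is augmented with the
# *predicted* actions of teammates, so the models see each other's rollouts.
joint = []
for p in range(P):
    teammate_preds = np.mean(
        [states[:, q] @ stage1[q] for q in range(P) if q != p], axis=0)
    joint_input = np.hstack([states[:, p], teammate_preds])
    joint.append(np.linalg.lstsq(joint_input, actions[:, p], rcond=None)[0])
```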

How about the results?

An example is shown in Figure 2. The data-driven ghosting players and their trajectories (both in white) represent league-average movements, while the colored dots and trajectories represent the actual movements in games. The results show that the proposed model can generate sequences of behaviors exhibiting spatial and formational awareness. More information can be found in the video: http://www.disneyresearch.com/publication/data-driven-ghosting.
Figure 2. Ghosting behaviors (white) in comparison with actual movements.

[1] Le, Hoang M., et al. "Data-Driven Ghosting using Deep Imitation Learning." (2017).

Week 12: FolkTrails: Interpreting Navigation Behavior in a Social Tagging System

Motivation: Social tagging systems, such as Delicious and BibSonomy, are widely used to organize and store online information such as webpages and publications. In such systems, users can freely assign keywords to resources for later retrieval and organization. By tracking users' behaviors in these tagging systems, researchers can understand how users assign tags, navigate among resources and consume information. This paper focuses on the specific problem of interpreting human navigation behavior within BibSonomy, and on exploring the behavioral differences between user subgroups.

Methods: The paper formulates a set of hypotheses that can be encoded as transition probabilities, and then applies HypTrails [1], a framework for comparing hypotheses against empirical observations, to figure out which hypotheses best capture the mechanism behind human navigation trails. Figure 1 shows an illustrative example based on trails of user reviews for restaurants in Italy on Yelp. Specifically, HypTrails models a navigation trail as a first-order Markov chain over a sequence of states, so that each hypothesis can be expressed as transition probabilities from one state to another. To decide which hypothesis better reflects the empirical observations, it uses the Bayes factor, comparing the marginal likelihoods P(D|H), where D is the observed data and H a hypothesis (a toy sketch follows Figure 1's caption below).

Figure 1. (a-c) show three hypotheses about human trails - the uniform, geo and self-loop hypotheses; (d) shows the empirical observations.
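As a minimal sketch of this comparison (simplified: I fix a single concentration value k, whereas HypTrails sweeps over a range of k), each hypothesis is turned into a Dirichlet prior over the rows of the transition matrix, and hypotheses are ranked by the Dirichlet-multinomial marginal likelihood of the observed transition counts:

```python
# Sketch of a HypTrails-style comparison: hypotheses as Dirichlet priors,
# scored by the marginal likelihood of observed Markov transition counts.
# (The multinomial coefficient is omitted; it is constant across hypotheses.)
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(counts, hypothesis, k=10.0):
    """counts: observed transitions (S x S); hypothesis: row-stochastic matrix."""
    alpha = hypothesis * k + 1.0              # elicited Dirichlet pseudo-counts
    ll = 0.0
    for n, a in zip(counts, alpha):           # rows are independent
        ll += (gammaln(a.sum()) - gammaln(a.sum() + n.sum())
               + np.sum(gammaln(a + n) - gammaln(a)))
    return ll

S = 3
counts = np.array([[8., 1., 1.], [1., 8., 1.], [1., 1., 8.]])
uniform = np.full((S, S), 1.0 / S)            # uniform hypothesis
self_loop = np.eye(S) + 0.1                   # self-loop hypothesis
self_loop /= self_loop.sum(axis=1, keepdims=True)
for name, h in [("uniform", uniform), ("self-loop", self_loop)]:
    print(name, log_marginal_likelihood(counts, h))
```

On these toy counts the self-loop hypothesis obtains the higher marginal likelihood, mirroring how the paper decides among its hypotheses.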
In this paper, six basic hypotheses are formulated:

  1. Uniform Hypothesis - the user randomly selects a page for the next visit
  2. Page Consistent Hypothesis - the user visits the same page in the next step (due to pagination)
  3. Category Consistent Hypothesis - the user visits a page under the same category as the current page
  4. User Consistent Hypothesis - a transition's target and source belong to the same user
  5. Folksonomy Consistent Hypothesis - the user reaches a page by following the folksonomy structure
  6. Semantic Navigation Hypothesis - the user goes to a page that is semantically related to the current one


Three combined hypotheses are introduced:

  1. Folksonomy Consistent & Semantic Navigation Hypothesis
  2. User Consistent & Semantic Navigation Hypothesis
  3. User Consistent & Folksonomy Navigation Hypothesis


Experiment results: The above hypotheses are tested on an empirical dataset collected from the social tagging system BibSonomy. Results show that (i) overall, the combined user consistent & semantic navigation hypothesis works best, followed by the user consistent hypothesis; (ii) users navigate differently within their own resources versus outside them - within-navigation tends to be explained by the semantic navigation hypothesis, while outside-navigation is more likely to follow the folksonomy structure; (iii) short-term and long-term users behave differently - short-term users tend to follow semantic navigation while long-term users are prone to follow the folksonomy structure.


References:
[1] Singer, P., Helic, D., Hotho, A., Strohmaier, M.: HypTrails: A Bayesian approach for comparing hypotheses about human trails on the Web. In: Proc. of the 24th WWW Conf. (2015)

Wednesday, March 22, 2017

Week 10: Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: a Case Study with Stack Overflow

Title: Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: a Case Study with Stack Overflow, SIGIR 2013


Motivation: Collaborative websites, such as Q&A platforms, are characterized by loose edit control, allowing users to freely edit their questions and answers. To help distinguish or rank answers, many websites provide functions such as the asker selecting the best answer, and users commenting on or voting for good answers. However, such manual assessment may not scale to the increasing volume of content, and tends to be subject to personal bias. This paper therefore proposes an automated (or semi-automated) assessment mechanism that ranks answers based on quality features.

Methods: They apply a pointwise learning-to-rank (L2R) approach using random forests to rank answers on Stack Overflow. The model uses a large set of features drawn from eight groups: user, user graph, review, structure, length, style, readability and relevance features. User and user-graph features describe a user's activity, reputation and influence in the asking-answering graph. Review features are included based on the intuition that content receiving many edits tends to improve toward high quality. The structure, length, style and readability features capture an answer's properties from different perspectives, and the relevance features describe how relevant the answer is to the specific question.
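A minimal sketch of pointwise L2R with a random forest follows; the synthetic features and relevance labels are placeholders for the paper's eight feature groups and ground-truth answer quality.

```python
# Sketch: pointwise learning-to-rank with a random forest.
# Each answer is a feature vector; the model regresses a relevance label
# and answers are ranked by predicted score. Data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_answers, n_features = 1000, 8          # e.g., one column per feature group
X = rng.normal(size=(n_answers, n_features))
y = rng.uniform(0, 1, size=n_answers)    # stand-in relevance labels

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank the candidate answers of one question by predicted quality, best first.
candidates = X[:5]
ranking = np.argsort(model.predict(candidates))[::-1]
print(ranking)
```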

Figure 1. NDCG@k for RF, RF-BaseFeatures and other methods. The RF method is the one proposed in this paper, while RF-BaseFeatures uses only features already proposed in prior work.

Experiments: They apply the random-forest L2R model to a Stack Overflow dataset, and find that (1) the proposed method outperforms all competing baselines on the NDCG@k evaluation, and (2) the user and review feature groups are the most significant.

Tuesday, February 28, 2017

Week 9: Efficient Latent Link Recommendation in Signed Networks

Motivation: Link prediction and recommendation are very popular in today's online services, e.g., recommending potential friends on Facebook, suggesting qualified employers on LinkedIn and recommending research colleagues on ResearchGate. Link recommendation is essentially a personalized ranking problem: producing a ranked list of people for a target user. Signed networks, e.g., Epinions or Slashdot, allow people to indicate whether they trust (positive) or distrust (negative) each other, so a plausible recommendation result is a list with positive relationships at the top and negative links at the bottom. The traditional evaluation measure, AUC, is no longer suitable when there are three types of relationships: positive, negative and unknown. A recently proposed generalized AUC (GAUC), although it attends to both the head and the tail of the rank list, does not require that the head links are all positive or the tail links all negative. Therefore, this paper starts from GAUC, derives two lower bounds for GAUC that can be computed in linear time, and further develops two probabilistic models to infer personalized lists of latent links.


Approach: In AUC, both negative and latent links are treated as negative. AUC compares each pair of positive and negative links and computes the fraction of pairs in which the positive link is ranked higher than the negative one. Hence the perfect ranking (positives on top, negatives at the bottom) has AUC = 1, whereas the worst ranking (negatives on top, positives at the bottom) has AUC = 0. AUC has two major disadvantages: (i) it is designed for the binary case and does not suit three classes; (ii) it is slow to compute, since it compares every pair of links. This paper presents two lower bounds - bound I and bound II. Bound I compares each positive link with the highest-scored link from the latent and negative classes, and each negative link with the lowest-scored link from the positive and latent classes. Bound II is stricter: it compares the minimum score in the positive class with the maximum score from the negative and latent classes, and the maximum score in the negative class with the minimum score from the positive and latent classes. Figure 1 shows the values of AUC, GAUC, GAUC-bound-I and GAUC-bound-II for four ranking lists.
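The sketch below shows how the two bounds could be computed from ranking scores, following the verbal description above; the normalization and the exact bound semantics are my reading, not the paper's definitions.

```python
# Sketch of the two GAUC lower bounds as described above (my reading;
# tie handling and normalization are guesses, not the paper's definitions).
def gauc_bound_1(pos, neg, latent):
    """pos/neg/latent: ranking scores of each link class."""
    hi = max(neg + latent)    # strongest competitor each positive must beat
    lo = min(pos + latent)    # weakest competitor each negative must lose to
    wins = sum(s > hi for s in pos) + sum(s < lo for s in neg)
    return wins / (len(pos) + len(neg))

def gauc_bound_2(pos, neg, latent):
    """Stricter: the class extremes themselves must be correctly ordered."""
    return float(min(pos) > max(neg + latent) and max(neg) < min(pos + latent))

scores = {"pos": [0.9, 0.8], "neg": [0.1, 0.2], "latent": [0.4, 0.5]}
print(gauc_bound_1(**scores), gauc_bound_2(**scores))
```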

With the goal of optimizing the above two bounds, this paper further proposes two probabilistic models for link recommendation - efficient latent link recommendation-I (ELLR-I) and efficient latent link recommendation-II (ELLR-II). Given a partially observed matrix X, they learn two low-rank matrices U and V such that the ranking lists are optimized with respect to the lower bounds for GAUC. They employ a Bayesian model that maximizes the posterior distribution p(U, V | >_f) ∝ p(>_f | U, V) p(U) p(V), where >_f denotes the orderings obtained from X.

Experiments: They compare ELLR-I and ELLR-II with state-of-the-art baselines, including the network-topology-based common neighbors (CN) method, the pointwise matrix factorization (MF) approach, the pairwise methods maximum margin matrix factorization (MMMF) and Bayesian personalized ranking with matrix factorization (BPR+MF), the listwise algorithm List-MF, and the GAUC-optimization-based method OPT-GAUC. Four signed graphs are used: Wikipedia, Slashdot, Epinions and MovieLens10M. Results show that ELLR-I and ELLR-II outperform these top-performing methods on signed networks.


Sunday, February 26, 2017

Talk summary 1: Anomaly detection in large graphs


Summary: Prof. Faloutsos gave a very interesting big data talk, "Anomaly detection in large graphs", at the iSchool of the University of Pittsburgh. In this talk, he first introduced the significance of graph mining and anomaly detection, and then discussed research he and his colleagues have conducted in two major directions: (1) pattern mining and anomaly detection in graphs, and (2) time-evolving graphs and tensors.

Motivation: Graph mining is a subtopic of data mining that focuses on graph-structured data. Like data mining in general, graph mining analyzes graph structures to extract meaningful patterns from large graphs. It has attracted much attention because massive graphs abound in the real world (social networks, food webs, protein-protein interaction networks, etc.) and support valuable applications (e.g., fraud detection, recommendation systems and news propagation). Anomaly detection in large graphs is a natural problem in this area: once normal patterns have been learned from graphs, can we detect anomalies easily? The answer is yes!

Patterns & fraud detection: Graphs are not generated at random, without any principles; many interesting patterns have been discovered by scientists, such as the famous power-law degree distribution, the distribution of connected component sizes, triangle laws, k-core patterns and so on. How, then, can we detect suspicious groups? Prof. Faloutsos gave an example of fraud detection in a bipartite who-follows-whom graph. In social media (such as Twitter), users want to boost their reputation, so some fraudsters (dishonest followees) buy followers (dishonest followers). Moreover, to camouflage such fraud, suspicious users also deliberately create connections with a small fraction of honest users to appear normal. For this case, they proposed an SVD-based approach [1]: by projecting the who-follows-whom matrix into a latent spectral subspace, they discovered distinct patterns there. (1) Randomly generated following behavior corresponds to the origin of the subspace (the normal situation); (2) suspicious dense "blocks" in the adjacency matrix appear as "rays" in the spectral subspace; (3) "blocks" with camouflage appear as tilted "rays"; and (4) overlapping "staircases" in the matrix correspond to "pearls" in the latent subspace. Figure 1 illustrates these patterns (a toy sketch follows the figure caption below). By studying the projected patterns in the latent subspace, suspicious behavior is easily detected.

Figure 1. Suspicious behavior in adjacency matrix and spectral latent subspace.
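A minimal sketch of the spectral-projection idea on synthetic data (sizes, densities and the flagging rule are illustrative assumptions):

```python
# Sketch: a planted dense block (bought followers) appears as a "ray"
# in the spectral subspace of the who-follows-whom adjacency matrix.
import numpy as np

rng = np.random.default_rng(1)
n_followers, n_followees = 500, 300
A = (rng.random((n_followers, n_followees)) < 0.01).astype(float)  # normal sparse activity
A[:50, :30] = rng.random((50, 30)) < 0.8        # planted fraud block

U, s, Vt = np.linalg.svd(A, full_matrices=False)
proj = U[:, :2] * s[:2]       # followers projected onto the top-2 spectral subspace

# Normal users cluster near the origin; block members shoot out along a ray.
norms = np.linalg.norm(proj, axis=1)
flagged = np.argsort(norms)[::-1][:50]
print("flagged follower ids (first 10):", sorted(flagged)[:10])
```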
Several related works were also introduced [2-4]; more details can be found in the References.

Time-evolving graphs & tensors: The works in the first part concern only static graphs, but in reality many graphs evolve over time. Therefore, n-way tensors are introduced to describe temporal changes in graphs. Analogously to SVD, tensors can be decomposed and projected into spectral latent subspaces, where normal and strange patterns can be clearly distinguished. More relevant work can be found in [5-6].


References:
[1] Jiang, M., Cui, P., Beutel, A., Faloutsos, C., & Yang, S. (2014, May). Inferring strange behavior from connectivity pattern in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 126-138). Springer International Publishing.
[2] Jiang, M., Beutel, A., Cui, P., Hooi, B., Yang, S., & Faloutsos, C. (2015, November). A general suspiciousness metric for dense blocks in multimodal data. In Data Mining (ICDM), 2015 IEEE International Conference on (pp. 781-786). IEEE.
[3] Koutra, D., Ke, T. Y., Kang, U., Chau, D. H. P., Pao, H. K. K., & Faloutsos, C. (2011, September). Unifying guilt-by-association approaches: Theorems and fast algorithms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 245-260). Springer Berlin Heidelberg.
[4] Eswaran, D., Günnemann, S., Faloutsos, C., Makhija, D., & Kumar, M. (2017). ZooBP: Belief Propagation for Heterogeneous Networks. Proceedings of the VLDB Endowment, 10(5).
[5] Araujo, M., Papadimitriou, S., Günnemann, S., Faloutsos, C., Basu, P., Swami, A., ... & Koutra, D. (2014, May). Com2: fast automatic discovery of temporal (‘comet’) communities. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 271-283). Springer International Publishing.
[6] Ferraz Costa, A., Yamaguchi, Y., Juci Machado Traina, A., Traina Jr, C., & Faloutsos, C. (2015, August). Rsc: Mining and modeling temporal activity in social media. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 269-278). ACM.



More information:
Prof. Faloutsos website: http://www.cs.cmu.edu/~christos/
Talk slides: http://www.cs.cmu.edu/~christos/TALKS/17-02-UPitt/

Wednesday, February 22, 2017

Week 8: Collaborative knowledge base embedding for recommendation systems


Reading Summary

Motivation: In recommendation systems, the conventional collaborative filtering approach often has limited performance due to the sparsity of the user-item matrix and the cold-start problem. To address this, researchers incorporate auxiliary heterogeneous data to boost recommendation performance. These large-scale auxiliary datasets form a huge resource repository referred to as a knowledge base. This paper leverages three types of knowledge - structural, textual and visual - in an integrated framework called "Collaborative Knowledge Base Embedding (CKE)" to achieve higher-quality recommendation.

Figure 1. Framework Overview.

Approaches:
The framework of CKE can be divided into three components - data preparation, knowledge base embedding and collaborative joint learning; Figure 1 offers a clear, general overview. (1) In the first stage, they gather a dataset of user-item interactions, focusing on the implicit-feedback scenario, where interactions have ambiguous meanings: the absence of a rating does not necessarily indicate lack of interest; it may also mean lack of awareness. They also collect a rich heterogeneous knowledge base containing three types of information - structural, textual and visual. Structural knowledge can be regarded as a heterogeneous network of entities (e.g., genre, actor, director) and relationships (e.g., acting, directing). Textual data usually contains topical information, e.g., the storyline of a movie or reviews of a book. Visual knowledge includes a book's front cover or a movie's poster image. (2) In the second stage, they use embedding approaches to learn items' latent features from the above knowledge base. Specifically, for the network structure they adopt TransR [1], a heterogeneous network embedding method, to extract structural representations; for text they apply stacked denoising autoencoders (SDAE) [2], a deep-learning-based embedding technique, to obtain textual representations; and for visual knowledge such as movie posters they use stacked convolutional autoencoders (SCAE) [3], another state-of-the-art deep-learning-based embedding approach, to learn visual representations. Note that these deep-learning-based embedding techniques require no feature engineering; the representations are extracted directly from raw data. (3) In the final stage, to integrate collaborative filtering with the item representations learned in the second stage, they propose the CKE approach, which models the pairwise preference probability between two items j and j' as p(j > j'; i | theta), where theta denotes the model parameters (a sketch of this objective is given below). The final recommendation result is a ranked list of items.
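As a rough sketch of that pairwise objective (my paraphrase of the BPR-style form the paper builds on; the item representation shown is an assumption about how the three embeddings are combined):

```latex
% BPR-style pairwise preference, sketched for CKE (notation is mine):
%   e_j = \eta_j + v_j^{s} + v_j^{t} + v_j^{v}   item latent offset plus its
%                                                structural, textual and visual embeddings
%   u_i                                          latent vector of user i
p(j >_i j' \mid \Theta) = \sigma\!\big(u_i^{\top} e_j - u_i^{\top} e_{j'}\big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```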

Figure 2. Recall@K and MAP@K results for CKE and baselines.

Experiments:
They conduct experiments on two datasets - MovieLens-1M, which consists of one million user ratings for movies, and IntentBooks, gathered from Microsoft's Bing search engine and Microsoft's Satori knowledge base. As shown in Figure 2, comprehensive comparisons with a series of baselines validate the effectiveness of the proposed CKE framework, and also shed light on the future use of heterogeneous data sources in recommendation systems.

References
[1] Lin, Y., Liu, Z., Sun, M., Liu, Y., and Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of AAAI (2015).
[2] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research 11 (2010), 3371–3408.
[3] Masci, J., Meier, U., Cireşan, D., and Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, 2011, 52–59.