
Saturday, November 19, 2016

Automatically Extracting Topical Components for a Response-to-Text Writing Assessment

Summary

In this talk, Rahimi introduces her recent work on automatically extracting topic components from source materials. Such source-based extraction can replace the manual effort of experts and makes the automatic assessment process more convenient.

Rahimi starts from the application end. Many approaches to automatic essay scoring have been proposed, such as bag of words, semantic similarity, content vector analysis, and cosine similarity. However, many of them do not take source materials into consideration. Rahimi points out that, unlike this prior work, their research relies heavily on source materials and falls within the domain of response-to-text writing assessment. Given source materials, how can students’ essays be evaluated automatically? Rahimi and her colleagues approach this problem by locating pieces of evidence in students’ essays that match the source materials. Instead of having experts identify such evidence manually, they aim to offer an automatic way to find it.
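To make the prior approaches concrete, here is a minimal sketch of a bag-of-words cosine similarity between an essay and a source passage. It illustrates the kind of baseline named above and is not the method presented in the talk. The two texts and the function names are made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical source passage and essay, invented for this example.
source = "The village had no running water and the hospital had no electricity."
essay = "The hospital had no electricity, and the village had no water."

score = cosine_similarity(bag_of_words(source), bag_of_words(essay))
print(round(score, 3))
```

Because word order is discarded, the essay scores highly even though it only rearranges the source. A single number like this also says nothing about which pieces of evidence the student actually used, which is the gap that evidence localization is meant to fill.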

Specifically, they use natural language processing techniques to automatically extract a comprehensive list of topics from the source materials. Each topic consists of topic words and specific expressions (N-grams) that students should include in their essays; together these are called “topic components”. Table 1 illustrates the topic components.


Table 1. Automatically extracted topic words and N-gram expressions for each topic. They are extracted by the proposed data-driven LDA-enabled model.
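Once topic components are available, a scorer can look for them in an essay. The sketch below shows one simple way to do that with exact matching of words and N-grams. The topics, words, and expressions are hypothetical placeholders rather than the contents of Table 1, and exact matching is an assumption of this example, not necessarily how the talk's system matches evidence.

```python
import re

# Hypothetical topic components, invented for this example.
TOPIC_COMPONENTS = {
    "hospital": {
        "words": {"hospital", "doctor", "medicine"},
        "expressions": {"no electricity", "free medicine"},
    },
    "school": {
        "words": {"school", "books", "lunch"},
        "expressions": {"school fees", "free lunch"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the set of space-joined n-grams of a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_evidence(essay, components, max_n=3):
    """Map each topic to the sorted components that occur in the essay."""
    tokens = tokenize(essay)
    token_set = set(tokens)
    phrases = set()
    for n in range(2, max_n + 1):
        phrases |= ngrams(tokens, n)

    evidence = {}
    for topic, parts in components.items():
        matched = (parts["words"] & token_set) | (parts["expressions"] & phrases)
        if matched:
            evidence[topic] = sorted(matched)
    return evidence


essay = "The hospital had no electricity, but the school now gives a free lunch."
evidence = find_evidence(essay, TOPIC_COMPONENTS)
print(evidence)
```

The per-topic matches could then feed a score, for example by counting how many topics an essay covers. Exact matching misses paraphrases and inflected forms, so a real system would likely need something more tolerant.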

To evaluate the automatic extraction of topic components, Rahimi compares their method with manual results and with competing baselines. The results are shown in Table 2. The proposed method is promising and outperforms the baseline models. However, compared with the manual upper bound, there is still considerable room for improvement.
Table 2. Performance of models using automatically extracted topical components, baseline models, and manual upper-bound. 
About the talk:

Talk URL: http://halley.exp.sis.pitt.edu/comet/presentColloquium.do?col_id=10540
Speaker: Zahra Rahimi
Homepage: http://people.cs.pitt.edu/~zar10/
Date: Nov 18, 2016