In this talk, Hashemi introduces their newly published work, “An Evaluation of Parser Robustness for Ungrammatical Sentences”. In realistic settings, natural language sentences are not always correct and well-edited. Beyond heavily edited texts such as news articles and formal reports, there are massive amounts of noisier text, including microblogs, tweets, consumer reviews, English-as-a-Second-Language (ESL) writings, and machine translation (MT) outputs. Hashemi points out that parsing, as the first step of natural language processing, influences all downstream applications. For the same parser, the parsing results for ungrammatical sentences can differ dramatically from those for grammatical sentences. As shown in Figure 1, a single extra word, “about”, can substantially change the parse. It is therefore necessary to test the robustness of state-of-the-art parsers on ungrammatical sentences.
They compare the robustness of existing parsers by applying them to ungrammatical sentences. If a parser can overlook problems such as grammatical mistakes and produce a parse tree that closely resembles the correct analysis of the intended sentence, they consider that parser robust to ungrammatical sentences. Since manually annotating gold-standard trees for ungrammatical sentences is time-consuming and expensive, they propose a gold-standard-free approach: they take the parse trees of the corresponding well-formed sentences as the gold standard. In this setting, traditional evaluation metrics cannot be used directly, because the words of an ungrammatical sentence and those of its grammatical counterpart do not necessarily match. They therefore introduce two definitions: an error-related dependency is a dependency connected to an erroneous word (e.g., an extra word), while a shared dependency is a dependency that appears in both trees. Hashemi presents their measurements as:
· Precision is (# of shared dependencies) / (# of dependencies of the ungrammatical sentence − # of error-related dependencies of the ungrammatical sentence); and
· Recall is (# of shared dependencies) / (# of dependencies of the grammatical sentence − # of error-related dependencies of the grammatical sentence).
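The two metrics above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: it assumes dependencies are represented as (head, dependent) word pairs and that the error-related dependencies of each tree have already been identified (a real implementation would align word positions between the two sentences rather than compare raw word pairs).

```python
def robustness_scores(gram_deps, ungram_deps, gram_err, ungram_err):
    """Precision/recall of an ungrammatical sentence's parse against the
    parse of its grammatical counterpart (the surrogate gold standard).

    gram_deps / ungram_deps: sets of (head, dependent) pairs for each tree.
    gram_err / ungram_err: the error-related dependencies of each tree.
    """
    # Shared dependencies: those appearing in both parse trees.
    shared = gram_deps & ungram_deps
    # Error-related dependencies are excluded from the denominators.
    precision = len(shared) / (len(ungram_deps) - len(ungram_err))
    recall = len(shared) / (len(gram_deps) - len(gram_err))
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: an extra word "about" inserted into the sentence,
# mirroring the Figure 1 scenario. The dependency sets are invented.
gram = {("want", "I"), ("want", "go"), ("go", "to"), ("go", "home")}
ungram = {("want", "I"), ("want", "go"), ("go", "to"),
          ("go", "about"), ("about", "home")}
ungram_err = {("go", "about"), ("about", "home")}  # touch the extra word
gram_err = set()  # the grammatical tree has no error-related dependencies

p, r, f = robustness_scores(gram, ungram, gram_err, ungram_err)
print(p, r)  # precision = 3/3 = 1.0, recall = 3/4 = 0.75
```

With an insertion error, the parser is not penalized for the dependencies forced on it by the extra word, which is the point of subtracting the error-related dependencies from each denominator.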
In their experiments, they test eight leading dependency parsers: Malt Parser, Mate Parser, MST Parser, Stanford Neural Network Parser, SyntaxNet, Turbo Parser, Tweebo Parser, and Yara Parser. Their training data consists of the Penn Treebank (50000 sentences) and Tweebank (1000 sentences), and their test data contains ESL and MT sentences. They find that different parsers exhibit different levels of robustness across datasets: if the data is more tweet-like, Malt or Turbo may be good choices; if it is more like MT output, SyntaxNet, Malt, and Turbo perform well.
About the talk:
Talk URL: http://www.isp.pitt.edu/node/1808
Speaker: Homa Hashemi
Date: Nov 18, 2016