Advances in Probabilistic and Other Parsing Technologies Harry Bunt and Anton Nijholt (editors) (Tilburg University and University of Twente) Dordrecht: Kluwer Academic Publishers (Text, speech and language technology series, edited by Nancy Ide and Jean Véronis, volume 16), 2000, xv+267 pp; hardbound, ISBN 0-7923-6616-6, $112.00, £71.00, Dfl 230.00
This book is a collection of papers from the Fifth International Workshop on Parsing Technologies, held at MIT in September 1997. Several of the papers are already well known and others should be. The book could easily be used as the basis for a graduate-level advanced course on parsing. The title is unwieldy, but appropriate: most but not all of the papers have a strong probabilistic flavor. My favorite papers are Erik Hektoen on "Probabilistic parse selection based on semantic co-occurrences," Jason Eisner on "Bilexical grammars and their cubic-time parsing algorithms," and Chris Manning and Bob Carpenter on "Probabilistic parsing using left corner language models." I like these papers because they step back from the details of parsing technology and consider its wider significance.

Manning and Carpenter offer both detail and overview. They provide a series of probabilistic models that relax the context-freeness assumption of probabilistic context-free grammars, measure performance in the usual way, draw appropriate conclusions, and then provide the kicker in the form of a brief section explaining "Why parsing the Penn Treebank is easy." As Manning and Carpenter point out, in the particular case of the Penn Treebank, the currently accepted PARSEVAL metrics (Grishman, Macleod, and Sterling 1992) are actually quite easy to do well on, even if the system makes systematic errors on such things as prepositional-phrase attachment. If systems are to be deployed into situations where such deficiencies might matter, it might be necessary to find more appropriate evaluation methods. This issue has subsequently been addressed by others (Carroll, Briscoe, and Sanfilippo 1998; Carroll, Minnen, and Briscoe 1999), who argue for more obviously task-related evaluation schemes involving predicate-argument structure and/or dependency information.

Hektoen's contribution is in the same vein; it takes seriously the notion that parsing is often simply a device for getting at an underlying semantics. Under his scheme, parse selection relies on the ability to collect statistics over semantic forms. Following this path leads Hektoen into a careful exposition of a Bayesian-estimation approach to parse selection, which appears to be "a sufficient response to the high degree of sparseness in the lexical co-occurrence data without the blurring associated with smoothing and clustering" (p. 162). Hektoen's approach appears to work well; of course, it does require a broad-coverage parser capable of generating semantic representations, which may be an obstacle for many. The exposition of the method is very clear, and the comparison with previous approaches is enlightening.
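To make concrete why bracketing scores are so forgiving of attachment errors, here is a small illustrative sketch of my own (not drawn from the book, and with an invented toy sentence): it computes PARSEVAL-style labeled bracket precision and recall for two analyses of "I saw the man with the telescope," one with the prepositional phrase correctly attached to the verb and one with it wrongly attached to the noun.

def brackets(tree, start=0):
    """Return (span_set, end_position) for a nested-list tree of the form
    [label, child1, child2, ...] where leaves are plain word strings."""
    label = tree[0]
    spans = set()
    pos = start
    for child in tree[1:]:
        if isinstance(child, str):          # leaf token: advance one word
            pos += 1
        else:                               # constituent: recurse
            child_spans, pos = brackets(child, pos)
            spans |= child_spans
    spans.add((label, start, pos))          # record this labeled span
    return spans, pos

def parseval(gold, test):
    """Labeled bracket precision and recall of test against gold."""
    g, _ = brackets(gold)
    t, _ = brackets(test)
    correct = len(g & t)
    return correct / len(t), correct / len(g)

# Gold analysis: the PP attaches to the verb.
gold = ['S', ['NP', 'I'],
        ['VP', 'saw', ['NP', 'the', 'man'],
               ['PP', 'with', ['NP', 'the', 'telescope']]]]
# Hypothetical parser output: the PP (wrongly) attaches to the noun.
test = ['S', ['NP', 'I'],
        ['VP', 'saw', ['NP', ['NP', 'the', 'man'],
                             ['PP', 'with', ['NP', 'the', 'telescope']]]]]

p, r = parseval(gold, test)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.86 recall=1.00

On this toy example the systematic attachment error costs only one bracket out of seven, leaving the erroneous parse with 0.86 precision and perfect recall, which is the flavor of the point Manning and Carpenter make.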
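For readers unfamiliar with this style of parse selection, the following sketch shows the general shape of the idea; it is mine, not Hektoen's actual estimator, and all counts, predicate names, and the Beta-Bernoulli scoring are invented for illustration. Each candidate parse is reduced to the predicate-argument pairs its semantic form asserts, and each pair is scored by a Bayesian posterior estimate of how often it occurs in correct rather than incorrect parses, so that sparse counts are handled by the prior rather than by clustering or distributional smoothing.

from collections import Counter

# Assumed training counts: how often each (predicate, argument) pair was
# observed in correct vs. incorrect parses of a disambiguated corpus.
correct_counts = Counter({('see', 'man'): 40,
                          ('see_with', 'telescope'): 25,
                          ('man_with', 'telescope'): 1})
incorrect_counts = Counter({('see', 'man'): 5,
                            ('see_with', 'telescope'): 4,
                            ('man_with', 'telescope'): 12})

ALPHA = BETA = 1.0   # uniform Beta prior on P(pair occurs in a correct parse)

def pair_score(pair):
    """Posterior mean estimate that this pair signals a correct parse."""
    c, i = correct_counts[pair], incorrect_counts[pair]
    return (c + ALPHA) / (c + i + ALPHA + BETA)

def parse_score(pairs):
    """Score a candidate parse by the product of its pair estimates."""
    score = 1.0
    for pair in pairs:
        score *= pair_score(pair)
    return score

# Two candidate semantic analyses of "I saw the man with the telescope",
# represented only by the predicate-argument pairs they assert.
verb_attach = [('see', 'man'), ('see_with', 'telescope')]
noun_attach = [('see', 'man'), ('man_with', 'telescope')]

print(f"verb attachment: {parse_score(verb_attach):.2f}")   # ~0.73
print(f"noun attachment: {parse_score(noun_attach):.2f}")   # ~0.12

Here verb attachment wins because the pair ('man_with', 'telescope') has mostly been seen in incorrect parses; Hektoen's actual model differs in its details, but the contrast with approaches that smooth or cluster over the lexicon is the same.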