Abstract: Many chapters in this book illustrate that applying a statistical method
such as latent semantic analysis (LSA; Landauer & Dumais, 1997;
Landauer, Foltz, & Laham, 1998) to large databases can yield insight into
human cognition. The LSA approach makes three claims: that semantic information can be derived from a word-document co-occurrence matrix;
that dimensionality reduction is an essential part of this derivation; and
that words and documents can be represented as points in Euclidean space.
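The three LSA claims can be seen in a toy example. This minimal sketch, assuming NumPy, builds a small word-document count matrix and reduces its dimensionality with a truncated singular value decomposition. It then treats the rows of the result as points in Euclidean space and compares them by cosine. The vocabulary, counts, and helper names (`lsa`, `cosine`) are hypothetical. Practical LSA also applies a term-weighting step (such as log-entropy) before the SVD, which is omitted here.

```python
import numpy as np

# Hypothetical word-document count matrix (rows = words, columns = documents).
words = ["money", "bank", "loan", "river", "stream"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def lsa(matrix, k):
    """Dimensionality reduction: keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    word_vecs = u[:, :k] * s[:k]       # each word becomes a point in k-dim space
    doc_vecs = vt[:k, :].T * s[:k]     # each document becomes a point in the same space
    return word_vecs, doc_vecs

def cosine(a, b):
    """Similarity of two points, measured by the angle between their vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

word_vecs, doc_vecs = lsa(counts, k=2)
print(word_vecs.shape, doc_vecs.shape)       # (5, 2) (4, 2)
print(cosine(word_vecs[0], word_vecs[2]))    # money vs. loan
print(cosine(word_vecs[0], word_vecs[3]))    # money vs. river
```

Because "money" and "loan" occur in the same documents, they land close together after the reduction. "Money" and "river" share no documents and end up nearly orthogonal.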
This chapter pursues an approach that is consistent with the first two of
these claims but differs in the third: it describes a class of statistical models
in which the semantic properties of words and documents are expressed in
terms of probabilistic topics.
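The contrast with the Euclidean representation can be sketched with hypothetical numbers. In this illustrative sketch, which assumes NumPy and is not the chapter's own code, a topic is a probability distribution over words, P(w | z), and a document is a mixture of topics, P(z | d). The probability of a word in a document is then P(w | d) = sum over z of P(w | z) P(z | d). The vocabulary, topic weights, mixture weights, and the `generate` helper are all invented for illustration. "bank" is given weight under both topics to show that one word can belong to more than one topic.

```python
import numpy as np

vocab = ["money", "bank", "loan", "river", "stream"]

# Hypothetical topics: each row is P(w | z), a distribution over the vocabulary.
phi = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0],   # topic 0: finance-like
    [0.0, 0.3, 0.0, 0.4, 0.3],   # topic 1: river-like
])

# Hypothetical document: a mixture of topics, P(z | d).
theta = np.array([0.7, 0.3])

# P(w | d) = sum_z P(w | z) P(z | d)
p_w = theta @ phi

def generate(theta, phi, n_words, rng):
    """Sample each token by first drawing a topic, then a word from that topic."""
    topics = rng.choice(len(theta), size=n_words, p=theta)
    return [vocab[rng.choice(phi.shape[1], p=phi[z])] for z in topics]

rng = np.random.default_rng(0)
print(dict(zip(vocab, p_w.round(2))))
print(generate(theta, phi, 8, rng))
```

Here each document is a point on a probability simplex over topics rather than an arbitrary vector. The word probabilities it induces, [0.28, 0.30, 0.21, 0.12, 0.09], sum to one.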