Friday, December 19, 2008

Dissertation in corpus-based lexicography

MSc Antti Arppe today successfully defended his dissertation, "Univariate, bivariate, and multivariate methods in corpus-based lexicography - a study of synonymy", at the University of Helsinki. The thesis is a methodological study of modeling lexeme selection in context in the presence of linguistic variation such as synonymy. The work applies polytomous logistic regression to estimate the odds of selecting each lexeme from a number of predictors. The odds for each individual predictor show in which kinds of contexts each lexeme is typically present or absent.
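As a rough illustration of how per-predictor odds can be read off such a model, here is a minimal Python sketch. It is not the method used in the thesis: it assumes a simple one-vs-rest setup (one binary logistic regression per lexeme) fitted by plain gradient ascent, and the lexeme labels, feature names, and data are all made up for the example. Exponentiating a fitted coefficient gives the factor by which the odds of choosing that lexeme change when the feature is present.

```python
import math

# Hypothetical toy corpus: (context features, chosen lexeme).
# Features are binary and invented for illustration only.
FEATURES = ["human_agent", "abstract_object"]
DATA = [
    ([1, 0], "A"), ([1, 0], "A"), ([1, 0], "A"), ([1, 0], "B"),
    ([0, 1], "B"), ([0, 1], "B"), ([0, 1], "B"), ([0, 1], "C"),
    ([1, 1], "A"), ([1, 1], "B"), ([1, 1], "C"), ([1, 1], "C"),
    ([0, 0], "C"), ([0, 0], "C"), ([0, 0], "A"), ([0, 0], "B"),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary(X, y, lr=0.5, epochs=2000):
    """Fit a binary logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p  # gradient of the log-likelihood w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def one_vs_rest_odds(data, feature_names):
    """For each lexeme, fit 'this lexeme vs. all others' and
    return exp(coefficient) per feature, i.e. an odds ratio."""
    X = [x for x, _ in data]
    result = {}
    for lexeme in sorted({label for _, label in data}):
        y = [1 if label == lexeme else 0 for _, label in data]
        w = fit_binary(X, y)
        result[lexeme] = {
            name: math.exp(coef) for name, coef in zip(feature_names, w[1:])
        }
    return result


odds = one_vs_rest_odds(DATA, FEATURES)
for lexeme, per_feature in odds.items():
    for name, ratio in per_feature.items():
        print(f"lexeme {lexeme}, {name}: odds ratio {ratio:.2f}")
```

An odds ratio above 1 means the feature favors the lexeme, and a ratio below 1 means the feature disfavors it. In this toy data, for instance, a human agent favors lexeme A while an abstract object disfavors it.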

The opponent, Prof. R. Harald Baayen from the University of Alberta (Canada), led an interesting discussion. It began with the shift away from Chomskyan intuition-based language analysis towards corpus analysis, which uses statistical methods to study how language works. Complex questions followed: Are grammars probabilistic? If so, how does the brain handle probabilities? Why does language allow synonymy? How should inter-dependencies between predictors be handled? Language changes; what about the models?

Prof. Baayen mentioned the recently started Journal of Empirical Linguistics, which accepts only replicable submissions: each must include all the scripts needed to reproduce the results reported in the paper.
