Svetlana Vetchinnikova is defending her doctoral dissertation entitled "Second language lexis and the idiom principle" at the University of Helsinki. Professor Susan Hunston (University of Birmingham) serves as the opponent, and Professor Anna Mauranen as the custos.
In the thesis, Vetchinnikova examines how second language users of English acquire, use and process lexical items. Three types of data were collected from five non-native students: (1) drafts of Master’s thesis chapters ("output"), (2) academic publications the students referred to ("input"), and (3) responses to word association tasks in which several hundred words the students had used in their theses were presented to them as stimuli. Lexical usage patterns ("output") were compared to the language exposure ("input") and to the word association responses.
In studying lexical meaning and how meanings are learned, Vetchinnikova refers to a shift of focus in research from explicit to implicit lexical knowledge, from single words to multi-word units, and from explicit instruction to usage-based acquisition. A similar shift has also been taking place in the computational modeling of language learning.
In the thesis, Vetchinnikova writes that "Corpus Linguistics has made possible to observe language in a way that makes visible the patterns which are otherwise not discernible for human analytic abilities". Furthermore, she refers to Michael Stubbs, who stated in his ICAME 32 plenary talk that Corpus Linguistics enables analytical processes similar to those that led Darwin to his theory of species.
A central question in the thesis is how a string of words comes to mean something different from what the sum of its individual words would normally mean. Delexicalisation and the idiom principle are central notions here. The idiom principle, formulated by Sinclair, refers to the idea that a language speaker has available a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments.
One of the conclusions of the work is that the idiom principle is available to second language learners to a much larger extent than is usually claimed. It would be interesting to study how these results relate to attempts to build machine learning systems that learn or detect multi-word expressions for purposes such as keyphrase extraction.
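To make the idea of automatically detecting multi-word expressions more concrete, here is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI), a common association measure in corpus work. It is not the method used in the thesis, and it is far simpler than a real keyphrase extraction system. The function name `pmi_bigrams`, the `min_count` threshold and the toy sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative helper (hypothetical name): a high PMI means the two words
    co-occur more often than their individual frequencies would predict,
    which is one rough signal of a multi-word unit.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for very low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy text, invented for this example only.
text = ("of course the results take place in the data and of course "
        "the data take place in the results")
ranked = pmi_bigrams(text.split())

for pair, score in ranked:
    print(" ".join(pair), round(score, 2))
```

On this toy text, pairs such as "of course" and "take place" score higher than pairs involving the frequent word "the", because "the" combines freely with many neighbours. A realistic study would need a much larger corpus, longer n-grams and a more careful choice of association measure.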