BEDLAN conducts research in two main areas:
- development of dialects ("microevolution")
- development of languages ("macroevolution")
Kaj Syrjänen was unable to attend the meeting, but his collaborators Jyri Lehtonen and Terhi Honkonen gave a detailed description of the research results. Lehtonen introduced the Uralic vocabulary data collection used in the project. The data cover 17 languages (including Finnish, Sami, Estonian, Komi, Udmurt, Hungarian, Mordvin, Mansi, Khanty, Livonian, Tundra Nenets, Karelian and Veps), with information on the connections between lexical items in these languages. Etymological dictionaries were used to analyze the historical connections. The collection contains 226 words. Examples include "meat", "moon" and "mother". These are "liha", "kuu" and "äiti" in Finnish, "liha", "kuu" and "emä" in Karelian, and "liha", "ku" and "mam" in Veps.
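To make the idea of "connections between lexical items" concrete, here is a minimal sketch of how such data could be coded as a binary presence/absence matrix, a common input format for phylogenetic software. The word forms are the three examples quoted above; the cognate class labels (`meat_A`, `mother_B`, and so on) and the function name `binary_matrix` are illustrative assumptions, not the project's actual etymological judgments or code.

```python
# Word forms are the examples quoted in the report.
words = {
    "Finnish":  {"meat": "liha", "moon": "kuu", "mother": "äiti"},
    "Karelian": {"meat": "liha", "moon": "kuu", "mother": "emä"},
    "Veps":     {"meat": "liha", "moon": "ku",  "mother": "mam"},
}

# ASSUMPTION: hypothetical cognate judgments for illustration only.
# Forms that share a label are treated as historically related.
cognate_class = {
    ("meat", "liha"): "meat_A",
    ("moon", "kuu"): "moon_A",
    ("moon", "ku"): "moon_A",
    ("mother", "äiti"): "mother_A",
    ("mother", "emä"): "mother_B",
    ("mother", "mam"): "mother_C",
}


def binary_matrix(words, cognate_class):
    """Return (sorted class labels, {language: list of 0/1 per class})."""
    classes = sorted(set(cognate_class.values()))
    matrix = {}
    for language, entries in words.items():
        # Collect the cognate classes attested in this language.
        present = {cognate_class[(meaning, form)]
                   for meaning, form in entries.items()}
        matrix[language] = [1 if c in present else 0 for c in classes]
    return classes, matrix


classes, matrix = binary_matrix(words, cognate_class)
print(" " * 10, " ".join(classes))
for language, row in matrix.items():
    print(f"{language:10s}", "".join(str(bit) for bit in row))
```

Each column stands for one cognate class and each row for one language, so languages that share many 1s in the same columns share much inherited vocabulary under the assumed judgments.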
Terhi Honkonen gave a presentation on the computational analysis of the data. In introducing the methodology, she referred to McMahon and McMahon (2005): "Language Classification by Numbers" (Oxford), and Atkinson and Gray (2005): "Curious parallels and curious connections - Phylogenetic thinking in biology and historical linguistics" (Systematic Biology, 54:513-526). The method used was Bayesian phylogenetic analysis, carried out with a program called MrBayes.
A strong merit of this kind of research is that conclusions on the relationships between languages and dialects are made based on vocabulary patterns rather than on individual word instances.
Jyri Lehtonen continued by presenting research that uses network analysis methods. He first discussed the differences between tree-based models and network models (e.g. Heggarty et al. 2010) and then showed the results of a network analysis of the Uralic languages. The network analysis divided the languages into groups of Baltic-Finnic, Saami, Samoyedic, Ugric and Permic languages. Meadow Mari and Mordvin were not clearly connected with any of these groups. Lehtonen mentioned the classical lexicostatistical research by Swadesh in the 1950s and continued by presenting research on the effect of using more or less central vocabulary. Usually central vocabulary is used, in which the lexical items are typically stable and morphologically simple. In the Loanword Typology Project, 1400 meanings are considered. This has further led to the Leipzig-Jakarta list, which includes the 100 "most central" meanings. Lehtonen argued that less central vocabulary may help in detecting language connections in a more fine-grained manner.
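The report does not say which network method was used, but distance-based approaches generally start from a pairwise distance matrix between languages. The sketch below shows one simple lexicostatistical distance: the share of comparable meanings whose cognate classes differ. The language names `A`, `B`, `C`, the numeric cognate classes, the `core` word list and the function `lexical_distance` are all hypothetical placeholders. The optional `meanings` argument restricts the comparison to a chosen word list, which is one way to contrast more central with less central vocabulary.

```python
from itertools import combinations

# ASSUMPTION: toy data. Each language maps a meaning to a cognate class id;
# None marks a meaning with no attested form.
data = {
    "A": {"meat": 1, "moon": 1, "mother": 1, "fish": 1},
    "B": {"meat": 1, "moon": 1, "mother": 2, "fish": 1},
    "C": {"meat": 2, "moon": 1, "mother": 3, "fish": None},
}


def lexical_distance(a, b, meanings=None):
    """Share of comparable meanings whose cognate classes differ."""
    keys = meanings if meanings is not None else a.keys() & b.keys()
    # Only compare meanings attested in both languages.
    comparable = [m for m in keys
                  if a.get(m) is not None and b.get(m) is not None]
    if not comparable:
        raise ValueError("no comparable meanings")
    differing = sum(a[m] != b[m] for m in comparable)
    return differing / len(comparable)


# Distances over the full word list.
distances = {
    (x, y): lexical_distance(data[x], data[y])
    for x, y in combinations(sorted(data), 2)
}

# Distances over a hypothetical "central" subset of meanings.
core = ["moon", "mother"]
core_distances = {
    (x, y): lexical_distance(data[x], data[y], meanings=core)
    for x, y in combinations(sorted(data), 2)
}

for pair in distances:
    print(pair, round(distances[pair], 3), round(core_distances[pair], 3))
```

Comparing the two sets of distances shows how the choice of word list can change how close two languages appear, which is the kind of effect the presentation examined.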
Lehtonen's presentation prompted thoughts about the future of scientific representation and the role of animations in it. In this case, an animation showing how the network structure develops could be useful.
At the end of the meeting, Terhi Honkonen presented research results on the timing of language divergence. The analysis used BEAST (Bayesian Evolutionary Analysis Sampling Trees), a program originally developed for Bayesian MCMC analysis of molecular sequences.