Sunday, May 05, 2013

Per Linell: Interactivity and Intersubjectivity

Professor Per Linell is known for his influential work in linguistics, including the books "Rethinking Language, Mind and World Dialogically: Interactional and contextual theories of human sense-making" and "The Written Language Bias in Linguistics: Its Nature, Origins and Transformations". On Friday, 3rd of May, Linell gave an invited talk entitled "Interactivity and intersubjectivity: Dialogical perspectives" in a seminar series organized by the Finnish Centre of Excellence in Research on Intersubjectivity in Interaction.

Linell started by citing and discussing William F. Hanks' book "Language and Communicative Practices": "These [...] questions arise from a series of contradictions in language: It is both an abstract system and an intimate part of our daily experience, an individual capacity and a social fact, a form and an activity." He pointed out that Noam Chomsky took a narrow view of linguistic theory formation, assuming ideal speaker-listeners who know their language perfectly and are unaffected by such irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors. Linell noted that Chomsky's assumption sets aside most of language use, or languaging, and was formulated only to save the notion of one underlying abstract language system.

As an opposing or complementary view to Chomsky's, Linell presented interactionism: the sense-making ability of humans is rooted in social interaction; the mind is interactive, dialogical, social, shared, extended, distributed, etc. He discussed in detail the relation between the individual and the social levels of reality, and concluded that both points of view are necessary and can be brought together. The mind lives in and through the ecosocial world, as argued in work on the "interactive mind" (Schegloff, 1991; Trognon & Batt, 2010), the "social mind" (Valsiner & van der Veer, 2000), the "shared mind" (Zlatev et al., 2008), the "extended mind" (Clark & Chalmers, 1998), the "enactive mind" (Thompson, 2007), the "distributed mind" (Cowley, 2011) and the "dialogical mind" (Linell, 2009, and others). In Finland, related work in educational science has been conducted by Professor Kai Hakkarainen and his colleagues, who have published the book "Communities of Networked Expertise: Professional and Educational Perspectives" (2004). Linell pointed out that even though individuals have their own bodies, personal biographies and conceptions of self, they are also partly constituted in and through self-other relations. People have dialogical emotions such as shame, guilt, compassion, empathy and conscience.

Intersubjectivity was a theme that Linell discussed in detail, and only some aspects can be reported here. He considered intersubjectivity to be an alternative or intermediate position between subjectivity and objectivity. He summarized some of the main points of dialogical theories. First, participants in interactivities produce and understand real actions and utterances in the world. Second, the focus moves away from the single individual towards the individual in interaction with others. These interactions are situated, and they also form situation-transcending sociocultural practices. Linell concluded by stating that individualism and collectivism are both insufficient for solving the conceptual problems in the theory of language. Rather than preserving the Cartesian dichotomy, one can start from the dialogical foundation of both individuals and communities, i.e. the interactivity between self and others.

The talk was also very interesting because related parallel developments have taken place in the computational modelling of cognition and language. In the 1980s, it was commonplace to develop rule-based systems for language processing, with the idea that they could capture the linguistic skills of a generalized language speaker. Over time, both the knowledge acquisition bottleneck and the practical conclusion that "all grammars leak" were recognized.

One serious line of research that has tried to alleviate these problems in the computational modelling of language is based on (statistical) machine learning. The basic idea is to devise systems that learn language from large corpora rather than trying to formulate linguistic rules manually. A now commonplace practice is to collect statistics of morphemes, words or expressions appearing in contexts and to use a suitable method to model the relationships between these elements based on the context data. In an early study, the self-organizing map algorithm was used to create a map of words in the Grimm brothers' fairy tales. The result included emergent implicit categories of nouns and verbs, and some subcategories within them, such as animate and inanimate nouns. In an attempt to build a bridge between the individual and social dimensions, simulation models have been developed in which communities of artificial agents converge towards a shared symbol set over a number of interactions. The context can also be multimodal, for instance, when formulating a mapping between words and expressions that describe human movement and the corresponding complex visual movement patterns. Moreover, the Grounded Intersubjective Concept Analysis (GICA) method has been developed in an attempt to quantify semantic variation. In essence, GICA aims to measure the degree of intersubjectivity.
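Simulations in which agents converge on a shared symbol set are often variants of the so-called naming game. The Python sketch below is a minimal version for illustration only, assuming a single object to be named, random pairwise interactions and invented word forms such as "w0"; it is not a reproduction of any particular published model.

```python
import random

def naming_game(num_agents=10, max_rounds=20000, seed=0):
    """Minimal naming game: agents negotiate a shared name for a single object."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(num_agents)]  # candidate names per agent
    next_word = 0  # counter for inventing new word forms ("w0", "w1", ...)
    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(num_agents), 2)
        if not inventories[speaker]:
            # A speaker without any name invents a new one.
            inventories[speaker].add(f"w{next_word}")
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents drop all competing names.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer stores the word as a new candidate.
            inventories[hearer].add(word)
        if all(len(inv) == 1 and inv == inventories[0] for inv in inventories):
            return round_no, inventories  # consensus reached
    return max_rounds, inventories

rounds, inventories = naming_game()
print(rounds, inventories[0])
```

Once every agent holds the same single name, every later interaction succeeds, so the shared vocabulary is stable. The number of rounds needed to get there grows with the size of the population.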

Friday, April 05, 2013

ICANNGA'13 - Tom Heskes: Reading the Brain with Bayesian Machine Learning



The International Conference on Adaptive and Natural Computing Algorithms, ICANNGA'13, opened with a keynote talk by Prof. Tom Heskes from Radboud University Nijmegen titled "Reading the Brain with Bayesian Machine Learning".

In 2004, Technology Review listed Bayesian machine learning (BML) as one of the emerging technologies that will change our world, and the talk began with a short introduction to BML and the current state of research. There are many general-purpose software tools available, such as BUGS, JAGS and Infer.NET, and quite a few cool models, but killer applications and better techniques for discovering causal relations are still needed.

Heskes then gave examples of applications of Bayesian ML in neuroimaging, more specifically applications related to brain-computer interfaces. He defined the goal as classifying mental states of the brain. 'The holy grail' of this research would be to provide a means of communication for a person who has lost all motor control (for example, due to ALS).

That goal has not been reached yet, but we were given some examples of the state of current research. First, Heskes showed how it is possible to classify imagined finger movements from EEG data. In another experiment, the direction of covert attention (attending to a location without moving one's eyes) was classified instead. The results were strong enough to predict the angle to which the person was attending.
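The Bayesian idea behind such classification can be illustrated with Bayes' rule. The Python sketch below is a toy, assuming a single hypothetical EEG feature (for example, a normalized band-power value) whose distribution under each imagined movement is a Gaussian with an invented mean and standard deviation. The models presented in the talk were considerably richer, and a fully Bayesian treatment would also place priors on these parameters.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior(x, likelihoods, priors):
    """Bayes' rule: P(class | x) = P(x | class) P(class) / sum over all classes."""
    joint = {c: gaussian_pdf(x, *likelihoods[c]) * priors[c] for c in likelihoods}
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical class-conditional models (mean, std) of one EEG feature.
likelihoods = {"left": (-1.0, 1.0), "right": (1.0, 1.0)}
priors = {"left": 0.5, "right": 0.5}

print(posterior(2.0, likelihoods, priors))  # the "right" class dominates
```

The output is a full posterior distribution rather than a hard decision, which is convenient when a brain-computer interface should act only on confident classifications.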

Functional magnetic resonance imaging (fMRI) makes it possible to classify what kind of image a person is viewing. The first goal was to classify handwritten 6s versus 9s based on the fMRI data. The next logical step is to predict which image the subject was seeing and, further down the road, what a person is imagining. It was already possible to reconstruct the handwritten 6s and 9s with deep Boltzmann machines (the background idea being that the brain might be doing something similar).

As if this were not enough, the Bayesian framework has also been used to infer brain networks, yielding a clustered graph. All in all, it was a very enlightening talk and a great start for the conference.

(Picture courtesy of Tom Heskes)

Friday, March 22, 2013

Hyppänen: Decision makers’ use of intuition at the front end of innovation

Olli Hyppänen is defending his thesis "Decision Makers’ Use of Intuition at the Front End of Innovation" on Friday, 22nd of March at Aalto University School of Science. Juha Laurila from the University of Turku is serving as the opponent and Karlos Artto as the custos.

Empirical research on human knowing and experience has clearly shown that expertise is based on skills and knowledge that are difficult to represent explicitly in linguistic form. Dijksterhuis et al. have recently shown that intuitive decision making gives systematically better results than reliance on explicit or rational thinking in solving complex problems (see "Modeling communities of experts" for more details).

In his thesis, Hyppänen states that findings in managerial decision-making research suggest that decision makers most often use intuition in uncertain situations, and the innovation front end is a good example of an environment with high uncertainty. An essential motivation for Hyppänen's research is the finding that the use of intuition in decision making has not been extensively researched in the context of the innovation front end. According to the defendant, existing research on innovation front end decision making has concentrated on building traditional normative models to deal with uncertainty.

The first part of the empirical data consists of 19 interviews in four ICT companies. The second part consists of the results of 86 questionnaires answered by innovation decision makers. The initial phase resulted in a list of categories and related properties relevant to the development of a grounded theory framework for innovation front end decision making. Decision making was the main focus of the data analysis, and non-rational elements in decision making emerged as a core category, which was named intuition. The main research questions of Hyppänen's thesis are the following.

  1. How does intuition reveal itself in innovation front end decision making?
  2. What approaches do decision makers have when using intuition at the innovation front end?
  3. How do experienced decision makers differ from inexperienced decision makers in their use of intuition?

The opponent explored, for instance, methodological challenges related to the thesis.

Pylkkönen: Towards Efficient and Robust Automatic Speech Recognition

Janne Pylkkönen is defending his PhD thesis "Towards Efficient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training" on Friday, 22nd of March, 2013 at Aalto University School of Science. The research has been conducted in a research group focusing on speech technology, led by Prof. Mikko Kurimo. Dr. Erik McDermott, Google, is serving as the opponent and Prof. Erkki Oja as the custos. Research on speech recognition has long roots in Otaniemi: Academician Teuvo Kohonen was actively researching this area already in the early 1980s and developed the famous neural phonetic typewriter with his team.

Pylkkönen's thesis presents methods for decoding and modeling the acoustics for large vocabulary continuous speech recognition with two main contributions. First, he has developed a large vocabulary decoder suitable especially for morphologically rich languages. Second, he has improved discriminative training of acoustic models to increase their robustness. The thesis also includes a theoretical analysis of discriminative training where the extended Baum-Welch algorithm is formulated as a constrained optimization method. The methods have been tested using a speech recognition system that has been developed over the years with contributions from a number of researchers.

In the lectio precursoria, Pylkkönen described the background and motivation of his research on speaker-independent large vocabulary continuous speech recognition. He introduced the main ideas of acoustic modeling, language modeling and decoding, which combines information from the speech signal with language statistics. Pylkkönen gave an example of the complexity of the task: when the system "knows" 20,000 morphs (word segments), 24 phonemes, 1500 phoneme models and 40,000 Gaussian components, there are 3,200,000 parameters to be estimated.
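The talk did not spell out how the figure of 3,200,000 was obtained. One reading that reproduces it exactly, offered here only as an assumption, is that the count covers the 40,000 Gaussian components alone and that each component has a mean vector and a diagonal covariance over 40-dimensional acoustic feature vectors, i.e. 80 parameters per component (mixture weights and transition probabilities ignored). A small Python helper makes this arithmetic explicit:

```python
def gaussian_param_count(num_gaussians, feature_dim, diagonal=True):
    """Mean plus covariance parameters for a set of Gaussian components."""
    if diagonal:
        cov_params = feature_dim  # one variance per feature dimension
    else:
        cov_params = feature_dim * (feature_dim + 1) // 2  # symmetric full matrix
    return num_gaussians * (feature_dim + cov_params)

# Assumed: 40-dimensional features and diagonal covariances.
print(gaussian_param_count(40_000, 40))                  # 3200000
print(gaussian_param_count(40_000, 40, diagonal=False))  # 34400000
```

Under the same assumption, full covariance matrices would raise the count roughly tenfold, which illustrates why diagonal covariances are a common choice in large recognizers.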

In his opening statement, Erik McDermott first explained that he works in the speech division at Google that develops the speech recognition capabilities of Android phones. He mentioned that Android has speech recognition for 40 languages, including Finnish. He reminded the audience of the challenges related to speech recognition: there are still fundamental problems with the technology, and therefore active research is still needed. McDermott acknowledged the important contributions of Teuvo Kohonen and Erkki Oja in providing what he called an organic view of pattern recognition systems. In essence, this refers to contributions in unsupervised machine learning, where systems improve over time following a data-driven approach.

In the discussion, several methodological themes were considered in detail, including decoding techniques, the potential use of finite-state transducers, pruning techniques, discriminative training, maximum likelihood modeling and Gaussian mixtures.
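Pruning in decoding can be illustrated with a textbook Viterbi search that drops every hypothesis whose log score falls more than a fixed beam below the current best. The Python sketch below is not Pylkkönen's decoder, whose internals were not described here; it assumes a tiny two-state hidden Markov model with invented probabilities and symbolic observations "x" and "y" in place of acoustic feature vectors.

```python
import math

def viterbi_beam(observations, states, log_init, log_trans, log_emit, beam=5.0):
    """Best state path through an HMM, pruning hypotheses that fall outside a beam."""
    # Each active hypothesis: state -> (log score, best path ending in that state).
    active = {s: (log_init[s] + log_emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: discard hypotheses more than `beam` below the best one.
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        extended = {}
        for nxt in states:
            extended[nxt] = max(
                ((score + log_trans[prev][nxt] + log_emit[nxt][obs], path + [nxt])
                 for prev, (score, path) in active.items()),
                key=lambda hyp: hyp[0],
            )
        active = extended
    return max(active.values(), key=lambda hyp: hyp[0])

def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {k2: math.log(p) for k2, p in row.items()} for k, row in table.items()}

states = ["A", "B"]
log_init = {"A": math.log(0.6), "B": math.log(0.4)}
log_trans = logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = logs({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}})

score, path = viterbi_beam(["x", "x", "y", "y"], states, log_init, log_trans, log_emit)
print(path, math.exp(score))
```

With a narrow beam (for example beam=1.0), lower-scoring hypotheses are discarded at every step, yet in this toy case the same best path is found. In real decoders, the beam width trades recognition accuracy against speed.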

Friday, March 15, 2013

Johnson: How Social Media Changes User-Centred Design

Mikael Johnson is defending his thesis "How Social Media Changes User-Centred Design" at Aalto University School of Science. A case study on Sulake's Habbo Hotel questions basic assumptions of user-centred design concerning social media service design. In his lectio precursoria, Johnson discussed different aspects of the concept of the user and of user-centred design. He further presented views on social media and some central questions behind his research. These included (1) (2) (3). The method applied in the thesis is based on explorative case studies. The specific context has been Habbo Hotel, which is one of the largest social media applications for teenagers in the world. The results have been presented in twelve publications, seven of which have been included in the thesis. In his work, Johnson developed two new concepts, i.e. developer–user social distance and content creation capacity. The motivation has been to help designers and researchers to consider and communicate previously neglected dimensions of user involvement.

The opponent, Professor Jan Gulliksen from KTH Royal Institute of Technology, pointed out that some research questions and topics, such as user-centred design, deserve continuous attention. He noted that no one asks cancer researchers why they still continue, even though the topic has been explored for a long time. Gulliksen reminded the audience that huge amounts of time are spent yearly dealing with non-optimalities of computerized systems, which motivates paying attention to information systems development and to how these systems serve their users. He underlined that designers should not be "psychopaths": they should be able to take the users' point of view. Gulliksen said that Johnson had had a magnificent opportunity to conduct a longitudinal study.

The first question Professor Gulliksen posed was: "Is this a usable thesis?" Johnson referred to the concepts of satisfaction, effectiveness and efficiency, and their relation to different user groups such as the opponent, colleagues or one's mother. After some discussion of the cover of the thesis, various aspects were covered in detail.

The event was attended by a remarkably large multidisciplinary audience from universities and research institutions such as the National Consumer Research Centre. The defence was nicely multilingual: the custos, Professor Marko Nieminen, opened the event in Finnish and English, the lectio precursoria was given in Swedish, and the opponent started with some words of Swedish and continued in English.

Thursday, March 14, 2013

Mammassis: Multi-Source Research on Pharmaceutical Industry

In the Strategy Research Colloquium series, Constantinos S. Mammassis gave a talk on "Conducting Multi-Source Scientific Research". Mammassis is from the Department of Business Administration, University of Patras, Greece.

In his presentation, Mammassis discussed the difficulties of managing huge amounts of data. He outlined how to conduct innovation research in the context of the pharmaceutical industry, where such data is available. Problems in creating multi-source databases include data losses due to data transformation and availability, long discovery periods that span several years, and computational issues.

Friday, March 01, 2013

Vector-based Models of Semantic Composition

In their ACL-08 paper "Vector-based Models of Semantic Composition", Jeff Mitchell and Mirella Lapata presented a careful comparison of different approaches for representing the meaning of phrases and sentences in vector space. Their work was motivated by the fact that most studies of vector-based representations of meaning had concentrated on individual words only. A bag-of-words approach is useful in finding topics or meaning components using methods like LSA or WordICA.

The word-level approach does not take word order into account, which naturally limits the applicability of these methods. In particular, the limitations have concerned propositional meaning. Logic-based approaches, on the other hand, are limited in how graded phenomena and contextuality can be modeled. A classical example is Montague semantics, which is formally attractive but quite far from being a realistic model of meaning due to its simplicity. Therefore, it is important to develop models that take vector-space representations to the level of sentences. The usefulness of these representations is explained carefully by Peter Gärdenfors in his book on conceptual spaces.

Mitchell and Lapata considered a wide range of composition models, which they evaluated empirically on a sentence similarity task. Their main conclusion was that multiplicative models are better than additive alternatives when the computational models are compared with human judgments. Classic work in this area includes Smolensky's 1990 article, in which he proposed the use of tensor products as a means of variable binding and of representing symbolic structures in a vector-based framework. Since 2008, many researchers have continued work in this area, including Erk and Padó (2008), Turney and Pantel (2010), Baroni and Zamparelli (2010), Grefenstette and Sadrzadeh (2011), and Clarke (2012).
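The simplest additive and multiplicative models combine two word vectors u and v element by element, p = u + v or p_i = u_i v_i, while Smolensky's tensor product takes their outer product. The Python sketch below uses invented three-dimensional co-occurrence vectors (animals, speed, business) to show how the three operations behave; it mirrors the idea of the comparison but does not reproduce Mitchell and Lapata's data or evaluation.

```python
import math

def additive(u, v):
    """Additive composition: p = u + v."""
    return [a + b for a, b in zip(u, v)]

def multiplicative(u, v):
    """Multiplicative composition: p_i = u_i * v_i."""
    return [a * b for a, b in zip(u, v)]

def tensor_product(u, v):
    """Smolensky-style binding: the outer product of u and v, flattened."""
    return [a * b for a in u for b in v]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Invented co-occurrence vectors over the dimensions (animals, speed, business).
horse = [5, 2, 0]
run = [1, 3, 2]      # ambiguous: "run fast" vs. "run a company"
gallop = [4, 4, 0]

for name, model in [("additive", additive), ("multiplicative", multiplicative)]:
    print(name, round(cosine(model(horse, run), gallop), 3))

# Addition and multiplication ignore word order; the tensor product does not.
print(tensor_product([1, 2], [3, 4]), tensor_product([3, 4], [1, 2]))
```

Multiplication keeps the dimensions on which both words are active and suppresses the business sense of "run", so the composed vector ends up closer to "gallop". The tensor product is order-sensitive, but its dimensionality grows as the product of the input dimensionalities.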

Monday, February 25, 2013

Workshop on Modeling Conceptual Change: Computational Views

The fourth workshop on Modeling Conceptual Change took place at the Hanasaari conference center, Espoo, from 20th to 22nd of February, 2013. The purpose of the workshop series is to open up new empirical and methodological avenues for research on conceptual change. Thanks to funding provided by the Finnish Cultural Foundation, a number of high-level scholars working on conceptual change and related areas were invited to give presentations and participate in discussions. The workshop series is organized by the University of Helsinki in collaboration with representatives from Aalto University and the University of Turku. The chair of the series, Ismo Koponen, welcomed the participants and introduced the main objectives of the workshop.

Paul Thagard is the director of the Computational Epistemology Laboratory at the University of Waterloo, Canada, and the author of a number of influential books, including Computational Philosophy of Science, Conceptual Revolutions, Hot Thought: Mechanisms and Applications of Emotional Cognition, and The Cognitive Science of Science. Thagard has also led the development of EMPATHICA, a software program designed to help people understand and resolve conflicts.

In his talk "Concepts, Conceptual Change, and Explanatory Identities", Thagard explained how to use the Semantic Pointer Architecture in cognitive modelling. The architecture stems from research led by Chris Eliasmith. Semantic pointers are patterns of neural firing that (1) provide shallow semantics through symbol-like relations to the world and to other representations, (2) expand to provide deeper semantics with relations to perceptual, motor and emotional information, (3) support complex syntactic operations, and (4) help to control the flow of information through a cognitive system to accomplish its goals. Thagard stressed the idea that representations are processes, not things, realized as patterns of firing in neural populations. Mathematically, semantic pointers are vectors that can be decompressed into other vectors.
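In Eliasmith's Semantic Pointer Architecture, structured representations are typically built by binding vectors with circular convolution, a technique adopted from Plate's holographic reduced representations. "Decompressing" a pointer then corresponds to unbinding followed by clean-up against known vectors. The NumPy sketch below shows only this vector algebra, not its realization in spiking neural populations; the dimensionality D = 512 and the symbols COLOR, SHAPE, RED and CIRCLE are arbitrary choices for illustration.

```python
import numpy as np

D = 512  # vector dimensionality (arbitrary choice)
rng = np.random.default_rng(0)

def random_vector():
    """Random unit vector; in high dimensions such vectors are nearly orthogonal."""
    v = rng.normal(0.0, 1.0 / np.sqrt(D), D)
    return v / np.linalg.norm(v)

def bind(a, b):
    """Circular convolution of a and b, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def approx_inverse(a):
    """Involution a*[i] = a[-i mod D], the usual approximate inverse for unbinding."""
    return np.concatenate(([a[0]], a[:0:-1]))

vocab = {name: random_vector() for name in ["COLOR", "SHAPE", "RED", "CIRCLE"]}

# A single compressed vector standing for "a red circle".
red_circle = bind(vocab["COLOR"], vocab["RED"]) + bind(vocab["SHAPE"], vocab["CIRCLE"])

def decode(pointer, role):
    """Unbind a role, then clean up the noisy result against the vocabulary."""
    noisy = bind(pointer, approx_inverse(vocab[role]))
    return max(vocab, key=lambda name: float(np.dot(vocab[name], noisy)))

print(decode(red_circle, "COLOR"), decode(red_circle, "SHAPE"))
```

The pointer has the same dimensionality as its constituents, and the unbound result is only an approximation of the original filler. This is why the clean-up step against the vocabulary is needed.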

Timo Honkela is the head of the computational cognitive systems research group at Aalto University School of Science, Espoo, Finland. In his talk "Lessons learned in computational modeling of language learning and conceptual change", Honkela took the model presented by Merenluoto and Lehtinen in their paper "Number concept and conceptual change" as a starting point. He identified a list of key concepts, including concept, perception, understanding, conflict and change, and continued by presenting different theoretical frameworks and computational means to analyze and model these concepts. Honkela considered the question of how concepts are related to language. According to the Language of Thought Hypothesis, thought and thinking take place in a mental language, i.e. in a symbolic system physically realized in the brain. Honkela promoted a view that was formulated by Heinz von Foerster: "a formalism necessary and sufficient for a theory of communication must not contain primary symbols representing communicabilia (e.g. symbols, words, messages, etc.)". Honkela concluded his presentation by showing results related to modeling the subjectivity of understanding, based, for example, on the Grounded Intersubjective Concept Analysis method.

Andrea A. diSessa is a professor in the Graduate School of Education at the University of California, Berkeley, and a member of the US National Academy of Education. He has authored the books Changing Minds: Computers, Learning and Literacy and, with Harold Abelson, Turtle Geometry: The Computer as a Medium for Exploring Mathematics. The latter book describes the Logo programming language, of which diSessa was one of the developers.

The starting point of Andrea diSessa's talk "Modeling Conceptual Change at the Knowledge Level" was a consideration of the strengths and weaknesses of the traditional study of conceptual change. He pointed out that there are almost no exemplars of real-time analysis in traditional conceptual change research. To provide a basis for a detailed analysis of conceptual change, diSessa explained the main elements of his knowledge ontology. These include P-prims, coordination classes, diverse types of mental models, narratives and nominal facts. P-prims are phenomenological primitives that are more or less evident in experience, are activated as a unit, and serve as a base level of explanation. Recent work in this area includes an article in Cognition and Instruction by Kapon and diSessa, in which they explain the real-time processes and individual differences in overcoming misconceptions via instructional analogies.

David Danks is an Associate Professor of Philosophy and Psychology at the Department of Philosophy, Carnegie Mellon University. In his talk "Changing Concepts for Causal Coherence", Danks carefully considered two different perspectives, predicting observational versus interventional data, and showed that these can easily lead to different conceptual structures. Expressed more formally, a distinction in model prediction as probability estimation can be made between the observational case, in which P(X=x | C=c) is estimated, and the interventional case, in which P(X=x | do(C=c)) is estimated. He considered the situation in which there is no special epistemological access, so that concepts must be learned from data. This means that all interventional predictions must allow for uncertainty about the underlying causal structures. He considered this issue in the framework of probabilistic directed acyclic graphical (DAG) models. Danks' striking conclusion was that there is no single "correct" conceptual set: the "correct" conceptual set for observational prediction is different from the most functional conceptual set for interventional prediction. The ordering of conceptual schemes by quality is sometimes underdetermined by data, i.e. given the same data set, one can get different orderings depending on one's goals.
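The difference between P(X=x | C=c) and P(X=x | do(C=c)) can be made concrete with a three-variable DAG in which a common cause Z affects both C and X (Z → C, Z → X, C → X). The probabilities in the Python sketch below are invented, and the interventional case uses the standard back-door adjustment P(X | do(C=c)) = Σ_z P(X | C=c, Z=z) P(Z=z). It illustrates the general distinction, not Danks' own analysis.

```python
# Invented parameters of a DAG with Z -> C, Z -> X and C -> X (all binary).
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_c1_given_z = {0: 0.2, 1: 0.8}             # P(C=1 | Z=z)
p_x1_given_cz = {(0, 0): 0.1, (0, 1): 0.7,  # P(X=1 | C=c, Z=z)
                 (1, 0): 0.5, (1, 1): 0.9}

def p_c_given_z(c, z):
    return p_c1_given_z[z] if c == 1 else 1.0 - p_c1_given_z[z]

def observational(c):
    """P(X=1 | C=c): observing C also changes our beliefs about Z."""
    joint = {z: p_z[z] * p_c_given_z(c, z) for z in (0, 1)}
    total = sum(joint.values())
    return sum(p_x1_given_cz[(c, z)] * joint[z] / total for z in (0, 1))

def interventional(c):
    """P(X=1 | do(C=c)): setting C cuts the Z -> C edge, so Z keeps its prior P(Z)."""
    return sum(p_x1_given_cz[(c, z)] * p_z[z] for z in (0, 1))

for c in (0, 1):
    print(c, round(observational(c), 3), round(interventional(c), 3))
```

Here the observational contrast P(X=1 | C=1) − P(X=1 | C=0) is 0.82 − 0.22 = 0.60, while the interventional contrast is 0.70 − 0.40 = 0.30. A model tuned for one kind of prediction would therefore misjudge the other.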

Peter Gärdenfors is Professor of Cognitive Science at Lund University, Sweden. Gärdenfors' influential books include "Conceptual Spaces: The Geometry of Thought" and "The Dynamics of Thought". In his talk "Conceptual change as dimensional change: conceptual spaces applied to the dynamics of empirical theories", Gärdenfors showed how conceptual change can be understood in the framework of conceptual spaces. A conceptual space is a multi-dimensional feature space where points denote objects and regions denote concepts. In conceptual spaces, quality dimensions denote basic features along which concepts and objects can be compared. Conceptual change then occurs as changes in the structure of the dimensions. Gärdenfors identified five types of change: (1) addition or deletion of special laws, (2) change in scale or metric, (3) change in the importance of dimensions, (4) change in the separability of dimensions, and (5) addition or deletion of dimensions. Using this approach, the conceptual development of empirical theories becomes gradual and rationalizable. As an example of the addition and deletion of dimensions, Gärdenfors described the transformation from Newtonian mechanics to special relativity.
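One way to see how a change in the importance of dimensions (type 3 above) can alter categorization is to represent concepts by prototype points and to assign each object to the nearest prototype, which partitions the space into convex regions. The Python sketch below assumes a two-dimensional space with invented prototypes A and B and a weighted Euclidean distance in which each weight stands for the salience of a quality dimension. It illustrates the idea rather than any model presented in the talk.

```python
import math

def weighted_distance(x, y, weights):
    """Euclidean distance with one salience weight per quality dimension."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def categorize(point, prototypes, weights):
    """Assign the point to the concept whose prototype is nearest."""
    return min(prototypes, key=lambda name: weighted_distance(point, prototypes[name], weights))

prototypes = {"A": (0.0, 2.0), "B": (3.0, 0.0)}  # invented prototype locations
obj = (1.6, 1.5)

print(categorize(obj, prototypes, weights=(1.0, 1.0)))  # both dimensions equally salient
print(categorize(obj, prototypes, weights=(1.0, 0.1)))  # second dimension down-weighted
```

The object itself does not move; only the relative weighting of the dimensions changes, which shifts the boundary between the two regions so that the object falls under a different concept.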

Antonio Lieto is a post-doctoral research fellow at the Department of Computer Science, University of Turin. In his presentation "Concepts, (Formal) Ontologies and Conceptual Change", Lieto discussed parallel considerations within cognitive science and artificial intelligence related to knowledge representation. He pointed out that within AI there is a tension between two conflicting requirements for knowledge representation systems: compositionality and the representation of prototypical information. Lieto described the basics of formal ontology representation and reasoning, and discussed how ontologies could be used as a tool to analyze conceptual change. He also discussed in some detail the open problems related to formal ontologies. Ontologies are expected to represent common-sense concepts and, more generally, non-axiomatic knowledge. However, OWL (Web Ontology Language) does not allow the representation of non-classical concepts. Furthermore, common-sense reasoning is often non-monotonic. Lieto presented the outline of a solution based on a hybrid approach in which prototypical information can also be dealt with.
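The contrast between classical and prototypical information, and the non-monotonicity of common-sense reasoning, can be illustrated with the textbook bird and penguin example. The Python sketch below is a deliberately simple default-reasoning scheme, assumed for illustration: classical properties are always inherited, while typical properties are added only if nothing already known contradicts them. It is neither Lieto's hybrid system nor OWL, whose standard semantics is monotonic.

```python
# Invented toy knowledge base: classical (necessary) vs. typical (default) properties.
CONCEPTS = {
    "bird": {"classical": {"animal", "has feathers"}, "typical": {"flies"}},
    "penguin": {"classical": {"bird", "not flies"}, "typical": {"swims"}},
}

def infer(facts):
    """Return the conclusions that follow from a set of concept memberships."""
    conclusions = set(facts)
    for concept in facts:
        conclusions |= CONCEPTS[concept]["classical"]
    for concept in facts:
        for prop in CONCEPTS[concept]["typical"]:
            # A default applies only if its negation is not already known.
            if "not " + prop not in conclusions:
                conclusions.add(prop)
    return conclusions

print(sorted(infer({"bird"})))
print(sorted(infer({"bird", "penguin"})))
```

Adding the fact that the individual is a penguin withdraws the earlier conclusion that it flies, which is exactly what a monotonic logic cannot do.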

In addition to the talks, which lasted half an hour each, a lot of time was reserved for discussions, which also took place in a group work format. The research group led by Ismo Koponen at the University of Helsinki had prepared a case study related to physics education. Theme session discussions were chaired by Terhi Mäntylä, Koponen and Honkela. In one theme session, the participants considered in groups how conceptual change can be modeled using Bayesian models (chaired by Danks), P-prims (diSessa), semantic pointers (Thagard) and self-organizing maps (Tiina Lindh-Knuutila).

Otto Lappi, Henri Kauhanen and Tommi Kokkonen presented results of analysing a corpus of students' texts. These texts are related to electricity and include terms such as power, current, resistance and voltage. Network analysis and text mining methods had been used to explore the students' conceptions.
For the text mining, the terms had been divided into diagnostic and context terms. The relationships between the diagnostic terms are reflected by their associations with the context terms.
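The idea that diagnostic terms are related through their associations with context terms can be sketched with simple co-occurrence counts. In the Python example below, the five student sentences are invented, each text counts as one co-occurrence context, and cosine similarity compares the resulting association profiles. The actual study combined network analysis and text mining methods that are not reproduced here.

```python
import math
from collections import Counter

def association_vectors(texts, diagnostic, context):
    """For each diagnostic term, count the context terms occurring in the same text."""
    counts = {d: Counter() for d in diagnostic}
    for text in texts:
        tokens = set(text.lower().split())
        for d in diagnostic:
            if d in tokens:
                counts[d].update(c for c in context if c in tokens)
    return {d: [counts[d][c] for c in context] for d in diagnostic}

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

texts = [  # invented examples, not from the corpus
    "the current flows through the wire",
    "current is used up in the lamp",
    "voltage pushes the current through the wire",
    "the battery gives voltage",
    "resistance of the wire slows the current",
]
diagnostic = ["current", "voltage", "resistance"]
context = ["wire", "battery", "lamp", "flows"]

vectors = association_vectors(texts, diagnostic, context)
for a, b in [("current", "voltage"), ("current", "resistance"), ("voltage", "resistance")]:
    print(a, b, round(cosine(vectors[a], vectors[b]), 2))
```

Two diagnostic terms count as related when they share context terms, even if they never occur in the same sentence. The choice of context terms therefore shapes the resulting picture of students' conceptions.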

The workshop took place at the Hanasaari Conference Center, conveniently situated between the city center of Helsinki and the high-tech area of the city of Espoo, which hosts, for instance, Aalto University and the headquarters of Nokia (for a satellite view, see Nokia Maps/Here).
The workshop was attended by a number of researchers in science education, cognitive science, cognitive modeling and philosophy. The participants came from the USA, Italy and the Nordic countries (Finland, Sweden and Denmark). The intensive three-day workshop also provided many opportunities for informal discussions on conceptual change and related topics. The practical arrangements gave an excellent basis for this, and the local organizers Ismo Koponen and Anna-Mari (Ansku) Rusanen are to be thanked for them.

Friday, February 15, 2013

Directions for the future of education and knowledge management

Current educational systems are based on the idea of a high degree of harmonization of the contents that are taught and the degrees that are awarded by schools, universities and other institutions. This harmonization or standardization can take place at a national level, but there are also international efforts, like the Bologna process, that have aimed at ensuring the comparability of the standards and quality of higher education. This may sound like a feasible objective at first sight, but there are substantial problems that can be seen as analogous to the problems of a centrally planned economy. It may be fair to state that the planned economy proved to be a flawed idea because it is, at least in practice, impossible to predict future needs and to plan a system that would serve these needs properly. It is also impossible to take into account the effects of continuous development in the technologies that are used in creating products and services. A market economy, in contrast, is an efficient way of dynamically optimizing the match between needs and wants. Here it may be necessary to note that market economy and capitalism are two different things.

With the analogy discussed above in mind, the present way of organizing the education system suffers from similar problems. Decision makers try to anticipate, even decades beforehand, what the future conditions will be and what kind of knowledge and skills will be needed to serve the needs of future societies. A rather small number of people make these decisions, even though it may be obvious nowadays that the anticipatory power of a large crowd is much stronger than that of a small group of decision makers. Moreover, any fixed categorization system is a hindrance to innovation, because creative problem solving and decision making are very often based on the deconstruction and reconstruction of conceptual systems. In essence, the intellectual "markets" of human knowing could be liberated from such fixed categorization, or the categorization could at least be made much more fine-grained. For instance, at schools the skills and knowledge of pupils are assessed using something like 10- to 20-dimensional vectors, whereas the true complexity of human knowing is much higher. We use harmonization basically for two reasons: communication and setting objectives. When a harmonized system is in place, one can, in principle, communicate how much each person knows about a particular area. Due to the coarse and categorical nature of the system, however, this communication is far from truthful or efficient. Setting objectives through awarded degrees is also far from ideal. Degrees are a form of external motivation, which should be deemed secondary to intrinsic motivational factors. A pupil, a student or any other person should be motivated to learn and to know more by the contents, not by the fact that passing some tests leads to a degree.

The practical implementation of a new kind of educational system based on the markets of knowing and being-able-to is not straightforward, because the alternatives could be overly chaotic. New information processing systems can pave the way for these kinds of developments. Two recent articles in Information Processing & Management indicate future directions related to education and societal knowledge management. In the future, the knowledge and skills of each individual can be managed in a personalized manner. If you think through the implications of the technologies described in our article "Assessing user-specific difficulty of documents" by Paukkeri, Ollikainen and Honkela, you may see the emerging pattern of future possibilities. Another article in the same journal, "Inferring user knowledge level from eye movement patterns", gives a good example of techniques that can be used to measure the knowledge of an individual. These developments lead to a situation in which information systems can store the profiles of our knowledge and skills as vectors with millions of elements, thus acknowledging our abilities in a much more truthful way than the current systems. Moreover, in the future, humans can be credited for all of their skills and knowledge regardless of how they acquired their abilities. The current system can be said to be highly unfair in many different ways. The task of teachers and educators will be to serve as wise coaches who help to build the big picture, indicate good ways of learning and gaining experience, and keep up high levels of energy and intrinsic motivation among the learners. Seen from this point of view, the future of education is bright but also very different from what is in place now.

Friday, January 18, 2013

Tralogy II Conference on Human and Machine Translation

The Tralogy II conference takes place at CNRS headquarters in Paris on January 17th to 18th, 2013. The event brings together specialists in human translation and researchers in the field of machine translation or, more broadly speaking, experts in automatic language processing, IT tools and the language industry. The specific theme of the conference is "the quest for meaning: where are our weak points and what do we need?" This second Tralogy conference is organised jointly by the CNRS (IMMI and INIST), the SFT, the European Commission (DGT, EC Representation in France), Paris Diderot University (UFR EILA), and AFFUMT. The conference program (day 1 and day 2) includes a number of interesting presentations given both in French and English. Connections between human and machine translators were considered in many presentations. For instance, Jan Hajic gave a talk on the topic "Meaning in Translation: Translators Teaching Machines". David Farwell's talk "Pragmatics and High Quality MT" was insightful. With his colleague Stephen Helmreich, Farwell has for some time conducted research that takes the pragmatic level of language seriously into account. An example of the results of their research is the paper "Pragmatics-based MT and the Translation of Puns".

Hans Uszkoreit, Director of the DFKI Language Technology Lab and Coordinator of the META-NET Network of Excellence, gave a talk on "Translation Quality Metrics for Human and Automatic Translation". The objectives of the QTLaunchPad project include assembling and providing data and tools, such as translation corpora, test suites and tools for quality assessment, as well as creating shared quality metrics. The current results are based on collaboration between DFKI, DCU, the University of Sheffield and ILSP Athens, and the consortium will be extended in the future. Uszkoreit reminded the audience that quality measured by widely used measures such as BLEU, NIST and METEOR does not indicate the type of quality problems. The basic goal is to provide both simplicity and sophistication, taking into account that different tasks are associated with different needs. Uszkoreit discussed a number of quality criteria related to language (lexical choice and terminology, orthography, grammar, meaning and accuracy, style, and punctuation) and to the document (structure, layout, fonts and styles, objects, and marking). Uszkoreit pointed out that the quality of a translation by a skillful human translator can sometimes be even better than the quality of the source text.
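To illustrate why a single score such as BLEU says nothing about the type of quality problem, here is a simplified sentence-level BLEU sketch in Python: clipped n-gram precision up to 4-grams, a geometric mean, a brevity penalty, a single reference, and add-one smoothing for n > 1 (an assumption added so that short sentences do not collapse to zero). It is an illustration only, not a reference implementation of BLEU, NIST or METEOR, and the example sentences are hypothetical.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Simplified single-reference BLEU with add-one smoothing for n > 1."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:  # smoothing so that one missing 4-gram does not zero the score
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            return 0.0
        log_prec += math.log(matches / total) / max_n
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)

ref = "the committee approved the new budget yesterday".split()
wrong_word = "the committee rejected the new budget yesterday".split()  # meaning reversed
reordered = "yesterday the committee approved the new budget".split()   # meaning kept

print(round(sentence_bleu(wrong_word, ref), 3))
print(round(sentence_bleu(reordered, ref), 3))
```

Both errors simply lower the same number; nothing in the score tells a user whether the problem is lexical choice, word order or meaning, which is the gap that the quality criteria above are meant to address.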

Thursday, January 10, 2013

Science Forum 2013 opened: Science of Crisis

The Science Forum 2013 was opened today and will be held until the 13th of January. This year the theme of this large science event, intended also for the general public, is crisis. As the organizers state, the world is in constant change and largely unpredictable. In the Science Forum, crises are described and explained by means of science.

In her opening speech, Professor Pirjo Ståhle discussed the relationship between universities and other parts of society and analyzed some of the characteristics of the current innovation system in Finland. Ståhle showed that researchers face contradictory objectives. Research funding often emphasizes the need for practical innovations, whereas evaluation in the scientific community depends on high-quality scientific publications. It is problematic to serve both purposes, especially at the individual level. This concern is particularly relevant for researchers who do not have a permanent position.

Ståhle pointed out that the current situation is a direct consequence of the strategic decisions made in building the Finnish innovation system. One may characterize the situation as follows: companies have better access to results from Finnish academic research due to intensive collaboration efforts, but at the same time, Finnish universities do not produce high-quality papers in top scientific journals at the same pace as some relevant comparison countries. She also mentioned the considerable effect that digitization will have on the practices of research and education. This theme was recently considered in a report on Andrew Ng's talk on online education. Ståhle presented several conclusions:
  • The societal impact of universities should not be seen only through the collaboration between them and private enterprises.
  • Promotion of researchers should take into account and encourage interdisciplinary work as well as social and ecological innovations.
  • Finland should have a Minister of Science.

The Minister of Education, Jukka Gustafsson, welcomed the audience to the Science Forum on behalf of the Finnish government. Gustafsson mentioned that the use of scientific research results in societal decision making should be further strengthened. He stated that the Science Days support this objective but that new structures may also be needed to achieve it. Gustafsson emphasized how the European Union has contributed to stabilizing our continent, but that work on ensuring peace is still needed. The European Year of Citizens was also discussed, with the concern that unemployment is a serious problem. Minister Gustafsson concluded by emphasizing the importance of competence, culture and interaction in society, alongside mutual respect and an open atmosphere.

Robin Goodwin, Professor of Social Psychology at Brunel University, London, gave a keynote talk entitled "Perception of terrorism and other widespread threats". He referred to the modern age of anxiety. As evidence for this, he mentioned a research result according to which anxiety increased significantly in the U.S. between 1952 and 1993 among both adults and children. Goodwin noted that research on threat perception has been dispersed across disciplines, including research on post-traumatic stress or coping mechanisms at the individual level and on community resources at the social level. However, most studies have not combined individual-level factors with broader societal factors, and few consider the different trajectories that might emerge following a crisis event.

Goodwin discussed in some detail the early research on the supposedly inevitable trauma caused by dramatic events. A recognition based on this research is that not everyone suffers from an event in a similar way. In some cases, there are even positive consequences, such as a sense of group cohesion, growth in personal mastery, and the development of new strengths and skills. This variety of responses to trauma suggests complex processes of appraisal and perception. Goodwin's intermediate conclusion was that in crisis situations, real-world resources are important to consider.

At a personal level, one may ask whether the crisis has goal relevance. Past experiences of an event also matter, as they may serve as a reference point. A previously safe environment may make the shock even greater but, on the other hand, it may also give one psychological resources to cope over time. Goodwin mentioned the phenomenon of emotional contagion, i.e. catching emotions from others. He mentioned that studies have shown that mass violence in general has a greater psychological impact than a technological disaster, and a technological disaster a greater impact than a natural disaster.

Traumatic events may challenge beliefs about the world, leading to changes in specific representations as well as in generalized axioms, including, e.g., cynicism. The events may also lead to associated behavior, e.g., behavior driven by fatalistic beliefs. Goodwin discussed the details of a number of cases, including H1N1 and the Great East Japan Earthquake, after which he conducted research on people's values, level of anxiety, the impact of their location, perceived control over risk, and trust in government.

Finally, Goodwin discussed the phenomenon of relationship amplification that typically follows crisis situations. On the other hand, outsiders may be rejected even more than before.

Thursday, December 13, 2012

Virpioja: Learning constructions of natural language

Sami Virpioja defended his comprehensive dissertation "Learning Constructions of Natural Language: Statistical Models and Evaluations" for the Aalto University Department of Information and Computer Science. In his thesis, Virpioja studies the problem of lexical unit selection for the automatic processing of text. He proposes the use of unsupervised and semi-supervised statistical methods instead of simple heuristic or grammatical rule-based methods.
The work is based on the previously developed unsupervised Morfessor method which learns to segment words into surface morphemes (morphs) based solely on the statistical regularities found in a text corpus. In Virpioja's thesis, the discovered morphs are shown to improve different applications, such as automatic speech recognition and statistical machine translation.
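The segmentation step of a Morfessor-style model can be sketched as a Viterbi search: given a lexicon of morphs with unigram probabilities, a word is split into the sequence of morphs with the lowest total cost, the sum of -log p(morph). This Python sketch assumes the lexicon and its probabilities are already given (a hypothetical toy lexicon); it does not learn the lexicon from a corpus, which is the part Morfessor actually does using a minimum description length style objective.

```python
import math
from typing import Dict, List, Optional, Tuple

def viterbi_segment(word: str, lexicon: Dict[str, float]) -> Optional[List[str]]:
    """Split `word` into morphs minimizing the sum of -log p(morph).

    `lexicon` maps morph strings to unigram probabilities. Returns None
    if the word cannot be covered by morphs in the lexicon."""
    n = len(word)
    # best[i] = (cost, start index of the last morph) for the prefix word[:i]
    best: List[Tuple[float, int]] = [(0.0, -1)] + [(math.inf, -1)] * n
    for end in range(1, n + 1):
        for start in range(end):
            morph = word[start:end]
            if morph in lexicon and best[start][0] < math.inf:
                cost = best[start][0] - math.log(lexicon[morph])
                if cost < best[end][0]:
                    best[end] = (cost, start)
    if best[n][0] == math.inf:
        return None
    # Backtrack from the end of the word to recover the morph sequence.
    morphs, end = [], n
    while end > 0:
        start = best[end][1]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]

# Hypothetical toy lexicon; Finnish "talossa" means "in a/the house".
lexicon = {"talo": 0.2, "ssa": 0.3, "ta": 0.05, "lo": 0.05,
           "s": 0.05, "sa": 0.05, "talossa": 0.001}
print(viterbi_segment("talossa", lexicon))
```

Because frequent morphs such as the stem and the case ending are cheap, the split into stem plus ending beats both the rare whole-word entry and finer fragmentations.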
 
Virpioja has extended the Morfessor method to handle allomorphic variation, which allows it to model the morphological relations between morphs. He has also developed a minimally semi-supervised variant of the original method that takes a very small number of manually segmented words as additional input and can find morphs that better match a known linguistic segmentation. Virpioja has also developed methods for evaluating the match between the discovered morphs and linguistic morphemes. The related problem of finding relationships between features in multidimensional data was solved with canonical correlation analysis (CCA), which was used to leverage an existing bilingual corpus for the evaluation of learned semantic vector spaces for documents.
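Canonical correlation analysis finds paired projections of two views of the same items, for example document vectors for the two sides of a bilingual corpus, such that the projected views are maximally correlated. A minimal NumPy sketch follows, assuming the two views are row-aligned matrices and adding a small ridge term for numerical stability. The data are synthetic (two noisy views of a shared two-dimensional latent signal), and the evaluation protocol of the thesis is not reproduced.

```python
import numpy as np

def cca(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-6):
    """Return the top-k canonical correlations and projection matrices.

    X (n x p) and Y (n x q) hold two row-aligned views of the same n items."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C: np.ndarray) -> np.ndarray:
        # Symmetric inverse square root via eigendecomposition (whitening).
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return s[:k], Wx @ U[:, :k], Wy @ Vt.T[:, :k]

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))  # shared "meaning" of 500 hypothetical documents
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))  # view 1
Y = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))  # view 2
corrs, A, B = cca(X, Y, k=2)
print(np.round(corrs, 3))
```

High canonical correlations indicate that the two representations share structure; in an evaluation setting, the same idea can be used to ask how well a learned vector space captures what a translation pair has in common.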

Prof. Brian Roark (Oregon Health & Science University) and Doc. Krister Lindén (University of Helsinki) served as the opponents and provided expertise on both the computational and the linguistic sides of the dissertation. The questions ranged from possible extensions and applications of the work to philosophical ruminations on linguistic theory. In their final statement, they thanked the candidate for his excellent work, which covered experiments both in vivo and in vitro.

Wednesday, December 12, 2012

WSOM 2012 opened in Santiago, Chile

WSOM 2012, the Workshop on Self-Organizing Maps, takes place in the summery city of Santiago, Chile, from the 12th to the 14th of December, 2012. The conference is organized by the University of Chile, Faculty of Physical and Mathematical Science. The conference venue is the School of Engineering. Pablo Estévez serves as the general chair, José Príncipe as the co-chair and Pablo Zegers as the program chair.

The honorary chair, Teuvo Kohonen, the inventor of the original self-organizing map algorithm, presented his greetings to the conference audience through a video presentation. Among other things, he pointed out that the SOM-based scientific literature currently consists of over 12,000 scientific publications. Academician Kohonen mentioned that his article "Essentials of the self-organizing map" will be published in the Neural Networks journal and is already available online.
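For readers unfamiliar with the algorithm, the core of the online self-organizing map can be sketched in a few lines of NumPy: find the best-matching unit (BMU) for each input and pull it and its grid neighbors toward that input, with a shrinking learning rate and neighborhood. The grid size, the linear decay schedules, the Gaussian neighborhood and the toy data are common choices assumed here, not parameters taken from Kohonen's article or the conference papers.

```python
import numpy as np

def train_som(data: np.ndarray, rows: int = 5, cols: int = 5, epochs: int = 20,
              lr0: float = 0.5, sigma0: float = 2.0, seed: int = 0) -> np.ndarray:
    """Online SOM training; returns a (rows, cols, dim) array of model vectors."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighborhood radius
            # Best-matching unit: the model vector closest to x.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood around the BMU on the map grid.
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Two hypothetical clusters in three dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
data = np.vstack([c + 0.05 * rng.normal(size=(100, 3)) for c in centers])
som = train_som(data)
print(som.shape)
```

Because neighboring units are updated together, nearby units on the grid end up with similar model vectors, which is what makes the trained map usable for visualization.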

During the three days, 33 papers are presented in a single track by participants and authors from Argentina, Brazil, Canada, Chile, Finland, France, Germany, Italy, Japan, Mexico, the Netherlands, Romania and South Africa. The two plenary talks are given by Barbara Hammer on "How to visualize large data sets" and by José Príncipe on "Self-organization using information theoretic learning". Pavlos Protopapas gives an invited talk on "Data mining for astronomy". The conference proceedings are published by Springer.

Monday, December 10, 2012

NIPS 2012 Workshop on Personalizing education with machine learning

At the NIPS 2012 conference, a workshop on "Personalizing education with machine learning" was organized by Michael C. Mozer, Javier R. Movellan, Robert V. Lindsey and Jacob Whitehill. The workshop consisted of twenty short oral presentations and was very well attended. The themes covered by the talks included, for instance, applying reinforcement learning models, stochastic optimal control theory and Bayesian inference models to diagnosis and decision making in educational contexts. In general, data and text mining techniques were used to model learning processes and to guide pedagogical processes. In the following, a small subset of the talks is described in some detail. Andrew Ng's talk "The Online Revolution: Education for Everyone" is discussed in another Cognitive Systems blog post.

Vivienne Ming and Norma Ming gave a talk on "Inferring Conceptual Knowledge From Unstructured Student Writing". Their approach was based on the idea that continuous, passive assessment can be used to elucidate conceptual knowledge. In other words, text mining was applied to students' writing to analyze their progress and to see whether the text mining results can be used to predict course outcomes. This approach builds unintrusively on teachers' existing instruction and on the wealth of unstructured data. Ming and Ming showed that topic models of unstructured student writing can predict course outcomes. The work was motivated by the fact that in many cases the conceptual structure of the domain is not known well enough to facilitate detailed conceptual modeling. The study was based on texts written in online discussion forums during biology and economics courses lasting five or six weeks. There were two or more mandatory discussion questions per week. It was found that additional weeks of data improve the predictions. Ming and Ming also found that hierarchical topic modeling (based on hLDA) improves the results over traditional topic modeling (based on pLSA). In the methodological remarks, cognitive components based on hidden-state conditional random fields were also mentioned. Other potential text sources include online tutoring, informal learning environments, annotations on e-texts, and wiki contributions.

Min Chi gave a talk on her work with Kurt VanLehn, Diane Litman and Pamela Jordan entitled "Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical tactics". The talk covered mathematics and science, where solving a domain problem often involves multiple domain principles that must be applied appropriately step by step. An instructor may choose to explain a step to the students ("tell") or ask the students to formulate the step themselves ("elicit"). A potential positive result of eliciting is the generation effect, but it may also cause frustration if the task proves to be too difficult. Telling is accurate from the point of view of presentation, but from the students' point of view it may lead to lack of attention and shallow processing. Chi presented a detailed account of how pedagogical tactics can be induced in the framework of Markov Decision Processes. The basic idea is to use reinforcement learning to determine the best action for the tutor to take in any learning context in order to maximize student learning. The actions in the model are elicit and tell, and the states are described by student performance and concept difficulty. The reward for reinforcement learning consists of the student learning gains. In her approach, the transition probabilities are estimated from the training dataset. Min Chi also listed challenges, such as the state representation not being clear and the state transitions being unknown. The experimental results were based on teaching students physics concepts related to work and energy.
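The pipeline described above can be sketched as follows: estimate transition probabilities and mean rewards from logged (state, action, reward, next state) tuples by counting, then run value iteration to choose "elicit" or "tell" in each state. The two-feature state, the toy log, the terminal "done" state and the discount factor are hypothetical; the study's actual state features, data and learning-gain rewards are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[str, str]  # (student performance, concept difficulty)
ACTIONS = ("elicit", "tell")

def estimate_mdp(log: List[Tuple[State, str, float, State]]):
    """Maximum-likelihood transition probabilities and mean rewards from counts."""
    counts: Dict[Tuple[State, str], Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    reward_sum: Dict[Tuple[State, str], float] = defaultdict(float)
    for s, a, r, s2 in log:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        P[sa] = {s2: c / total for s2, c in nexts.items()}
        R[sa] = reward_sum[sa] / total
    return P, R

def value_iteration(P, R, gamma: float = 0.9, iters: int = 200):
    states = {s for s, _ in P} | {s2 for d in P.values() for s2 in d}
    V = {s: 0.0 for s in states}

    def q(s, a):
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    for _ in range(iters):
        new_V = {}
        for s in states:
            qs = [q(s, a) for a in ACTIONS if (s, a) in P]
            new_V[s] = max(qs) if qs else 0.0  # states with no logged actions are terminal
        V = new_V
    policy = {}
    for s in states:
        observed = [a for a in ACTIONS if (s, a) in P]
        if observed:
            policy[s] = max(observed, key=lambda a: q(s, a))
    return V, policy

# Hypothetical log: (state, action, reward, next state).
log = [
    (("low", "hard"), "tell", 0.6, ("high", "hard")),
    (("low", "hard"), "elicit", 0.1, ("low", "hard")),
    (("high", "hard"), "elicit", 1.0, ("done", "-")),
    (("high", "hard"), "tell", 0.3, ("done", "-")),
]
P, R = estimate_mdp(log)
V, policy = value_iteration(P, R)
print(policy)
```

In this toy log the induced tactic is to tell when performance is low and to elicit once performance is high, which shows how a policy can differ by state rather than being a single fixed teaching style.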

Vikram Ramanarayanan presented results on analyzing videos of educational situations. In his talk "A framework for unusual event detection in videos of informal class settings", he studied how engagement and disengagement could be recognized from videos in which 5- to 12-year-old children are learning science. The results were promising even though the formal recognition measures were quite low. This was, however, largely because the human annotators of the videos had often not labeled obvious cases of engagement or disengagement. Methodologically, linear dynamical systems modeling was used, including a jump-Markov time-series model. The basic motivation of the work is to develop methods for real-time analysis of learning situations.
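As a rough illustration of how a linear dynamical system can flag unusual events, the following Python sketch runs a scalar Kalman filter on a hypothetical per-frame feature and flags frames whose normalized innovation (prediction error divided by its predicted standard deviation) is large. This is a deliberately simple stand-in, not the jump-Markov model used in the talk; the random-walk dynamics, the noise variances and the threshold are assumptions.

```python
import math
from typing import List

def kalman_flags(obs: List[float], q: float = 0.01, r: float = 0.1,
                 threshold: float = 3.0) -> List[int]:
    """Return indices whose normalized innovation exceeds `threshold`.

    Model: x_t = x_{t-1} + w_t (variance q), y_t = x_t + v_t (variance r)."""
    x, p = obs[0], 1.0  # initial state estimate and its variance
    flagged = []
    for t, y in enumerate(obs[1:], start=1):
        p_pred = p + q        # predict under random-walk dynamics
        s = p_pred + r        # innovation variance
        innov = y - x         # prediction error
        if abs(innov) / math.sqrt(s) > threshold:
            flagged.append(t)
        k = p_pred / s        # Kalman gain
        x = x + k * innov     # update the state estimate
        p = (1.0 - k) * p_pred
    return flagged

# Hypothetical feature with an abrupt change at frame 30.
signal = [0.0] * 30 + [3.0] * 10
print(kalman_flags(signal))
```

The filter flags the first frames after the change and then adapts to the new level; a switching (jump-Markov) model would instead represent such changes explicitly as transitions between regimes.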

Saturday, December 08, 2012

Andrew Ng: The Online Revolution: Education for Everyone

One of the most awaited talks in the NIPS 2012 workshop on "Personalizing education with machine learning" was Andrew Ng's "The Online Revolution: Education For Everyone". Ng is the director of the Stanford Artificial Intelligence Lab at Stanford University and a co-founder of Coursera, a company that offers free online courses from top universities for anyone to take. His research interests include machine learning and neuroscience-informed artificial intelligence.

Coursera is an impressive development in providing high-quality education to large numbers of students. Ng used as an example his own course on machine learning, which is attended by about 100,000 students, compared with 400 in the days of classroom teaching. The lectures are divided into videos of about ten minutes each, with foreign-language captions. Students can replay material instantly, which helps considerably in following the lectures because different students need different amounts of time. In other words, students can navigate through the lecture materials in their own order and at their own pace. Various tools increase interactivity, leading to the surprising conclusion that a website can be more interactive than a normal class: in a classroom, when one student answers the professor's question well, there may be 39 or 399 others who have already lost track.

In areas like computer science, mathematics and natural science, autograded homework and exercises are used. The Coursera environment also provides a chance for continuous practice with the material. Ng provided an interesting example related to the autograding of computer programs. On one occasion, about 2,000 students (out of 100,000) submitted the same kind of wrong answer. They were able to give these 2,000 students personalized advice in the form of a targeted, customized message. There are, however, many areas of study in which autograding is not viable; Ng gave teaching a poetry class as an example. In such areas, peer grading and even self-grading can be useful. Coursera has classes of 50,000 students with crowdsourced peer grading. Not everyone is given the same grading rights; instead, there is a process through which students gain proficiency in grading.
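One simple way to find large groups of students who made the same mistake, as in the autograding example above, is to run every submission on a fixed set of test inputs and group the incorrect submissions by their output signature. The Python sketch below assumes submissions are plain callables and uses a hypothetical exercise (squaring a number), hypothetical test inputs and hypothetical student submissions; it is not Coursera's actual autograder.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

TEST_INPUTS = [0, 1, 2, 5, -3]  # hypothetical test cases

def reference(x: int) -> int:  # correct solution: the square of x
    return x * x

def signature(fn: Callable[[int], int]) -> Tuple:
    """Outputs on all test inputs; exceptions are recorded by type name."""
    out = []
    for x in TEST_INPUTS:
        try:
            out.append(fn(x))
        except Exception as exc:  # a crash is also a (wrong) behaviour
            out.append(type(exc).__name__)
    return tuple(out)

def group_wrong(subs: Dict[str, Callable[[int], int]]) -> List[List[str]]:
    """Group incorrect submissions by identical behaviour, largest group first."""
    correct = signature(reference)
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for student, fn in subs.items():
        sig = signature(fn)
        if sig != correct:
            groups[sig].append(student)
    return sorted(groups.values(), key=len, reverse=True)

subs = {
    "s1": lambda x: x * x,
    "s2": lambda x: 2 * x,      # common mistake: doubling instead of squaring
    "s3": lambda x: x + x,      # same behaviour as s2, different code
    "s4": lambda x: abs(x) * x,  # wrong only for negative inputs
}
for group in group_wrong(subs):
    print(len(group), group)  # one targeted message could be written per group
```

Grouping by behaviour rather than by source text means that differently written programs with the same bug receive the same feedback.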

Coursera helps in building a global community around each course. Ng used a sociology course as an example: the median response time to students' questions was found to be 22 minutes. This is much better than in a normal class, where a student e-mails a question to the lecturer, course assistants or other participants. Moreover, these kinds of courses provide an exciting opportunity to conduct educational research. Rather than having two groups of 20 students each, one may have 20,000 plus 20,000 students. Ng also mentioned that one can identify the forum discussion threads that are most likely to lead to positive learning outcomes. He once noticed that something in his own lecture had not been explained in sufficient detail. He could see that one of the students had provided an additional explanation, which could then be referred to when other students encountered problems of that particular kind.
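The advantage of such large experiments can be made concrete with a two-proportion z statistic for the difference between two groups' pass rates, assuming independent groups and a normal approximation. With hypothetical pass rates of 60% and 65%, the Python sketch below compares 20 + 20 students with 20,000 + 20,000.

```python
import math

def z_two_proportions(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference p2 - p1, using the unpooled standard error."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se

for n in (20, 20_000):
    z = z_two_proportions(0.60, 0.65, n, n)
    verdict = "significant at 5%" if abs(z) > 1.96 else "not significant"
    print(n, round(z, 2), verdict)
```

Multiplying both group sizes by 1,000 shrinks the standard error by a factor of about 31.6 (the square root of 1,000), so a five-percentage-point difference that is invisible with 20 students per group becomes unmistakable with 20,000.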

Andrew Ng referred to Benjamin Bloom's 2 Sigma Problem. In the mid-1980s, Bloom found that an average student tutored one-to-one using mastery learning techniques performed two standard deviations better than students taught with conventional instructional methods. In this sense, Ng acknowledged the strength of mastery learning and, even more so, of individual tutoring. As a society, we are not able to give everyone an individual tutor, and he therefore emphasized the need to provide tools that have similar qualities to the extent possible.
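A worked example of what "two standard deviations better" means: if scores are roughly normally distributed (our assumption), the average tutored student scores above a fraction Φ(2) of the conventionally taught group, where Φ is the standard normal cumulative distribution function. A short Python check:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Share of the conventionally taught group scoring below the average tutored student.
print(f"{normal_cdf(2.0):.3f}")  # 0.977
```

In other words, the average tutored student would outperform roughly 98% of the students in a conventional class.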

Regarding the role of face-to-face situations in the classroom, Ng cited Plutarch's statement that "the mind is not a vessel that needs filling, but wood that needs igniting." Therefore, providing teaching materials for free on the net does not diminish the value of being a student at a high-quality university, with its stimulating environment of professors and peers. One-to-one access to professors and peers will thus remain an important motivation to go to university in the future, too.

Ng described his own way of lecturing, which has evolved considerably thanks to the tools available. The students watch the lectures beforehand at home. Classroom work can then focus on problem solving in small groups. He called this mode of working just-in-time teaching. In other words, the teacher can use the time to interact with the students in the classroom rather than lecturing while the students remain passive for most of the time. Ng felt that he serves his students best this way and remains well motivated himself even when giving the course for the tenth time. The materials on the web can also be used at any university in the world, where a local instructor can concentrate on one-to-one mentoring without the need to invest heavily in basic educational materials such as lectures and slides.

Andrew Ng emphasized the importance of universal access to education and stated that high-quality education is a fundamental human right. I had a chance to talk with Ng shortly after the presentation and brought up, among other things, the fact that this very same attitude has led Finland to have one of the best-performing educational systems in the world. In his presentation, Ng also emphasized that it is important for the courses to be available for free. To achieve good coverage, one cannot assume that all students could pay even a modest fee, and they may not possess a credit card. A statement with high moral value was also that education is about helping students to succeed. Therefore, each student should be given the chance to learn, rather than being categorized as an A, B or C student. The time needed to learn can vary, but when good web-based materials and tools are available, this variation is no longer a problem.

In general, the future of education may be seen as brighter than ever. There is less and less need to organize education in a rigid, top-down manner. In some sense, much of education has traditionally been organized in a way analogous to a planned or command economy. It has also suffered from similar problems of inefficient resource distribution and suppression of democracy and self-management. In the worst cases, teachers and professors have used their work to boost their egos rather than to serve pupils and students in their need and right to learn. It may be reasonable to say, though, that children need to be guided in what they learn. At the university level, at the latest, the situation should be driven by the motivation and spirit of the people. In the long term, I foresee that there will be no need to organize university-based education around a small number of degrees. From the point of view of modern knowledge management, human knowledge and skills can be described and appreciated in a much more fine-grained or, in machine learning terms, high-dimensional manner.

Friday, December 07, 2012

NIPS 2012 Workshop on Computational Sustainability

The Neural Information Processing Systems conference, NIPS 2012, takes place at Lake Tahoe. NIPS is a highly recognized conference in the area of machine learning. After the main conference, a number of workshops take place in which recent developments and future opportunities are discussed.

In the workshop "Human Computation for Science and Computational Sustainability", the combination of crowdsourcing and machine learning is considered from a methodological point of view, as are the opportunities for using such an approach to understand and improve environmental conditions. The workshop is organized by Theodoros Damoulas, Thomas Dietterich, Edith Law and Serge Belongie.

One of the supporting organizations is the Institute for Computational Sustainability at Cornell University. One of the directors of the institute, Tom Dietterich, gave a keynote talk at the main NIPS conference on "Challenges for Machine Learning in Computational Sustainability". In his talk, Dietterich stated that we have failed to manage the Earth's ecosystems in a sustainable way. For instance, mammalian populations are dropping rapidly worldwide. He gave three reasons for this situation: (1) ecosystems have not been considered as a management or control problem, (2) existing knowledge of the function and structure of ecosystems is inadequate, and (3) optimal management requires spatial planning over horizons of more than one hundred years. Dietterich continued by giving a detailed account of how computer science in general, and machine learning in particular, can help in this situation.

The first invited talk in the workshop was given by Eric Horvitz, Microsoft Research. Horvitz provided examples of harnessing human and machine intelligence in citizen science, of opportunities for conducting science with the crowd, and of the potential to coordinate crowds on physical tasks.

Rómer Rosales and his co-authors Subramanian, Fung and Dy considered crowdsourcing for supervised or semi-supervised learning in cases where ground truth exists but is unavailable or expensive to obtain. Specifically, they considered how to evaluate annotators and proposed a multiple-annotator model for classification. Numerical properties of the quantities of interest (predictability, AUC) were considered with respect to (1) a varying number of helpful and unhelpful participants, and (2) varying levels of collaborative or adversarial behavior among participants.

Haimonti Dutta and William Chan presented a talk on "Using community structure detection to rank annotators when ground truth is subjective". They considered a historical collection of The Sun newspaper, for which OCR was used to build the corpus, causing a number of problems because of garbled text. Categorizing the articles manually would be impossible: one sample alone contained more than 14,000 articles from November and December 1894. Preliminary labels were created using a community structure detection algorithm based on a similarity graph constructed from the articles. In the similarity graph, an edge exists between two articles if the cosine similarity of their tf-idf vectors exceeds a given threshold.
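The graph construction step described above can be sketched in plain Python: compute tf-idf vectors, connect two articles when the cosine similarity of their vectors exceeds a threshold, and read off groups. For brevity, the sketch uses connected components as a simple stand-in for a proper community structure detection algorithm (such as a modularity-based method); the toy articles and the 0.2 threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term frequency times log inverse document frequency."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_groups(docs: List[List[str]], threshold: float = 0.2) -> List[List[int]]:
    vecs = tfidf(docs)
    n = len(docs)
    # Edge between i and j if the cosine similarity of their tf-idf vectors exceeds the threshold.
    adj = {i: [j for j in range(n) if j != i and cosine(vecs[i], vecs[j]) > threshold]
           for i in range(n)}
    seen, groups = set(), []
    for start in range(n):  # connected components via depth-first search
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(comp))
    return groups

articles = [
    "ship arrived harbor cargo".split(),
    "cargo ship harbor storm".split(),
    "senate vote tariff bill".split(),
    "tariff bill senate debate".split(),
]
print(similarity_groups(articles))
```

Connected components merge any chain of similar articles into one group, which is why real corpora usually call for community detection methods that look for densely connected subgraphs instead.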

In her presentation "Crowdsourcing citizen science data quality with a human-computer learning network", Andrea Wiggins discussed in detail eBird, a web-based database of bird observations collected by a large community of observers. The system has revolutionized the way the birding community reports and accesses information about birds. Wiggins' presentation clearly showed the potential of such a system for collecting information about the state of the environment. An interesting topic was the role and nature of the observers' expertise. In general, an increasing level of expertise helps in making good-quality observations. On the other hand, the quest for novelty may lead even experts to a kind of crowd hallucination when they eagerly wish to see something new. It was concluded that research on human learning is also important in relation to this area, and that such a database can prove to be an interesting data collection for studying human learning, too.

Edwin Simpson presented a method called Dynamic Independent Bayesian Classifier Combination for dealing with crowdsourced classification tasks. The probabilistic model of changing worker behavior treats both artificial and human agents as base classifiers. A Bayesian approach is used to combine their decisions, and fast inference is based on variational Bayes. The method can deal with limited training data. Simpson presented experimental results based on the Zooniverse environment for citizen science projects.
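The core combination rule behind independent Bayesian classifier combination can be shown with fixed, known parameters: assuming annotators are conditionally independent given the true class, the posterior over the true class is proportional to the class prior times, for each annotator, the confusion-matrix probability of the label that annotator gave. The full method places priors over the confusion matrices and, in the dynamic variant, lets them change over time, with variational Bayes for inference. This NumPy sketch omits all of that, and its numbers are hypothetical.

```python
import numpy as np

def combine(prior: np.ndarray, confusions: np.ndarray, labels: list) -> np.ndarray:
    """Posterior over true classes given at most one label per annotator.

    prior: (C,) class probabilities.
    confusions: (K, C, C); confusions[k, t, c] = P(annotator k says c | true class t).
    labels: length-K list of observed labels (None if annotator k gave no label)."""
    log_post = np.log(prior)
    for k, c in enumerate(labels):
        if c is not None:
            log_post += np.log(confusions[k, :, c])
    post = np.exp(log_post - log_post.max())  # normalize in a numerically stable way
    return post / post.sum()

prior = np.array([0.5, 0.5])
confusions = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # reliable annotator
    [[0.6, 0.4], [0.4, 0.6]],  # noisy annotator
    [[0.6, 0.4], [0.4, 0.6]],  # noisy annotator
])
print(np.round(combine(prior, confusions, [1, 0, 0]), 3))
```

Here one reliable annotator outweighs two noisy ones who disagree with it, which a simple majority vote would not capture; estimating such reliabilities from data is what the Bayesian inference in the full method is for.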

Serge Belongie gave a talk on "Building the Visipedia Field Guide to North American Birds". Visipedia is a kind of visual counterpart to Wikipedia. Belongie discussed various aspects of crowdsourcing, machine vision and machine learning involved in building the system. One conclusion was that people are very keen to help when birds are concerned. This motivational aspect had been discussed earlier, when Andrea Wiggins stated that there is a large variety of birds, that their characteristics are reasonably easy to recognize, and that there is always novelty to be found.

In the discussions, some interesting papers were mentioned, including "The Multidimensional Wisdom of Crowds" by Welinder, Branson, Belongie and Perona, "Incentives for Truthful Reporting in Crowdsourcing" by Kamar and Horvitz, and "Eliciting informative feedback: The peer-prediction method" by Miller, Resnick and Zeckhauser. In addition, Detexify2, a LaTeX symbol classifier, was discussed.

Towards the end of the workshop, strategic and interdisciplinary questions were discussed, ranging from tools for enhancing collaboration, communities of practice and the zone of proximal development to challenges related to global reach and the multiplicity of human expertise. Eric Horvitz, newly nominated as program co-chair with Björn Hartmann (UC Berkeley), mentioned that AAAI is launching a new conference called HCOMP.

Friday, November 23, 2012

Tuomas Sandholm visiting Aalto University

Tuomas Sandholm is a Professor in the Computer Science Department at Carnegie Mellon University as well as a serial entrepreneur. Sandholm is visiting Aalto University, where he is giving a talk on "Modern Dynamic Kidney Exchanges". In kidney exchanges, patients with kidney disease can obtain compatible donors by swapping their own willing but incompatible donors. Sandholm and his colleagues developed the first algorithm capable of clearing these exchanges optimally on a US-nationwide scale.
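To make the clearing problem concrete: patient-donor pairs form a directed compatibility graph, with an edge from pair i to pair j when i's donor can give to j's patient. Clearing means choosing vertex-disjoint cycles, usually of bounded length, that maximize the number of transplants. The brute-force Python sketch below, limited to 2- and 3-cycles on a hypothetical five-pair graph, only illustrates this objective; it is not Sandholm's algorithm, and realistic nationwide instances require far more scalable optimization methods.

```python
from itertools import combinations, permutations
from typing import Dict, List, Set, Tuple

def find_cycles(edges: Dict[int, Set[int]], max_len: int = 3) -> List[Tuple[int, ...]]:
    """All directed cycles of length 2..max_len, each listed once."""
    nodes = sorted(edges)
    cycles = []
    for length in range(2, max_len + 1):
        for combo in combinations(nodes, length):
            first = combo[0]
            # Fix the smallest node first so rotations of a cycle are not listed twice.
            for rest in permutations(combo[1:]):
                cyc = (first,) + rest
                if all(cyc[(i + 1) % length] in edges[cyc[i]] for i in range(length)):
                    cycles.append(cyc)
    return cycles

def clear(edges: Dict[int, Set[int]], max_len: int = 3) -> List[Tuple[int, ...]]:
    """Exhaustively choose vertex-disjoint cycles maximizing the number of transplants."""
    cycles = find_cycles(edges, max_len)
    best: List[Tuple[int, ...]] = []

    def search(i: int, used: Set[int], chosen: List[Tuple[int, ...]]) -> None:
        nonlocal best
        if sum(map(len, chosen)) > sum(map(len, best)):
            best = list(chosen)
        for j in range(i, len(cycles)):
            if used.isdisjoint(cycles[j]):
                chosen.append(cycles[j])
                search(j + 1, used | set(cycles[j]), chosen)
                chosen.pop()

    search(0, set(), [])
    return best

# Hypothetical compatibility graph: donor of pair i can give to the patients in edges[i].
edges = {0: {1}, 1: {0, 2}, 2: {3}, 3: {1, 4}, 4: {3}}
print(find_cycles(edges))
print(clear(edges))
```

In this graph the 3-cycle 1 → 2 → 3 → 1 conflicts with both 2-cycles, and choosing the two 2-cycles yields four transplants instead of three; the exhaustive search grows exponentially, which is why large-scale clearing needs specialized optimization.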

Sandholm is one of the most highly cited Finnish computer scientists. In 2003, Sandholm received the prestigious Computers and Thought Award. The award is presented at IJCAI conferences to outstanding young scientists in the area of artificial intelligence. He was recognized for his contributions to computational economics and to the theory and practice of negotiation and coalition formation among computationally bounded agents. He is also a recipient of the NSF Career Award, the inaugural ACM Autonomous Agents Research Award, the Alfred P. Sloan Foundation Fellowship, and the Carnegie Science Center Award for Excellence.

Sandholm also visited the Cognitive Systems research group. Tuomas Sandholm and Timo Honkela collaborated in the early 1990s, when they both worked at VTT in Finland, before Sandholm moved to the USA. They coauthored an article on machine learning for the Finnish AI Encyclopedia in 1993.

Friday, November 16, 2012

Motivations for Computational Modeling

The book "Catalyzing Inquiry at the Interface of Computing and Biology" was compiled in 2005 by the National Research Council Committee on Frontiers at the Interface of Computing and Biology, with J.C. Wooley and H.S. Lin as editors. Even though the book is already several years old, the motivations it lists for computational modeling are still very relevant. In the chapter "Computational Modeling and Simulation as Enablers for Biological Discovery", the following motivations are listed:
  • models provide a coherent framework for interpreting data,
  • models highlight basic concepts of wide applicability,
  • models uncover new phenomena or concepts to explore,
  • models identify key factors or components of a system,
  • models can link levels of detail,
  • models enable the formalization of intuitive understandings,
  • models can be used as a tool for helping to screen unpromising hypotheses,
  • models inform experimental design,
  • models can predict variables inaccessible to measurement,
  • models can link what is known to what is yet unknown,
  • models can be used to generate accurate quantitative predictions, and
  • models expand the range of questions that can meaningfully be asked.
Even though the chapter focuses on biological systems, the list is general and applies equally to cognitive and social systems.

Friday, November 09, 2012

Paukkeri: Language- and domain-independent text mining

Mari-Sanna Paukkeri defends her dissertation "Language- and domain-independent text mining". In her dissertation for the Aalto University Department of Information and Computer Science, Paukkeri has studied how textual data can be processed and analysed automatically with machine learning methods. She has developed computational methods for text processing independent of language or domain.

Paukkeri considers fully automatic methods, language independence and subjectivity in several natural language processing tasks. The thesis presents Likey, a fully automatic and language-independent keyphrase extraction method, and reports its performance for 11 European languages, including English and Finnish.
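The core idea can be sketched as follows, based on our reading of the published Likey description: a phrase is a good keyphrase if it is frequent in the document but comparatively rare in a reference corpus of the same language, measured by the ratio of its frequency rank in the document to its frequency rank in the reference corpus. Because only frequency ranks are used, no language-specific resources are needed. This sketch simplifies in several assumed ways: it scores single words rather than multi-word phrases, breaks frequency ties alphabetically, gives words missing from the reference corpus the rank "reference vocabulary size + 1", and the function names are hypothetical.

```python
from collections import Counter


def frequency_ranks(tokens):
    """Map each token to its rank by descending frequency (1 = most frequent)."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))  # ties broken alphabetically
    return {tok: rank for rank, tok in enumerate(ordered, start=1)}


def likey_keyphrases(doc_tokens, ref_tokens, k=3):
    """Return the k words with the lowest document-rank / reference-rank ratio."""
    doc_rank = frequency_ranks(doc_tokens)
    ref_rank = frequency_ranks(ref_tokens)
    unseen = len(ref_rank) + 1  # assumed rank for words absent from the reference corpus
    ratio = {w: r / ref_rank.get(w, unseen) for w, r in doc_rank.items()}
    # A low ratio means "common here, uncommon in general", i.e. characteristic of the document.
    return sorted(ratio, key=lambda w: (ratio[w], w))[:k]


doc = "the kidney exchange the kidney donor of the".split()
ref = "the of the a of the and a the of".split()
print(likey_keyphrases(doc, ref))  # ['kidney', 'donor', 'exchange']
```

Note how "the" is the most frequent word in the document but is not selected, because it is equally frequent in the reference corpus.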

The thesis also proposes an approach for learning taxonomies from encyclopedia documents. The work is an early step toward automating the construction of ontologies and making them more applicable to multilingual settings.

In the work on lexical choice, machine learning methods are applied to as large a collection of linguistic features as possible in order to study how these features contribute to the learning task.

Paukkeri's thesis also studies the feature extraction step in text mining, analyzing the effect of different dimensionality reduction methods, normalizations and distance measures on document clustering, and proposes an evaluation method for feature extraction (or document representation). To further demonstrate the language independence of these methods, the experiments are run on several languages from different language families.
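Why normalization and the choice of distance measure matter for document clustering can be seen in a small example of our own; it is not taken from the thesis, and the documents and helper names are hypothetical. With raw term counts, Euclidean distance is dominated by document length, so a text and a three-times-repeated copy of it look further apart than two texts sharing no words at all. L2 normalization, or equivalently switching to cosine distance, removes the length effect.

```python
import math
from collections import Counter


def tf_vectors(texts):
    """Term-frequency vectors over a shared, sorted vocabulary."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts]


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the direction of the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


short = "text mining finds patterns"
long_ = " ".join([short] * 3)  # same content, three times as long
other = "kidney donors swap organs"  # no words in common with short
s, l, o = tf_vectors([short, long_, other])

print("raw euclidean:", euclidean(s, l), euclidean(s, o))
print("normalized euclidean:", euclidean(l2_normalize(s), l2_normalize(l)),
      euclidean(l2_normalize(s), l2_normalize(o)))
print("cosine:", cosine_distance(s, l), cosine_distance(s, o))
```

On raw counts the unrelated text is the nearer neighbour (distance about 2.83 versus 4.0); after normalization the repeated copy is at distance 0, which is the grouping a clustering algorithm should find.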

The third main theme of the thesis, subjectivity of language use, is specifically considered in the task of assessing the difficulty of a text. A novel approach is proposed in which the difficulty assessment is done separately for each user. In contrast to traditional readability measures, the proposed method is intended to find suitable documents for adult readers with differing areas of expertise. The article on this topic, "Assessing user-specific difficulty of documents", has been published in the prestigious journal Information Processing & Management.
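The difference between a single readability score and a user-specific one can be illustrated with a deliberately simple proxy. This is not the method of the article: we simply assume that difficulty for a given user is the share of a document's words that fall outside that user's known vocabulary, which in practice might be estimated from texts the user has read or written (also an assumption). The function names and example vocabularies are hypothetical. The point is that the same two documents are ranked in opposite orders for two users with different expertise.

```python
def unfamiliar_share(doc, known_words):
    """Fraction of a document's tokens that lie outside one user's known vocabulary."""
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    return sum(t not in known_words for t in tokens) / len(tokens)


def rank_for_user(docs, known_words):
    """Order documents from easiest to hardest for one particular user."""
    return sorted(docs, key=lambda d: unfamiliar_share(d, known_words))


docs = [
    "renal transplant outcomes depend on donor compatibility",
    "cycle formulations make the optimization problem tractable",
]
# Hypothetical vocabularies for two adult readers with different expertise.
physician = {"renal", "transplant", "outcomes", "depend", "on", "donor", "compatibility", "the"}
engineer = {"cycle", "formulations", "make", "the", "optimization", "problem", "tractable", "on"}

print(rank_for_user(docs, physician)[0])  # the medical text comes first
print(rank_for_user(docs, engineer)[0])   # the optimization text comes first
```

A traditional readability formula, which looks only at properties of the text itself, would assign each document one score and therefore the same order for both readers.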

Jussi Karlgren (Gavagai AB and KTH, Stockholm) served as the opponent. Drawing on his long experience in the field, Karlgren asked questions ranging from fundamental methodological issues to future research possibilities. In his closing statement, he thanked the defender for her solid work in an important area of research.