Monday, March 05, 2012

Turning continuous variables into dichotomies?

In 2002, David L. Streiner wrote an excellent article in which he argues that turning continuous variables into dichotomies is almost always a problematic thing to do. In the article, published in the Canadian Journal of Psychiatry, he shows that the rationales given for the transformation are weak and that the categorization leads to a loss of information, reduced power of statistical tests, and an increased probability of a Type II error.
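The loss of information can be made concrete with a small simulation. The following is only a minimal sketch, not taken from the article: it assumes normally distributed data, a true correlation of 0.5, and a median split, and the function names `pearson` and `simulate` are made up for this illustration. For normal data, a median split is known to shrink the observed correlation to roughly 80% of its original value.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.5, seed=0):
    """Return (r with continuous x, r with median-split x).

    Assumed setup for illustration: x and y are standard normal
    with true correlation rho.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

    # Dichotomize x at its sample median: 1.0 = "high", 0.0 = "low".
    median = sorted(xs)[n // 2]
    xs_split = [1.0 if x > median else 0.0 for x in xs]

    return pearson(xs, ys), pearson(xs_split, ys)


r_cont, r_split = simulate()
print(f"continuous x:   r = {r_cont:.3f}")
print(f"median-split x: r = {r_split:.3f}")
print(f"ratio:          {r_split / r_cont:.3f}")
```

A weaker observed correlation means that a larger sample is needed to detect the same effect, which is the loss of power described above.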

A related classic from 1999 in another discipline is Geoffrey C. Bowker and Susan Leigh Star's Sorting Things Out: Classification and Its Consequences. They explore the role of categories and standards and how they shape our modern world.

This chain of associations could lead to Zen Buddhist philosophy or even to Aristotle, but in this context let's consider it sufficient just to mention Helge Ritter and Teuvo Kohonen's classic 1989 article "Self-Organizing Semantic Maps". Anyone interested in word spaces (e.g. LSA) or in the relationship between epistemology, language, cognition and learning systems should definitely read it. For further reading, "Von Foerster meets Kohonen: Approaches to artificial intelligence, cognitive science and information systems development" links self-organizing maps with the work of Von Foerster (and of Maturana and Varela as more recent scholars).

The Ritter and Kohonen article is also an early account of random projection, even though the underlying idea had already been formulated by W. Johnson and J. Lindenstrauss in their 1984 article "Extensions of Lipschitz mappings into a Hilbert space".
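The idea behind random projection can be tried out in a few lines. This is only a rough sketch under assumed settings and not the procedure from either article: the points are random Gaussian vectors, the projection matrix has Gaussian entries scaled by 1/sqrt(k), and the sizes (10 points, 500 dimensions reduced to 200) and the function names are arbitrary choices for illustration. If pairwise distances are approximately preserved, the ratios of projected to original distances should stay close to 1.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """k x d matrix of Gaussian entries, scaled so that squared
    lengths are preserved in expectation."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0, 1) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, v):
    """Multiply the projection matrix by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratios(n=10, d=500, k=200, seed=0):
    """Ratios of projected to original distance for every pair of points."""
    rng = random.Random(seed)
    points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    matrix = random_projection_matrix(d, k, rng)
    projected = [project(matrix, p) for p in points]

    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            original = distance(points[i], points[j])
            reduced = distance(projected[i], projected[j])
            ratios.append(reduced / original)
    return ratios


ratios = distance_ratios()
print(f"pairs: {len(ratios)}")
print(f"smallest ratio: {min(ratios):.3f}")
print(f"largest ratio:  {max(ratios):.3f}")
```

Lowering `k` makes the ratios spread further from 1, which shows the trade-off between the amount of compression and the distortion of distances.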

