Friday, December 07, 2012

NIPS 2012 Workshop on Computational Sustainability

The Neural Information Processing Systems conference, NIPS 2012, took place in Lake Tahoe. NIPS is a highly regarded conference in the area of machine learning. After the main conference, a number of workshops were held in which recent developments and future opportunities were discussed.

In the workshop "Human Computation for Science and Computational Sustainability", the combination of crowdsourcing and machine learning was considered both from a methodological point of view and in terms of the opportunities such an approach offers for understanding and promoting the state of the environment. The workshop was organized by Theodoros Damoulas, Thomas Dietterich, Edith Law and Serge Belongie.

One of the supporting organizations is the Institute for Computational Sustainability at Cornell University. One of the directors of the institute, Tom Dietterich, gave a keynote talk at the main NIPS conference on the topic "Challenges for Machine Learning in Computational Sustainability". In his talk, Dietterich stated that managing the Earth's ecosystems in a sustainable way has failed. For instance, mammalian populations are dropping rapidly worldwide. He gave three reasons for this situation: (1) ecosystems have not been treated as a management or control problem, (2) existing knowledge of the function and structure of ecosystems is inadequate, and (3) optimal management requires spatial planning over horizons of more than one hundred years. Dietterich continued by giving a detailed account of how computer science in general, and machine learning in particular, can help in this situation.

The first invited talk in the workshop was given by Eric Horvitz of Microsoft Research. Horvitz provided examples of harnessing human and machine intelligence in citizen science, of opportunities for conducting science with the crowd, and of the potential to coordinate crowds on physical tasks.

Rómer Rosales, with his co-authors Subramanian, Fung and Dy, considered crowdsourcing in settings where one wishes to use supervised or semi-supervised learning and ground truth exists but is unavailable or expensive to obtain. Specifically, they considered how to evaluate annotators and proposed a multiple-annotator model for classification. Numerical properties of the quantities of interest (predictability, AUC) were examined with respect to (1) a varying number of helpful/unhelpful participants, and (2) varying levels of collaborative/adversarial behavior among participants.
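To get a feel for why the mix of helpful and unhelpful participants matters, the following toy simulation (purely illustrative, not the authors' model) shows how the AUC of a simple mean-vote aggregate degrades as adversarial annotators are added; the accuracy values 0.8 and 0.3 are assumptions chosen for the example.

```python
# Illustrative simulation (not the authors' multiple-annotator model):
# AUC of a mean-vote aggregate as adversarial annotators are added.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=1000)   # hidden ground-truth labels

def annotate(truth, accuracy):
    """Simulate an annotator who reports the true label with given accuracy."""
    flip = rng.random(truth.shape) > accuracy
    return np.where(flip, 1 - truth, truth)

for n_adversarial in range(6):
    votes = [annotate(truth, 0.8) for _ in range(5)]               # helpful
    votes += [annotate(truth, 0.3) for _ in range(n_adversarial)]  # adversarial
    score = np.mean(votes, axis=0)       # aggregate vote as a score for class 1
    print(n_adversarial, "adversarial annotators -> AUC",
          round(roc_auc_score(truth, score), 3))
```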

Haimonti Dutta and William Chan presented a talk on "Using community structure detection to rank annotators when ground truth is subjective". They considered a historical collection of The Sun newspaper in which OCR was used to build the corpus, which caused a number of problems because of garbled text. Categorizing the articles manually would be impossible; in one sample there were more than 14,000 articles from November and December 1894 alone. Preliminary labels were created using a community structure detection algorithm based on constructing a similarity graph from the articles. In the similarity graph, an edge exists between two articles if the cosine similarity of their tf-idf vectors exceeds a given threshold.
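A minimal sketch of that graph construction, assuming scikit-learn (the 0.3 threshold and the toy articles are illustrative):

```python
# Sketch: build an article similarity graph from tf-idf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "markets rallied on wall street today",
    "stocks on wall street rallied sharply",
    "the opera season opened last night",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)
sim = cosine_similarity(tfidf)                 # pairwise cosine similarities

THRESHOLD = 0.3                                # illustrative cutoff
edges = [(i, j)
         for i in range(len(articles))
         for j in range(i + 1, len(articles))
         if sim[i, j] > THRESHOLD]             # edge if similarity exceeds cutoff
print(edges)                                   # e.g. [(0, 1)]
```

A community detection algorithm can then be run on the resulting graph to produce the preliminary article groupings.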

In her presentation "Crowdsourcing citizen science data quality with a human-computer learning network", Andrea Wiggins discussed in detail eBird, a web-based database of bird observations collected by a large community of observers. The system has revolutionized the way the birding community reports and accesses information about birds. Wiggins' presentation clearly showed the potential of such a system for collecting information about the state of the environment. An interesting topic was the role and nature of the observers' expertise. In general, an increasing level of expertise helps in making good-quality observations. On the other hand, the quest for novelty may lead even experts to a kind of crowd hallucination when they eagerly wish to see something new. It was concluded that research on human learning is also important in this area, and that such a database can prove to be an interesting data collection for studying human learning, too.

Edwin Simpson presented a method called Dynamic Independent Bayesian Classifier Combination to deal with crowdsourced classification tasks. The probabilistic model of changing worker behavior treats artificial and human agents as base classifiers, and a Bayesian approach is used to combine their decisions. Fast inference is achieved with Variational Bayes, and the method can deal with limited training data. Simpson presented experimental results based on the Zooniverse environment for citizen science projects.
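As a much-simplified sketch of the underlying decision-combination idea (a static, naive version, not Simpson's dynamic model or its Variational Bayes inference), each worker can be given a Dirichlet-smoothed confusion matrix estimated from past gold-standard data, and the workers' labels for an item combined into a posterior over the true class under an independence assumption; all numbers below are made up for illustration.

```python
# Simplified independent Bayesian decision combination (illustrative only).
import numpy as np

prior = np.array([0.5, 0.5])              # assumed class prior

# Confusion counts per worker: counts[true_class, reported_class],
# accumulated from items with known ground truth (made-up numbers).
counts = {
    "worker_a": np.array([[9.0, 1.0], [2.0, 8.0]]),
    "worker_b": np.array([[6.0, 4.0], [4.0, 6.0]]),
}
alpha = 1.0                               # Dirichlet smoothing

def confusion(c):
    """Row-normalized, smoothed confusion matrix: P(reported | true)."""
    c = c + alpha
    return c / c.sum(axis=1, keepdims=True)

def combine(labels):
    """Posterior over the true class given each worker's reported label."""
    log_post = np.log(prior)
    for worker, reported in labels.items():
        log_post += np.log(confusion(counts[worker])[:, reported])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

print(combine({"worker_a": 1, "worker_b": 0}))   # here approximately [0.25, 0.75]
```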

Serge Belongie gave a talk on "Building the Visipedia Field Guide to North American Birds". Visipedia is a kind of visual counterpart to Wikipedia. Belongie discussed various aspects of crowdsourcing, machine vision, and machine learning in building the system. One conclusion was that people are very keen to help when birds are involved. This motivational aspect was discussed earlier by Andrea Wiggins, who noted that there is a large variety of birds, their characteristics are reasonably easy to recognize, and there is always emerging novelty available.

In the discussions, some interesting papers were mentioned, including "Multidimensional Wisdom of Crowds" by Welinder, Branson, Belongie and Perona, "Incentives for Truthful Reporting in Crowdsourcing" by Kamar and Horvitz, and "Eliciting informative feedback: The peer-prediction method" by Miller, Resnick and Zeckhauser. In addition, Detexify2, a LaTeX symbol classifier, was discussed.

Towards the end of the workshop, strategic and interdisciplinary questions were discussed, ranging from tools for enhancing collaboration, communities of practice, and the zone of proximal development to challenges related to global reach and the multiplicity of human expertise. Eric Horvitz, a newly nominated program co-chair together with Björn Hartmann (UC Berkeley), mentioned that AAAI is launching a new conference called HCOMP.
